Providing Linked Data

Presented by: Barry Norton, Maribel Acosta
  
Motivation: Music!

[Figure: architecture of a music application built along the LINKED DATA LIFECYCLE. Labels in the figure: Data acquisition (Metadata, Streaming providers, Downloads, Musical Content, Other content; Physical Wrapper, LD Wrapper, R2R Transf., RDF/XML, RDFa, LD Data set Access); Vocabulary Mapping; Interlinking; Cleansing; Integrated Dataset; Publishing; SPARQL Endpoint; Application (Analysis & Mining Module, Visualization Module).]
  
Linked Data Principles

1.  Use URIs as names for things.
2.  Use HTTP URIs so that users can look up those names.
3.  When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4.  Include links to other URIs, so that users can discover more things.

EUCLID - Providing Linked Data
  
CH 1
  
Linked Data Lifecycle

[Figures: three proposed Linked Data lifecycles]
Source: Sören Auer. “The Semantic Data Web” (slides)
Source: José M. Alvarez. “My Linked Data Lifecycle”
Source: Michael Hausenblas. “Linked Data lifecycle”
  
Core Tasks for Providing Linked Data

Based on the proposed LD lifecycles and the LD principles, we can identify 3 main tasks for providing LD:

① Creating: includes data extraction, creation of HTTP URIs, and vocabulary selection. (LD principles 1 & 2)
② Interlinking: involves the creation of (RDF) links to external data sets. (LD principle 4)
③ Publishing: consists of creating the metadata and making the data set accessible. (LD principle 3)
  
	
  
Agenda

1.  Creating Linked Data
2.  Interlinking Linked Data
3.  Publishing Linked Data
4.  Linked Data publishing checklist
  
CREATING LINKED DATA
  
Extracting the Data

•  The data of interest may be stored in a wide range of formats:
–  Spreadsheets or tabular data
–  Databases
–  Text
•  Several tools support the process of mining data from different repositories, for example: R2RML
Using the RDF Data Model

•  The RDF data model is used to represent the extracted information
•  The nodes represent the concepts/entities within the data. A node corresponds to a URI, a blank node or a literal (literals only in the object position)
•  The relationships between the concepts/entities are modeled as arcs

[Figure: Subject --Predicate--> Object]
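The node and arc rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `URI`, `BlankNode` and `Literal`, the helper `is_valid_triple`, and the artist URI are hypothetical and not taken from the slides or from any RDF library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    value: str

def is_valid_triple(subject, predicate, obj):
    """Check the RDF position rules for one (subject, predicate, object) arc."""
    return (
        isinstance(subject, (URI, BlankNode))      # no literal subjects
        and isinstance(predicate, URI)             # predicates are always URIs
        and isinstance(obj, (URI, BlankNode, Literal))
    )

# Hypothetical example data: the artist URI is made up for illustration.
artist = URI("http://example.org/artist/the-beatles")
name = URI("http://xmlns.com/foaf/0.1/name")

print(is_valid_triple(artist, name, Literal("The Beatles")))  # literal as object
print(is_valid_triple(Literal("The Beatles"), name, artist))  # literal as subject
```

The first call describes a well-formed arc; the second is rejected because a literal is used in the subject position.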
  
Naming Things: URIs

•  All the things or distinct entities within the data must be named
•  According to the Linked Data principles, the standard mechanism to name entities is the URI
•  Designing Cool URIs:
–  Leave out information about the data regarding author, technologies, status, access mechanisms, …
–  Simplicity: short, mnemonic URIs
–  Stability: maintain the URIs as long as possible
–  Manageability: issue the URIs in a way that you can manage

Source: http://www.w3.org/TR/cooluris/
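One possible way to apply the simplicity guideline when minting URIs is sketched below. The namespace `http://example.org/id/` and the helpers `slugify` and `mint_uri` are assumptions made for this example; they are not part of the Cool URIs note.

```python
import re
import unicodedata

# Hypothetical namespace; in practice use a domain you control.
BASE = "http://example.org/id/"

def slugify(label: str) -> str:
    """Turn a human-readable label into a short, lowercase ASCII slug."""
    ascii_label = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "-", ascii_label.lower()).strip("-")

def mint_uri(kind: str, label: str) -> str:
    """Build a URI with no file extension, technology name, or status in it."""
    return f"{BASE}{kind}/{slugify(label)}"

print(mint_uri("artist", "Sigur Rós"))
```

Label-based slugs are mnemonic, but labels can change over time. Where stability matters more, an opaque identifier (such as the UUIDs used in MusicBrainz URIs) may be the better path segment.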
  
Selecting Vocabularies

•  Vocabularies model the concepts and the relationships between them in a knowledge domain
•  Terms from well-known vocabularies should be reused wherever possible
•  New terms should be defined only if you cannot find the required terms in existing vocabularies
•  A large number of vocabularies in RDF are openly available, e.g., Linked Open Vocabularies (LOV)
  
Selecting Vocabularies (2)

Linked Open Vocabularies: 322 vocabularies classified by domain

Source: http://lov.okfn.org/dataset/lov/
  
Selecting Vocabularies (3)

Linked Open Vocabularies: Analyzing MusicOntology

Source: http://lov.okfn.org/dataset/lov/details/vocabulary_mo.html
  
Selecting Vocabularies (4)

Other lists of well-known vocabularies are maintained by:
•  W3C SWEO Linking Open Data community project
   http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
•  Library Linked Data Incubator Group: Vocabularies in the library domain
   http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025
  
INTERLINKING LINKED DATA
  
Interlinking Data Sets

•  It’s one of the Linked Data principles! (Principle 4: Include links to other URIs, so that users can discover more things.)
•  Involves the creation of RDF links between two different RDF data sets:
–  Links at instance level (rdfs:seeAlso, owl:sameAs)
–  Links at schema level (RDFS subclass/subproperty, OWL equivalent class/property, SKOS mapping properties)
•  Appropriate links are detected via link discovery
  
Interlinking Data Sets (2)

Challenges for link discovery
•  Linked Data sets are heterogeneous in terms of vocabularies, formats and data representation
•  Large range of knowledge domains
•  Scalability: LD is composed of a large number of data sets and RDF triples, hence it is not possible to compare every possible entity pair

Source: Robert Isele. “LOD2 Webinar Series: Silk”
  	
  
Interlinking Data Sets (3)

Challenges for link discovery
•  It corresponds to the entity resolution problem: deciding whether two entities correspond to the same object in the real world
•  Name ambiguities: typos, misspellings, different languages, homonyms
•  Structural ambiguities: same concepts/entities with different structures. Requires the application of ontology and schema matching techniques
  
Interlinking Data Sets (4)

RDF data sets can be interlinked:

Manually
•  Involves the manual exploration of LD data sets and their RDF resources to identify linking targets
•  May not be feasible when the number of entities within the data set is very large

Automatically
•  Using tools that perform link discovery based on linkage rules, for example: Silk, Limes and xCurator
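To make the idea of a linkage rule concrete, here is a minimal sketch in plain Python. It is not how Silk, Limes or xCurator work internally; the function names, the first-letter blocking key, the 0.9 threshold and the example URIs are all assumptions chosen for illustration. Blocking is included because, as noted for the scalability challenge, comparing every possible entity pair is not feasible on large data sets.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and collapse whitespace before comparing labels."""
    return " ".join(name.lower().split())

def block_key(name):
    """Crude blocking key: only labels sharing a first letter are compared."""
    return normalize(name)[:1]

def discover_links(source, target, threshold=0.9):
    """Return owl:sameAs candidates for labels whose similarity >= threshold."""
    blocks = defaultdict(list)
    for uri, name in target.items():
        blocks[block_key(name)].append((uri, name))

    links = []
    for s_uri, s_name in source.items():
        for t_uri, t_name in blocks[block_key(s_name)]:
            score = SequenceMatcher(
                None, normalize(s_name), normalize(t_name)
            ).ratio()
            if score >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links

# Hypothetical data sets: URI -> label.
source = {
    "http://example.org/a/1": "The Beatles",
    "http://example.org/a/2": "Nirvana",
}
target = {
    "http://example.com/x/9": "the  beatles",
    "http://example.com/x/8": "The Beach Boys",
}

for link in discover_links(source, target):
    print(link)
```

A first-letter block will miss pairs whose first character differs (for example through a typo), so real tools use more robust blocking and richer comparison functions.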
  
owl:sameAs & rdfs:seeAlso

•  owl:sameAs
–  Creates links between individuals
–  States that two URIs refer to the same individual
•  rdfs:seeAlso
–  States that a resource may provide additional information about the subject resource
•  Links in MusicBrainz:
–  owl:sameAs is used for music artists
–  rdfs:seeAlso is used for albums
  
SKOS

•  Simple Knowledge Organization System
–  http://www.w3.org/TR/skos-reference/
•  Data model for knowledge organization systems (thesauri, classification schemes, taxonomies)
•  SKOS data is expressed as RDF triples
•  Allows the creation of RDF links between different data sets with the usage of mapping properties
  
SKOS: Mapping Properties

These properties are used to link SKOS concepts (particularly instances) in different schemes:
•  skos:closeMatch: links two concepts that are sufficiently similar (sometimes can be used interchangeably)
•  skos:exactMatch: indicates that the two concepts can be used interchangeably.
–  Axiom: It is a transitive property
•  skos:relatedMatch: states an associative mapping link between two concepts
  
SKOS: Mapping Properties (2)

Example of SKOS exact match

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

mo:MusicArtist skos:exactMatch dbpedia-ont:MusicalArtist .
mo:MusicGroup skos:exactMatch schema:MusicGroup .
mo:MusicGroup skos:exactMatch dbpedia-ont:Band .
  
SKOS: Mapping Properties (3)

Example of SKOS close match

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .

mo:SignalGroup skos:closeMatch schema:MusicAlbum .
mo:SignalGroup skos:closeMatch dbpedia-ont:Album .
  
SKOS: Mapping Properties (4)

Integrity conditions
•  Guarantee consistency and avoid contradictions in the relationships between SKOS concepts

[Figure: partial Mapping Relation diagram with integrity conditions. It relates skos:mappingRelation to skos:closeMatch, skos:exactMatch and skos:relatedMatch, with the annotations "Symmetric", "Symmetric & Transitive" (skos:exactMatch) and "Disjoint with" (skos:exactMatch and skos:relatedMatch).]
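The integrity conditions can be checked mechanically. The sketch below covers one of them, assuming the SKOS rules that skos:exactMatch is symmetric and transitive and is disjoint with skos:relatedMatch. The function names and the skos:relatedMatch statements in the example are hypothetical; they are introduced only to trigger a violation.

```python
def exact_match_groups(exact_pairs):
    """Union-find over skos:exactMatch pairs (symmetric, transitive closure)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in exact_pairs:
        parent[find(a)] = find(b)
    return find, set(parent)

def disjointness_violations(exact_pairs, related_pairs):
    """relatedMatch pairs that are also (entailed) exactMatch pairs."""
    find, known = exact_match_groups(exact_pairs)
    return [
        (a, b)
        for a, b in related_pairs
        if a in known and b in known and find(a) == find(b)
    ]

exact = [
    ("mo:MusicGroup", "schema:MusicGroup"),
    ("mo:MusicGroup", "dbpedia-ont:Band"),
]
# Hypothetical statements, not from the slides:
related = [
    ("schema:MusicGroup", "dbpedia-ont:Band"),  # entailed exactMatch -> clash
    ("mo:MusicArtist", "dbpedia-ont:Band"),     # fine
]

print(disjointness_violations(exact, related))
```

The first skos:relatedMatch statement is flagged because schema:MusicGroup and dbpedia-ont:Band are already connected through mo:MusicGroup by the exact matches.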
  
PUBLISHING LINKED DATA
  
Publishing Linked Data

Once the RDF data set has been created and interlinked, the publishing process involves the following tasks:

1.  Metadata creation for describing the data set
2.  Making the data set accessible
3.  Exposing the data set in Linked Data repositories
4.  Validating the data set
  
Describing RDF Data Sets

•  Consists of providing (machine-readable) metadata of RDF data sets which can be processed by engines
•  This information allows for:
–  Efficient and effective search of data sets
–  Selection of appropriate data sets (for consumption or interlinking)
–  Getting general statistics of the data sets
  
Describing RDF Data Sets (2)

•  The common language for describing RDF data sets is VoID (Vocabulary of Interlinked Datasets)
•  Defines an RDF data set with the class void:Dataset
•  Covers 4 types of metadata:
–  General metadata
–  Structural metadata
–  Descriptions of linksets
–  Access metadata
VoID: General Metadata

•  General metadata is used by users to identify appropriate data sets.
•  Specifies information about the description of the data set, the contact person/organization, the license of the data set, the data subject and some technical features.
•  VoID (re)uses predicates from the Dublin Core Metadata [1] and FOAF [2] vocabularies.

[1] http://dublincore.org/documents/2010/10/11/dcmi-terms/
[2] http://xmlns.com/foaf/spec/
  
VoID: General Metadata (2)

General Information
Contains information about the creation of the data set

Predicate            Range         Description
dcterms:title        Literal       Name of the data set.
dcterms:description  Literal       Description of the data set.
dcterms:source       RDF resource  Source from which the data set was derived.
dcterms:creator      RDF resource  Entity primarily responsible for creating the data set.
dcterms:date         xsd:date      Time associated with an event in the life-cycle of the resource.
dcterms:created      xsd:date      Date of creation of the data set.
dcterms:issued       xsd:date      Date of publication of the data set.
dcterms:modified     xsd:date      Date on which the data set was changed.
foaf:homepage        RDF resource  Homepage of the data set.
dcterms:publisher    RDF resource  Entity responsible for making the data set available.
dcterms:contributor  RDF resource  Entity responsible for making contributions to the data set.

Source: http://www.w3.org/TR/void/#metadata
  	
  
VoID: General Metadata (3)

Other Information
•  License of the data set: specifies the usage conditions of the data. The license can be indicated with the property dcterms:license
•  Category of the data set: to specify the topics or domains covered by the data set, the property dcterms:subject can be used
•  Technical features: the property void:feature can be used to express technical properties of the data (e.g. RDF serialization formats)
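As a rough illustration of how the general metadata fits together, the following sketch assembles a small VoID description as a Turtle string. The helper names, the data set URI, the description text, and the chosen license and subject URIs are placeholders for illustration only; they are not statements about the actual MusicBrainz data set. The sketch assumes single-line literal values.

```python
def turtle_literal(text):
    """Quote a single-line string as a Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def void_general_metadata(dataset_uri, title, description, license_uri, subject_uri):
    """Render general VoID metadata for one data set as Turtle text."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f"    dcterms:title {turtle_literal(title)} ;",
        f"    dcterms:description {turtle_literal(description)} ;",
        f"    dcterms:license <{license_uri}> ;",
        f"    dcterms:subject <{subject_uri}> .",
    ]
    return "\n".join(lines) + "\n"

# All values below are illustrative placeholders.
doc = void_general_metadata(
    "http://example.org/void#MusicBrainz",
    "MusicBrainz",
    'An RDF view of the "MusicBrainz" music database.',
    "http://creativecommons.org/publicdomain/zero/1.0/",
    "http://dbpedia.org/resource/Music",
)
print(doc)
```

In a real pipeline an RDF library would normally build and serialize the graph; plain string assembly is used here only to keep the example dependency-free.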
  
VoID: Structural Metadata

•  Provides high-level information about the internal structure of the data set
•  This metadata is useful when exploring or querying the data set
•  Includes information about resources, vocabularies used in the data set, statistics and examples of resources in the data set

VoID: Structural Metadata (2)

Information about resources
•  Example resources: allow users to get an impression of the kind of resources included in the data set. Examples can be shown with the property void:exampleResource

:MusicBrainz a void:Dataset;
   void:exampleResource
     <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d> .

•  Pattern for resource URIs: the void:uriSpace property can be used to state that all the entity URIs in a data set start with a given string

:MusicBrainz a void:Dataset;
   void:uriSpace "http://musicbrainz.org/" .
  
VoID: Structural Metadata (3)

Vocabularies used in the data set
•  The void:vocabulary property identifies the vocabulary or ontology that is used in a data set
•  Typically, only the most relevant vocabularies are listed
•  This property can only be used for entire vocabularies. It cannot be used to express that a subset of the vocabulary occurs in the data set.

:MusicBrainz a void:Dataset;
   void:vocabulary <http://purl.org/ontology/mo/> .
  
VoID: Structural Metadata (4)

Statistics about a data set
Express numeric statistics about a data set:

Predicate              Range   Description
void:triples           Number  Total number of triples contained in the data set.
void:entities          Number  Total number of entities that are described in the data set. An entity must have a URI, and match the void:uriRegexPattern.
void:classes           Number  Total number of distinct classes in the data set.
void:properties        Number  Total number of distinct properties in the data set.
void:distinctSubjects  Number  Total number of distinct subjects in the data set.
void:distinctObjects   Number  Total number of distinct objects in the data set.
void:documents         Number  Total number of documents, in case that the data set is published as a set of individual documents.

Source: http://www.w3.org/TR/void/#metadata
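Most of these statistics can be derived directly from the triples. The sketch below computes five of them over an in-memory list of (subject, predicate, object) string tuples. The function name and the example triples are hypothetical, and classes are counted as the distinct objects of rdf:type triples, which is one common reading of void:classes.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def void_statistics(triples):
    """Compute a few VoID statistics for a list of (s, p, o) tuples."""
    unique = set(triples)  # a data set is a set of triples, so drop repeats
    return {
        "void:triples": len(unique),
        "void:classes": len({o for s, p, o in unique if p == RDF_TYPE}),
        "void:properties": len({p for s, p, o in unique}),
        "void:distinctSubjects": len({s for s, p, o in unique}),
        "void:distinctObjects": len({o for s, p, o in unique}),
    }

# Hypothetical example triples.
triples = [
    ("http://example.org/a/1", RDF_TYPE, "http://purl.org/ontology/mo/MusicGroup"),
    ("http://example.org/a/1", "http://xmlns.com/foaf/0.1/name", "The Beatles"),
    ("http://example.org/a/2", RDF_TYPE, "http://purl.org/ontology/mo/MusicGroup"),
]

for predicate, value in void_statistics(triples).items():
    print(predicate, value)
```

void:entities and void:documents are left out because they depend on the data set's URI pattern and on how the data is split into documents, which this sketch does not model.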
  
VoID: Structural Metadata (5)

Partitioned data sets
•  The void:subset property provides descriptions of parts of a data set

:MusicBrainz a void:Dataset;
  void:subset :MusicBrainzArtists .

•  Data sets can be partitioned based on classes or properties:
–  void:classPartition contains only instances of a particular class
–  void:propertyPartition contains only triples with a particular predicate

:MusicBrainz a void:Dataset;
  void:classPartition [ void:class mo:Release ] ;
  void:propertyPartition [ void:property mo:member ] .
  
VoID: Describing Linksets

•  Linkset: collection of RDF links between two RDF data sets

[Figure: data sets :DS1 and :DS2 with linksets :LS1 and :LS2 (owl:sameAs links). Image based on http://semanticweb.org/wiki/File:Void-linkset-conceptual.png]

@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

:DS1 a void:Dataset .
:DS2 a void:Dataset .
:DS1 void:subset :LS1 .
:LS1 a void:Linkset;
     void:linkPredicate owl:sameAs;
     void:target :DS1, :DS2 .
  
VoID: Describing Linksets (2)

Example

@prefix void: <http://rdfs.org/ns/void#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .

:MusicBrainz a void:Dataset .
:DBpedia a void:Dataset .

:MusicBrainz void:classPartition :MBArtists .
:MBArtists void:class mo:MusicArtist .

:MBArtists a void:Linkset;
     void:linkPredicate skos:exactMatch;
     void:target :MusicBrainz, :DBpedia .
  
VoID: Access Metadata

The access metadata describes the methods of accessing the actual RDF data set

Method                Predicate               Description
URI look up endpoint  void:uriLookupEndpoint  Specifies the URI of a service for accessing the data set (different from the SPARQL protocol)
Root resource         void:rootResource       URI of the top concepts (only for data sets structured as trees)
SPARQL endpoint       void:sparqlEndpoint     Provides access to the data set via the SPARQL protocol.*
RDF data dumps        void:dataDump           Specifies the location of the dump file. If the data set is split into multiple files, then several values of this property are provided.

* This assumes that the default graph of the SPARQL endpoint contains the data set. VoID cannot express that a data set is contained in a specific named graph. This can be specified with SPARQL 1.1 Service Description.
  	
  
CH 5
  
Providing Access to the Data Set

The data set can be accessed via different mechanisms:
•  Dereferencing HTTP URIs
•  RDFa
•  RDF dump
•  SPARQL endpoint

Dereferencing HTTP URIs

•  Allows for easily exploring certain resources contained in the data set
•  What to return for a URI?
–  Immediate description: triples where the URI is the subject.
–  Backlinks: triples where the URI is the object.
–  Related descriptions: information of interest in typical usage scenarios.
–  Metadata: information such as author and licensing information.
–  Syntax: RDF descriptions as RDF/XML and human-readable formats.
•  Applications (e.g. LD browsers) render the retrieved information so it can be perceived by a user.

Source: How to Publish Linked Data on The Web - Chris Bizer, Richard Cyganiak, Tom Heath.
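The first two parts of such a response, the immediate description and the backlinks, can be selected from the data set with two simple filters. The sketch below works on an in-memory list of (subject, predicate, object) tuples; the function name `describe` and all example URIs are hypothetical.

```python
def describe(uri, triples):
    """Select the triples a server might return when `uri` is dereferenced."""
    return {
        # Immediate description: the URI is the subject.
        "immediate": [t for t in triples if t[0] == uri],
        # Backlinks: the URI is the object.
        "backlinks": [t for t in triples if t[2] == uri],
    }

# Hypothetical example data.
BAND = "http://example.org/artist/the-beatles"
ALBUM = "http://example.org/album/abbey-road"
triples = [
    (BAND, "http://xmlns.com/foaf/0.1/name", "The Beatles"),
    (ALBUM, "http://xmlns.com/foaf/0.1/maker", BAND),
    (ALBUM, "http://purl.org/dc/terms/title", "Abbey Road"),
]

result = describe(BAND, triples)
print(result["immediate"])
print(result["backlinks"])
```

Related descriptions and metadata depend on the application, so they are not modeled here. A real server would also pick the response syntax through content negotiation.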
  
	
  
CH 1
  
Dereferencing HTTP URIs (2)

Example: Dereferencing
  
RDFa

•  RDFa = “RDF in attributes”
•  Extension to HTML5 for embedding RDF within HTML pages:
–  The HTML is processed by the browser; the (human) consumer does not see the RDF data
–  The RDF triples within the page are consumed by APIs to extract the (semi-)structured data
•  It is considered the bridge between the Web of Data and the Web of Documents
•  It is a complete serialization of RDF
  
RDFa: Attributes

Attribute role  Attribute       Description
Syntax          prefix          List of prefix-name IRI pairs
                vocab           IRI that specifies the vocabulary where the concept is defined
Subject         about           Specifies the subject of the relationship
Predicate       property        Expresses the relationship between the subject and the value
                rel             Defines a relation between the subject and a URL
                rev             Expresses reverse relationships between two resources
Resource        href            Specifies an object URI for the rel and rev attributes
                resource        Same as href (used when href is not present)
                src             Specifies the subject of a relationship
Literal         datatype        Expresses the datatype of the object of the property attribute
                content         Supplies machine-readable content for a literal
                xml:lang, lang  Specifies the language of the literal
Macro           typeof          Indicates the RDF type(s) to associate with a subject
                inlist          An object is added to the list of a predicate.
  
RDFa: Example

Extracting RDF from HTML

HTML (+RDFa):

<div class="artistheader"
  about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
  typeof="http://purl.org/ontology/mo/MusicGroup">
   …
</div>

RDF:

<http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://purl.org/ontology/mo/MusicGroup> .
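The extraction step for this particular example can be approximated with Python's standard `html.parser`. The class below is a deliberately narrow sketch, not an RDFa processor: it only handles elements that carry both `about` and `typeof` with full IRIs, and the class name `TypeofExtractor` is an assumption of this example.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class TypeofExtractor(HTMLParser):
    """Collect (subject, rdf:type, class) triples from about/typeof pairs.

    Prefixes, vocab, nesting and all other RDFa attributes are ignored.
    """

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        types = attributes.get("typeof")
        if subject and types:
            # typeof may list several types separated by whitespace.
            for rdf_class in types.split():
                self.triples.append((subject, RDF_TYPE, rdf_class))

html = """
<div class="artistheader"
  about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
  typeof="http://purl.org/ontology/mo/MusicGroup">
</div>
"""

extractor = TypeofExtractor()
extractor.feed(html)
for s, p, o in extractor.triples:
    print(f"<{s}> <{p}> <{o}> .")
```

For real pages a complete RDFa distiller, such as the pyRdfa service referenced in these slides, is the appropriate tool.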
  
RDFa: Example (2)

Extracting RDF from MusicBrainz.org

http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d

Source: http://www.w3.org/2007/08/pyRdfa/

http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d&format=nt

Watch the EUCLID screencast: http://vimeo.com/euclidproject
  
RDF Dump

•  An RDF dump refers to a file which contains (part of) a data set specified in an RDF format (RDF/XML, N-Triples, N-Quads)
•  The data set can be split into several RDF dumps
•  A list of data sets available as RDF dumps can be found at:
–  http://www.w3.org/wiki/DataSetRDFDumps
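A dump in N-Triples is simply one triple per line, which makes it easy to write and to split. The sketch below assumes a simplified in-memory representation in which each term is a `(kind, value)` pair with kind `"uri"` or `"literal"`; that representation, the function names and the example data are hypothetical. Blank nodes, language tags and datatypes are not handled.

```python
def nt_term(term):
    """Serialize one (kind, value) term in N-Triples syntax."""
    kind, value = term
    if kind == "uri":
        return f"<{value}>"
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def to_ntriples(triples):
    """Serialize triples as N-Triples text, one statement per line."""
    return "".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} .\n" for s, p, o in triples
    )

def split_dump(triples, chunk_size):
    """Split a data set into several dumps of at most chunk_size triples."""
    return [
        to_ntriples(triples[i:i + chunk_size])
        for i in range(0, len(triples), chunk_size)
    ]

# Hypothetical example data.
NAME = ("uri", "http://xmlns.com/foaf/0.1/name")
triples = [
    (("uri", "http://example.org/a"), NAME, ("literal", 'Say "hi"')),
    (("uri", "http://example.org/b"), NAME, ("literal", "Nirvana")),
    (("uri", "http://example.org/c"), NAME, ("literal", "Silk")),
]

for number, dump in enumerate(split_dump(triples, 2), start=1):
    print(f"# dump {number}")
    print(dump, end="")
```

Each resulting file would then be listed as a separate void:dataDump value in the data set description.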
  
SPARQL Endpoint

•  The SPARQL endpoint refers to the URI of the listener of the SPARQL protocol service, which handles requests for SPARQL protocol operations
•  The user submits SPARQL queries to the SPARQL endpoint in order to retrieve only a desired subset of the RDF data set
•  List of available SPARQL endpoints:
–  http://www.w3.org/wiki/SparqlEndpoints
–  http://labs.mondeca.com/sparqlEndpointsStatus/
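In the SPARQL protocol, a query can be sent with an HTTP GET request that carries the query text in a `query` parameter. The sketch below only builds such a request URL and does not contact any server; the endpoint `http://example.org/sparql` and the helper name are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def sparql_get_url(endpoint, query):
    """Build the GET URL for a SPARQL protocol query operation.

    Assumes `endpoint` has no query string of its own.
    """
    return endpoint + "?" + urlencode({"query": query})

# Placeholder endpoint; replace with a real SPARQL endpoint URI.
ENDPOINT = "http://example.org/sparql"
QUERY = (
    "PREFIX mo: <http://purl.org/ontology/mo/> "
    "SELECT ?artist WHERE { ?artist a mo:MusicGroup } LIMIT 10"
)

url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Round trip: the server side would decode the same query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

The result format is usually chosen with the HTTP Accept header, for example `application/sparql-results+json` for SELECT queries.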
  
CH 2
  
Using Linked Data Catalogs

•  Data catalogs, markets or repositories are platforms dedicated to providing access to a wide range of data sets from different domains
•  Allow data consumers to easily find and use the data
•  Usually the catalogs offer relevant metadata about the creation of the data set
  
Using Linked Data Catalogs (2)

How to publish an RDF data set into a catalog?

Create your own data catalog
•  Recommended for big organizations/institutions aiming at providing a large number of data sets
•  Use a data management system, for example: [shown as a logo on the slide; name not recoverable]

Upload your data set into an existing catalog
•  Allows data consumers to easily find new data sets
•  Common LD catalogs are:
–  [shown as a logo on the slide; name not recoverable]
–  The Linking Open Data Cloud
Validating Data Sets
There are different ways to validate the published RDF data set. The slide groups the validators into three categories: accessibility, parsing & syntax, and general validators.
•  Vapour - Performs two types of tests: without content negotiation and requesting RDF/XML content
   http://validator.linkeddata.org/vapour
•  URI Debugger - Retrieves the HTTP responses of accessing a URI
   http://linkeddata.informatik.hu-berlin.de/uridbg/
•  RDF Triple-Checker - Dereferences namespaces associated with the resources used in the document
   http://graphite.ecs.soton.ac.uk/checker/
•  W3C RDF/XML Validation Service - Evaluates the syntax of RDF/XML documents and displays the RDF triples in them
   http://www.w3.org/RDF/Validator/
•  W3C Markup Validation Service - Checks syntactic correctness of web documents with RDFa markup
   http://validator.w3.org/
•  RDF:ALERTS - Validates syntax, undefined resources, datatype and other types of errors
   http://swse.deri.org/RDFAlerts/
Validating Data Sets (2)-(4)
Example: Validating URIs with Vapour (screenshots)
Source: http://idi.fundacionctic.org/vapour
•  HTML content: http://dbpedia.org/page/The_Beatles
•  RDF document: http://dbpedia.org/data/The_Beatles.xml
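The two kinds of response that Vapour checks can be sketched offline. The code below prepares requests with different Accept headers and models where a server would redirect them. The resource URI http://dbpedia.org/resource/The_Beatles and the function names are assumptions for illustration. The redirect logic is a simplified model of content negotiation, not DBpedia's actual server behaviour.

```python
from urllib.request import Request

# Assumed URI of the real-world thing; the slides only show its two documents.
RESOURCE = "http://dbpedia.org/resource/The_Beatles"

def make_request(uri: str, accept: str) -> Request:
    """Prepare (but do not send) a request that asks for one media type."""
    return Request(uri, headers={"Accept": accept})

def expected_redirect(uri: str, accept: str) -> str:
    """Simplified model of the redirect a client should receive."""
    name = uri.rsplit("/", 1)[1]
    if "application/rdf+xml" in accept:
        return f"http://dbpedia.org/data/{name}.xml"   # RDF document
    return f"http://dbpedia.org/page/{name}"           # HTML content

for accept in ("text/html", "application/rdf+xml"):
    request = make_request(RESOURCE, accept)
    print(request.get_header("Accept"), "->", expected_redirect(RESOURCE, accept))
```

A validator of this kind compares the documents actually returned for each Accept header with what the Linked Data principles lead a client to expect.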
  
PROVIDING LINKED DATA: CHECKLIST

Providing Linked Data: Checklist (1)
Creating Linked Data
o  Were all the relevant entities/concepts effectively extracted from the raw data?
o  Are all the created URIs dereferenceable?
o  Are you reusing terms from widely accepted vocabularies?

Providing Linked Data: Checklist (2)
Interlinking Linked Data
o  Is the data set linked to other RDF data sets?
o  Are the created vocabulary terms linked to other vocabularies?

Providing Linked Data: Checklist (3)
Publishing Linked Data
o  Do you provide data set metadata?
o  Do you provide information about licensing?
o  Do you provide additional access methods?
o  Is the data set available in LD catalogs?
o  Did the data set pass the validation tests?
Summary
In this chapter we studied:
•  The Linked Data lifecycle:
   •  3 core tasks: creating, interlinking and publishing
•  Creation of Linked Data:
   •  Extracting relevant data, using URIs to name entities, selecting vocabularies and expressing the data using the RDF data model
•  Interlinking Linked Data:
   •  Challenges of link discovery, using Silk to create links between two data sets and using SKOS links
•  Publishing Linked Data:
   •  Creation of data set metadata; publishing the data set via RDF dumps, SPARQL endpoints or RDFa; using RDFa and schema.org to enrich search results; and uploading the data set to an LD catalog

The Web & Linked Data
•  Linked Data catalogs
•  Applications
CKAN
•  CKAN is an open source platform for developing data set catalogs
•  It implements useful tools for data publishers to support:
   •  Data harvesting
   •  Creation of metadata
   •  Access mechanisms to the data set
   •  Updating the data set
   •  Monitoring the access to the data set

CKAN (2), (3): screenshots
Source: http://ckan.org
The Data Hub
•  The Data Hub is a community-run data catalog which contains more than 5,000 data sets [1]
•  “(…) is an openly editable open data catalogue, in the style of Wikipedia”. [2]
•  It is implemented on top of the CKAN platform
•  Allows the creation of groups:
   –  The Linking Open Data Cloud group exclusively contains Linked Data sets
[1] According to the information presented in the portal in March 2013
[2] Source: http://datahub.io/about

The Data Hub (2), (3): screenshots
Source: http://datahub.io/
The Linking Open Data Cloud
[Figure: Linking Open Data cloud diagram, September 2011]
Source: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch

The Linking Open Data Cloud
How to publish an RDF data set in this cloud?
1.  The data set must follow the Linked Data principles
2.  The data set must contain at least 1,000 RDF triples
3.  The data set must contain at least 50 RDF links to a data set that is already in the diagram
4.  Access to the data set must be provided
Once these criteria are met, the data publisher must add the data set to the Data Hub catalog and contact the administrators of the Linking Open Data Cloud group
Source: http://lod-cloud.net/
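The four criteria can be written down as a small self-check that a publisher might run before contacting the group administrators. This is only a sketch: the DataSetStats record and its field names are hypothetical, and how the counts are obtained (for example with SPARQL COUNT queries) is left open.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataSetStats:
    """Hypothetical summary of a data set; field names are assumptions."""
    follows_ld_principles: bool
    triple_count: int
    links_to_one_cloud_data_set: int  # RDF links into a single data set already in the diagram
    access_provided: bool

def unmet_criteria(stats: DataSetStats) -> List[str]:
    """Return the LOD cloud criteria that the data set does not yet meet."""
    problems = []
    if not stats.follows_ld_principles:
        problems.append("must follow the Linked Data principles")
    if stats.triple_count < 1000:
        problems.append("must contain at least 1,000 RDF triples")
    if stats.links_to_one_cloud_data_set < 50:
        problems.append("must contain at least 50 RDF links to a data set in the diagram")
    if not stats.access_provided:
        problems.append("must provide access to the data set")
    return problems

candidate = DataSetStats(True, 12000, 35, True)
print(unmet_criteria(candidate))
```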
  
Linked Data & Search Engines
•  Search engines collect information about web resources in order to produce richer search results by improving the display of the results
•  This is only possible if the search engines are able to understand the content within the web pages
•  The HTML pages must be annotated with machine-readable content to describe their content: markup format + vocabulary
RDFa for marking up data
•  RDFa is used to provide (semi-)structured Linked Data embedded in web content
•  Examples:
   –  Some search engines use RDFa, e.g., Google, Yahoo! and Bing
   –  Facebook’s Open Graph is based on RDFa
Google Rich Snippets
•  Embedding semantics via RDFa (or microformats/microdata) enhances search results

Google Rich Snippets (2): screenshot
Schema.org
•  Collection of schemas/vocabularies to mark up HTML pages
•  It is recognized by Bing, Google, Yahoo! and Yandex
•  Covers a wide range of knowledge domains
•  It also offers an extension mechanism in case the publisher is interested in adding new concepts to the vocabularies
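As an illustration of marking up an HTML page with a schema.org vocabulary, the sketch below renders a small fragment that uses RDFa Lite attributes (vocab, typeof, property). The choice of the MusicGroup type with the name, genre and url properties, the function name and the sample values are assumptions for this example, not taken from the slides.

```python
from html import escape

def music_group_rdfa(name: str, genre: str, url: str) -> str:
    """Render an HTML fragment annotated with RDFa Lite and schema.org terms."""
    return (
        '<div vocab="http://schema.org/" typeof="MusicGroup">\n'
        f'  <a property="url" href="{escape(url, quote=True)}">'
        f'<span property="name">{escape(name)}</span></a>\n'
        f'  plays <span property="genre">{escape(genre)}</span>.\n'
        '</div>'
    )

# Sample values are illustrative only.
snippet = music_group_rdfa("The Beatles", "Rock", "http://dbpedia.org/resource/The_Beatles")
print(snippet)
```

A human reader sees an ordinary link and a sentence, while an RDFa-aware consumer can extract a typed entity with three properties from the same markup.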
  
Schema.org (2)
The vocabularies cover the following topics: (screenshot)
Source: http://schema.org/docs/schemas.html
“The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail.” (Dan Brickley, schema.org)

Schema.org (3)
Integrates(/aligns) existing vocabularies where appropriate, e.g. rNews
Source: http://schema.org/Article
Google Knowledge Graph
•  The user is able to find answers to their queries without browsing pages
•  Provides detailed information

Google Knowledge Graph (2)
•  Google Search results include structured data from Freebase
•  Might disambiguate search terms

Freebase
•  Knowledge base of structured data
•  Data is stored as a graph
•  Describes data from different domains
Bing Snapshot
•  Provides structured data related to the search term
•  Includes a significant number of entities from more domains
•  Connects data from LinkedIn
•  It is powered by the graph engine Trinity.RDF

Bing Snapshot (2): screenshot
 	
  	
  	
  	
  	
  	
Open Graph Protocol
•  It was originally created by Facebook
•  Allows describing web content as graph objects, establishing connections between people and objects
•  The descriptions are embedded in the web page as RDFa data
•  Supports description of several domains: basic metadata, music, video, articles, books, websites and user profiles
Source: http://ogp.me/
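To make the idea of descriptions embedded in the web page concrete, the following sketch collects the og: properties from the meta elements of a page using only the Python standard library. The sample page, its example.org URLs and the class name OpenGraphParser are made up for illustration.

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            self.properties[prop] = attributes["content"]

# Hypothetical page describing a music album as a graph object.
PAGE = """<html prefix="og: http://ogp.me/ns#"><head>
<title>Example album page</title>
<meta property="og:title" content="Example Album" />
<meta property="og:type" content="music.album" />
<meta property="og:url" content="http://example.org/albums/1" />
<meta property="og:image" content="http://example.org/albums/1/cover.jpg" />
</head><body></body></html>"""

parser = OpenGraphParser()
parser.feed(PAGE)
og = parser.properties
print(og)
```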
  
 	
  	
  	
  	
  	
  	
Open Graph Protocol (2)
Who is using Open Graph protocol?
•  Consumers: Facebook, Google, Mixi
•  Publishers: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME
Source: http://ogp.me/
 	
  	
  	
  	
  	
  	
Open Graph Protocol (3)
•  Facebook expands the vocabulary of relationships beyond “friendship” and “like”: more actions!
Source: https://developers.facebook.com/docs/opengraph/

Open Graph Protocol & Facebook
List of domains and actions

Domain        Actions
General       Like, Recommend, Follow
Music         Listen, Create a playlist
Movies & TV   Watch, Rate, Wants to watch
Games         Achieve, High score
Fitness       Bike, Run, Walk
Book          Rate, Read, Quote, Wants to read

Source: https://developers.facebook.com/docs/opengraph/
How can we exploit these links and relationships?
 	
  	
  	
  	
  	
Facebook Graph Search
•  Focuses on people and their interests, exploiting how everything is related to each other
•  Queries are specified using natural language
•  Takes advantage of context and suggests possible queries
•  Allows for building more complex (expressive) queries that are not possible with normal search:
   –  For example, “music liked by me and friends who live in my city”

Facebook Graph Search (2)
Screenshot showing context (information from profile) and graph search suggestions

Facebook Graph Search (3)
Screenshot of the results

Facebook Graph Search (4)
Observations
•  Allows for conjunctive queries (applying a filter over intermediate results = “apply operator”)
•  Disjunctive queries are not supported:
   –  For example: “My friends who like SemanticWeb.com OR ReadWrite”
•  Post search is not supported
   –  It is not possible to search in post content submitted to the timeline
•  User privacy settings affect the results
Tools for providing Linked Data
•  Extracting data from spreadsheets: OpenRefine
•  Extracting data from RDBMS: R2RML
•  Extracting data from text: Zemanta, OpenCalais, GATE
•  Interlinking data sets: Silk
EXTRACTING DATA FROM SPREADSHEETS WITH OPENREFINE

Integrate Chart Data
•  Task: Integrate latest chart information into your RDF database.
•  Data may be available in non-RDF formats:
   –  Plain text
   –  CSV, TSV, separator-based files
   –  HTML tables
   –  Spreadsheets (OpenDocument, Excel, …)
   –  XML
   –  JSON
   –  …
[Diagram: data acquisition (CSV/TSV, HTML, spreadsheets, JSON); vocabulary mapping, interlinking and cleansing; integrated data set; publishing via SPARQL endpoint; LD data set access]
Example Data

CSV:
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million

HTML tables (http://en.wikipedia.org/wiki/List_of_best-selling_music_artists):
Artist | Country of origin | Period active | Release-year of first charted record | Total certified units (from available markets)
The Beatles | United Kingdom | 1960–1970 | 1962 | 250 million
Elvis Presley | United States | 1954–1977 | 1954 | 203.3 million
Michael Jackson | United States | 1964–2009 | 1971 | 157.4 million
Madonna | United States | 1979–present | 1982 | 160.1 million
Led Zeppelin | United Kingdom | 1968–1980 | 1969 | 135.5 million
Queen | United Kingdom | 1971–present | 1973 | 90.5 million

JSON:
{
  "artist": {
    "class": "artist",
    "name": "The Beatles"
  },
  "rank": 1,
  "value": "250 million"
},
…
 	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  	
OpenRefine
Quick Facts
•  Transforms and cleans messy input data sets.
•  Is an open-source successor of Google Refine.
•  Allows for entity reconciliation against SPARQL endpoints or RDF data.
•  Is extended with plugins that enhance its functionality, e.g. for RDF support.
Use of OpenRefine
1.  Messy input data is imported, transformed into a table representation and cleaned.
2.  Entity reconciliation is applied to allow for interlinking with existing data sets.
3.  Define the structure of the RDF output.
4.  The data is exported into some RDF syntax.

CSV (input):
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million

RDF (output):
musicbrainz:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d :totalSales "250000000"^^xsd:int .
musicbrainz:01809552-4f87-45b0-afff-2c6f0730a3be :totalSales "203300000"^^xsd:int .
musicbrainz:f27ec8db-af05-4f36-916e-3d57f91ecf5e :totalSales "157400000"^^xsd:int .
musicbrainz:79239441-bfd5-4981-a70c-55c3f15c1287 :totalSales "160100000"^^xsd:int .
musicbrainz:678d88b2-87b0-403b-b63d-5da7465aecc3 :totalSales "135500000"^^xsd:int .
musicbrainz:0383dadf-2a4e-4d10-a46a-e9e041da8eb3 :totalSales "90500000"^^xsd:int .
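The four steps can be imitated in a few lines of plain Python to show what each one contributes. This is a sketch of the idea, not of OpenRefine itself. The RECONCILED lookup table stands in for the entity reconciliation step and pairs the first two artists with the first two musicbrainz: identifiers shown on the slide. The function name and the use of Decimal are assumptions.

```python
from decimal import Decimal

# Stand-in for entity reconciliation: artist label -> MusicBrainz identifier.
# The pairing follows the order of the rows on the slide.
RECONCILED = {
    "The Beatles": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d",
    "Elvis Presley": "01809552-4f87-45b0-afff-2c6f0730a3be",
}

def to_turtle(csv_line: str) -> str:
    """Turn 'Artist, N million' into one Turtle triple (prefixes assumed declared)."""
    name, amount = (part.strip() for part in csv_line.rsplit(",", 1))
    number, unit = amount.split()
    if unit != "million":
        raise ValueError("unexpected unit: " + unit)
    # Decimal avoids floating point rounding for values such as 203.3.
    units = int(Decimal(number) * 1_000_000)
    return f'musicbrainz:{RECONCILED[name]} :totalSales "{units}"^^xsd:int .'

for line in ["The Beatles, 250 million", "Elvis Presley, 203.3 million"]:
    print(to_turtle(line))
```

Cleaning (stripping whitespace, parsing the unit), reconciliation (the lookup) and the RDF skeleton (the fixed triple pattern) are separate concerns here, which mirrors the division of work in the tool.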
  
Data Transformation
Typical steps:
•  Group and explore data items
•  Delete columns or rows based on a filter condition
•  Split columns into several columns based on a condition
•  Modify messy data items with GREL, a powerful expression language
•  Replay steps from a previous Refine project
How to Generate RDF?
•  Additional problem: data needs to be interlinked with existing MusicBrainz data
•  This is the point where plugins come into play:
   –  RDF Refine: developed by DERI
   –  An extension of OpenRefine to support RDF
Core Capabilities
•  Interlinking of data by entity reconciliation
   –  Against SPARQL endpoints, RDF dumps
   –  Discovery of relevant RDF data sets
•  RDF export with the help of RDF skeletons
   –  Define the vocabulary and graph structure of the RDF serialization
   –  In Turtle, RDF/XML

Entity Reconciliation
Typical steps:
•  Define a reconciliation service
•  Select specific types to reconcile against
•  Start reconciling a column against the service

Define RDF Skeletons
•  An RDF skeleton defines the structure of the RDF triples that are exported

RDF Skeletons (screenshot; slide dated 03.09.13)
EXTRACTING DATA FROM RDBMS WITH R2RML

W3C RDB2RDF
•  Task: Integrate data from relational DBMS with Linked Data
•  Approach: map from relational schema to semantic vocabulary with R2RML
•  Publishing: two alternatives
   –  Translate SPARQL into SQL on the fly
   –  Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Diagram: data acquisition (relational DBMS, R2RML engine); vocabulary mapping, interlinking and cleansing; integrated data in triplestore; publishing via SPARQL endpoint; LD data set access]
W3C RDB2RDF
•  In 2012 the W3C published two recommendations for mapping between relational databases and RDF:
   –  Direct mapping directly exposes data as RDF
      •  No allowance for vocabulary mapping
      •  No allowance for interlinking (unless URIs are used in the relational data)
      •  Not appropriate for this topic
   –  R2RML, the RDB to RDF mapping language
      •  Allows vocabulary mapping (subject, predicate and object maps with class options)
      •  Allows interlinking: URIs can be constructed
      •  Means to provide MusicBrainz RDF/SPARQL itself
http://www.w3.org/2001/sw/rdb2rdf/
MusicBrainz Next Gen Schema
•  Artist: as pre-NGS, but further attributes
•  Artist Credit: allows joint credit
•  Release Group: cf. ‘album’, versus:
   •  Release
   •  Medium
•  Track
•  Track List
•  Work
•  Recording
Source: https://wiki.musicbrainz.org/Next_Generation_Schema

Music Ontology
•  OWL ontology with the following core concepts (classes) and relationships (properties): (diagram)
Source: http://musicontology.com
R2RML Class Mapping
•  Mapping tables to classes is ‘easy’:

lb:Artist a rr:TriplesMap ;
  rr:logicalTable [rr:tableName "artist"] ;
  rr:subjectMap
    [rr:class mo:MusicArtist ;
     rr:template
       "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap
    [rr:predicate mo:musicbrainz_guid ;
     rr:objectMap [rr:column "gid" ;
                   rr:datatype xsd:string]] .
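What an R2RML engine does with this triples map can be approximated by hand: for every row of the artist table it fills the subject template and emits one class triple plus one triple per predicate-object map. The sketch below is a plain Python imitation under that reading, not an R2RML processor. The function name and the sample row are assumptions.

```python
# Subject template copied from the mapping above.
TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"

def map_artist_rows(rows):
    """Imitate the lb:Artist triples map for rows of the 'artist' table."""
    triples = []
    for row in rows:
        subject = "<" + TEMPLATE.format(**row) + ">"
        # rr:class mo:MusicArtist
        triples.append((subject, "rdf:type", "mo:MusicArtist"))
        # rr:predicate mo:musicbrainz_guid with rr:column "gid"
        literal = '"%s"^^xsd:string' % row["gid"]
        triples.append((subject, "mo:musicbrainz_guid", literal))
    return triples

# Hypothetical table content: one row with only the columns used here.
rows = [{"id": 1, "gid": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"}]
triples = map_artist_rows(rows)
for s, p, o in triples:
    print(s, p, o, ".")
```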
  
R2RML Property Mapping
•  Mapping columns to properties can be easy:

lb:artist_name a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid, artist_name.name
       FROM artist
         INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap
    [rr:predicate foaf:name ;
     rr:objectMap [rr:column "name"]] .
NGS Advanced Relations
•  Major entities (Artist, Release Group, Track, etc.) plus URL are paired (e.g. l_artist_artist)
•  Each pairing of instances refers to a Link
•  Links have types (cf. RDF properties) and attributes
Source: http://wiki.musicbrainz.org/Advanced_Relationship
R2RML Advanced Mapping
•  Mapping advanced relationships (SQL joins):

lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
         INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
         INNER JOIN link ON l_artist_artist.link = link.id
         INNER JOIN link_type ON link.link_type = link_type.id
         INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'
         AND link.ended=FALSE"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap
    [rr:predicate mo:member_of ;
     rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                   rr:termType rr:IRI]] .
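The join behind this mapping can be tried out on a toy database. The sketch below uses an in-memory SQLite database with a heavily simplified, assumed version of the four tables; the real MusicBrainz schema has many more columns. All rows and gid values except the link type gid from the slide are made up. The boolean column is stored as 0/1 so that the query runs on any SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER, ended INTEGER);
CREATE TABLE l_artist_artist (id INTEGER PRIMARY KEY, link INTEGER,
                              entity0 INTEGER, entity1 INTEGER);

INSERT INTO artist VALUES (1, 'gid-member-a'), (2, 'gid-band-x'), (3, 'gid-member-b');
INSERT INTO link_type VALUES (10, '5be4c609-9afa-4ea0-910b-12ffb71e3821'),
                             (11, 'gid-some-other-link-type');
INSERT INTO link VALUES (100, 10, 0),   -- current membership
                        (101, 10, 1),   -- membership that has ended
                        (102, 11, 0);   -- link of a different type
INSERT INTO l_artist_artist VALUES (1, 100, 1, 2), (2, 101, 3, 2), (3, 102, 1, 3);
""")

QUERY = """
SELECT a1.gid, a2.gid AS band
FROM artist a1
  INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
  INNER JOIN link ON l_artist_artist.link = link.id
  INNER JOIN link_type ON link.link_type = link_type.id
  INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'
  AND link.ended = 0
"""

rows = conn.execute(QUERY).fetchall()
# Apply the subject and object templates to each result row.
links = [
    f"<http://musicbrainz.org/artist/{gid}#_> mo:member_of "
    f"<http://musicbrainz.org/artist/{band}#_> ."
    for gid, band in rows
]
print("\n".join(links))
```

Only the current membership survives the WHERE clause: the ended link and the link of another type are filtered out before any triple is generated.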
  
EXTRACTING DATA FROM TEXT

OpenCalais
•  Not easily customised/extended
•  Domain-specific coverage varies
Source: http://viewer.opencalais.com/

DBpedia Spotlight
•  Not easily customised/extended
•  Is currently only available for English
Source: http://dbpedia-spotlight.github.com/demo/
Annotations shown in the screenshot: http://dbpedia.org/page/Slowcore, http://dbpedia.org/page/Dorothy_Parker

Zemanta
•  Common problem with general purpose, open-domain semantic annotation tools
•  Best results require bespoke customisation
Source: http://www.zemanta.com/demo/
GATE
•  General Architecture for Text Engineering
•  Free open-source (LGPL) framework and development environment
•  Started 1996, large developer community
•  Used worldwide by many organisations to build bespoke solutions; e.g. Press Association and the National Archive
•  Information Extraction in many languages
http://www.gate.ac.uk/

GATE Example - LODIE
•  Increases recall over DBpedia by deriving new lexicalisations for URIs from link anchor texts, disambiguation pages, and redirect pages
Precision and Recall
•  Generic services typically have very low recall
•  Combination is one solution
•  Other solution is custom extraction

Precision / recall per entity type:
                   PER           LOC           ORG           TOTAL
DB Spotlight       0.97 / 0.40   0.82 / 0.46   0.86 / 0.31   0.85 / 0.39
Zemanta            0.96 / 0.84   0.89 / 0.62   0.82 / 0.57   0.90 / 0.68
LODIE              0.81 / 0.82   0.73 / 0.76   0.56 / 0.59   0.71 / 0.74
Zemanta ∩ LODIE    1.00 / 0.74   0.95 / 0.45   0.97 / 0.42   0.97 / 0.54
Zemanta ∪ LODIE    0.94 / 0.93   0.77 / 0.76   0.72 / 0.71   0.82 / 0.81
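One common way to weigh precision against recall is the F1 score, the harmonic mean of the two. The slides do not report F1; the sketch below computes it from the TOTAL column only to illustrate why combining services is presented as a solution. The dictionary and function names are assumptions.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs from the TOTAL column of the table.
TOTALS = {
    "DB Spotlight": (0.85, 0.39),
    "Zemanta": (0.90, 0.68),
    "LODIE": (0.71, 0.74),
    "Zemanta ∩ LODIE": (0.97, 0.54),
    "Zemanta ∪ LODIE": (0.82, 0.81),
}

scores = {name: f1(p, r) for name, (p, r) in TOTALS.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:18s} F1 = {score:.3f}")
print("highest F1:", best)
```

The intersection trades recall for precision and the union does the opposite. Under this particular metric the union comes out ahead for these totals.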
Custom GATE Gazetteer
•  Retrieve MusicBrainz entity/label/class with SPARQL query

GATECloud
•  Custom (e.g. based around custom gazetteer) GATE pipelines can be executed on the cloud

INTERLINKING DATA SETS WITH SILK
Interlinking with Silk
•  Task: Create links between the data set and external Linked Data sources.
•  Approach: Creation of specified links by querying the target data sets
•  Alternatives:
   –  Manual creation of linkage rules by the user
   –  Automatic learning of linkage rules by submitting predefined SPARQL queries
[Diagram: data acquisition (CSV/TSV, HTML, spreadsheets, JSON); vocabulary mapping, interlinking and cleansing; integrated data set; publishing via SPARQL endpoint; LD data set access]
Link Discovery with Silk
•  Open source tool for discovering RDF links between data items within different Linked Data sources
•  It is based on the Silk Link Specification Language (Silk-LSL) for expressing linkage rules
•  It accesses the target RDF data sets via SPARQL endpoints to generate RDF links
Source: Robert Isele. “LOD2 Webinar Series: Silk”
Silk Variants
•  Silk Single Machine
–  Generates RDF links on a single machine
–  Data sets can reside either locally or on remote machines
–  Provides multithreading and caching
•  Silk MapReduce
–  Uses a cluster composed of multiple machines
–  Based on Hadoop and designed to scale to big data sets
•  Silk Server
–  Used within applications that consume Linked Data from the Web while keeping track of known entities
–  Provides an HTTP API for matching entities from an incoming stream
Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
Silk Workflow
Source: Silk workflow is partially based on “LOD2 Webinar Series: Silk - (Simplified) Linking Workflow” by Robert Isele.
Select LD data sets
•  Identify suitable data sets in LD catalogs*
•  Select the two data sets to link
Specify LD data sets
•  Specify the access method to the data set (RDF dump, SPARQL endpoint)*
•  Specify the entity types to be linked
Write linkage rule
•  Specifies how to compare the resources
•  Use Silk-LSL
•  The rules can also be learnt
Generate RDF links
•  Output links can be stored in a file or a triple store
•  Can discover SKOS links
Silk framework
* See section “Publishing Linked Data”
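As a rough illustration of the final workflow step, the sketch below serialises discovered links as N-Triples lines that could be written to a file. It is not Silk's own output code. The choice of owl:sameAs as the default link predicate and the example URIs are assumptions. skos:closeMatch is shown as one possible SKOS link type.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

def to_ntriples(links, predicate=OWL_SAME_AS):
    """Serialise (source URI, target URI) pairs as N-Triples lines."""
    return [f"<{source}> <{predicate}> <{target}> ." for source, target in links]

# Made-up link between two hypothetical resources.
LINKS = [("http://example.org/artist/1", "http://example.org/other/Beatles")]

if __name__ == "__main__":
    for line in to_ntriples(LINKS):
        print(line)
    for line in to_ntriples(LINKS, predicate=SKOS_CLOSE_MATCH):
        print(line)
```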
  	
  
Linkage Rule Components
•  Linkage rules define the conditions to create the links between the data sets. These rules are composed of:
1.  RDF Paths
–  Describe the elements to be compared
–  Example: ?a/rdfs:label
2.  Transformations
–  Apply transformations to the result set of an RDF path
–  Examples: LowerCase, Concatenate, Replace, …
3.  Comparators
–  Compute the similarity of two inputs
–  Examples: String similarity metrics, Date similarity, …
4.  Aggregations
–  Compute an aggregated value from multiple comparators
–  Examples: Min, Max, Avg, various means, Euclidean distance …
Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
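To make the four component types concrete, here is a minimal sketch in plain Python rather than Silk-LSL. The in-memory triples, the ex: identifiers, the ex:founded property and the function names are all hypothetical, and the comparator is a simple equality check standing in for a real similarity metric.

```python
from statistics import mean

# Tiny in-memory "data set" of (subject, property, value) triples; all values are made up.
TRIPLES = [
    ("ex:a1", "rdfs:label", "Queen"),
    ("ex:a1", "ex:founded", "1970"),
    ("ex:b1", "rdfs:label", "QUEEN"),
    ("ex:b1", "ex:founded", "1970"),
    ("ex:b2", "rdfs:label", "queen"),
    ("ex:b2", "ex:founded", "1991"),
]

def path(entity, prop):
    """1. RDF path: values reached from an entity via one property."""
    return [o for s, p, o in TRIPLES if s == entity and p == prop]

def lower_case(values):
    """2. Transformation: applied to the result set of a path."""
    return [v.lower() for v in values]

def equality(xs, ys):
    """3. Comparator: 1.0 if any pair of values is equal, else 0.0."""
    return 1.0 if any(x == y for x in xs for y in ys) else 0.0

def rule(a, b, aggregate=min):
    """4. Aggregation: combine the scores of several comparators."""
    scores = [
        equality(lower_case(path(a, "rdfs:label")), lower_case(path(b, "rdfs:label"))),
        equality(path(a, "ex:founded"), path(b, "ex:founded")),
    ]
    return aggregate(scores)

if __name__ == "__main__":
    print(rule("ex:a1", "ex:b1"))                  # both comparators agree
    print(rule("ex:a1", "ex:b2"))                  # min: one disagreement is enough
    print(rule("ex:a1", "ex:b2", aggregate=mean))  # average softens the disagreement
```

Swapping the aggregate function changes how strict the rule is: min requires every comparator to agree, while an average tolerates partial matches.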
  
Silk Workbench
•  Web application built on top of Silk, which allows the creation of projects to manage the creation of links between RDF data sets
•  The data sets can be stored locally or accessed remotely by specifying the SPARQL endpoint
•  The user is able to create customized linking tasks:
–  The tool offers a graphical editor to create linkage rules by combining the linkage rule components via drag & drop elements
–  Includes support for (automatic) learning of linkage rules
Silk Workbench (2)
Project configuration (screenshot)
1.  Project: name and components (data sources, linking tasks and output tasks)
2.  Data sources: specification of the data sets to be interlinked
3.  Linking task: specification of the linkage rules and type of links to be created
4.  Output task: mechanism to store the results from the linking process
Silk Workbench (3)
Editing a linking task (screenshot)
1.  Linkage rule components
2.  Graphical editor: the items from (1) are dragged & dropped in this area, and connected to compose the linkage rules
3.  Generate links: based on the defined linkage rules in (2), the data sets are accessed to discover possible links
4.  Learn: automatic learning of linkage rules
Silk Workbench (4)
Adding a linkage rule (screenshot)
The previous linkage rule states:
1.  Retrieve the foaf:name values from MusicBrainz and the rdfs:label from DBpedia
2.  Apply lower case transformation to the output of (1)
3.  Compare the output from (2) using the metric “Levenshtein distance”. If the similarity derived from this distance is greater than 0.90, then create a link.
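The three steps of this rule can be approximated in a few lines of Python. This is a sketch, not Silk's implementation: it reads the 0.90 threshold as a similarity score in [0, 1], and the normalisation used here (one minus the edit distance divided by the longer string's length) is an assumption, as are the function names and the sample strings.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Assumed normalisation: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def should_link(foaf_name, rdfs_label, threshold=0.90):
    """Steps 2 and 3: lower-case both values, compare, apply the threshold."""
    return similarity(foaf_name.lower(), rdfs_label.lower()) > threshold

if __name__ == "__main__":
    print(should_link("The Beatles", "the beatles"))
    print(should_link("Queen", "Queens"))
```

With this normalisation, short names are penalised heavily for a single differing character, so a threshold of 0.90 is strict for short labels.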
  
Silk Workbench (5)
Generate Links (screenshot)
Silk Workbench (6)
Learn Rules (screenshot)
For exercises, quizzes and further material visit our website:
http://www.euclid-project.eu
Other channels:
@euclid_project    euclidproject    euclidproject
eBook    Course

ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 

Último (20)

Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 

ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data

  • 1. Providing Linked Data. Presented by: Barry Norton, Maribel Acosta
  • 2. Motivation: Music! [Architecture diagram. Labels: Visualization Module; Analysis & Mining Module; Application; LD Data Set Access; SPARQL Endpoint; Publishing; RDFa; Integrated Dataset; Interlinking; Cleansing; Vocabulary Mapping; Data acquisition; LD Wrapper; R2R Transf.; Physical Wrapper; RDF/XML; Metadata; Streaming providers; Downloads; Musical Content; Other content]
  • 3. LINKED DATA LIFECYCLE
  • 4. Linked Data Principles (CH 1)
      1.  Use URIs as names for things.
      2.  Use HTTP URIs so that users can look up those names.
      3.  When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
      4.  Include links to other URIs, so that users can discover more things.
  • 5. Linked Data Lifecycle
      Source: Sören Auer. “The Semantic Data Web” (slides)
      Source: José M. Alvarez. “My Linked Data Lifecycle”
      Source: Michael Hausenblas. “Linked Data lifecycle”
  • 6. Core Tasks for Providing Linked Data. Based on the proposed LD lifecycles and the LD principles, we can identify 3 main tasks for providing LD:
      ① Creating: includes data extraction, creation of HTTP URIs, and vocabulary selection. (LD principles 1 & 2)
      ② Interlinking: involves the creation of (RDF) links to external data sets. (LD principle 4)
      ③ Publishing: consists of creating the metadata and making the data set accessible. (LD principle 3)
  • 7. Agenda
      1.  Creating Linked Data
      2.  Interlinking Linked Data
      3.  Publishing Linked Data
      4.  Linked Data publishing checklist
  • 8. CREATING LINKED DATA
  • 9. Extracting the Data
      •  The data of interest may be stored in a wide range of formats: spreadsheets or tabular data, databases, text.
      •  Several tools support the process of mining data from different repositories, for example: R2RML
  • 10. Using the RDF Data Model
      •  The RDF data model is used to represent the extracted information.
      •  The nodes represent the concepts/entities within the data. A node corresponds to a URI, a blank node, or a literal (literals may appear only in the object position).
      •  The relationships between the concepts/entities are modeled as arcs: each arc is labeled with a predicate and points from the subject to the object.
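The subject/predicate/object model on this slide can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the class names `URI`, `BlankNode` and `Literal` and the function `triple_to_nt` are hypothetical helpers, not part of any RDF library, and the serialization covers only plain literals in N-Triples syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


Node = Union[URI, BlankNode, Literal]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term_to_nt(term: Node) -> str:
    """Serialize one RDF term in N-Triples syntax (plain literals only)."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    if isinstance(term, BlankNode):
        return f"_:{term.label}"
    # Escape backslashes and double quotes inside the literal.
    escaped = term.lexical.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple_to_nt(subject: Node, predicate: Node, obj: Node) -> str:
    """Serialize a triple, enforcing where each kind of term may appear."""
    if isinstance(subject, Literal):
        raise ValueError("a literal cannot be the subject of a triple")
    if not isinstance(predicate, URI):
        raise ValueError("the predicate of a triple must be a URI")
    return f"{term_to_nt(subject)} {term_to_nt(predicate)} {term_to_nt(obj)} ."


if __name__ == "__main__":
    artist = URI("http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_")
    group = URI("http://purl.org/ontology/mo/MusicGroup")
    print(triple_to_nt(artist, URI(RDF_TYPE), group))
```

The checks in `triple_to_nt` mirror the positional rules of the data model: a literal is accepted only as the object of a triple.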
  • 11. Naming Things: URIs
      •  All the things or distinct entities within the data must be named.
      •  According to the Linked Data principles, the standard mechanism to name entities is the URI.
      •  Designing Cool URIs:
          –  Leave out information about the data such as author, technologies, status, access mechanisms, …
          –  Simplicity: short, mnemonic URIs
          –  Stability: maintain the URIs as long as possible
          –  Manageability: issue the URIs in a way that you can manage
      Source: http://www.w3.org/TR/cooluris/
  • 12. Selecting Vocabularies
      •  Vocabularies model the concepts and the relationships between them in a knowledge domain.
      •  Terms from well-known vocabularies should be reused wherever possible.
      •  New terms should be defined only if you cannot find the required terms in existing vocabularies.
      •  A large number of vocabularies in RDF are openly available, e.g., Linked Open Vocabularies (LOV).
  • 13. Selecting Vocabularies (2). Linked Open Vocabularies: 322 vocabularies classified by domain. Source: http://lov.okfn.org/dataset/lov/
  • 14. Selecting Vocabularies (3). Linked Open Vocabularies: Analyzing MusicOntology. Source: http://lov.okfn.org/dataset/lov/details/vocabulary_mo.html
  • 15. Selecting Vocabularies (4). Other lists of well-known vocabularies are maintained by:
      •  W3C SWEO Linking Open Data community project: http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
      •  Library Linked Data Incubator Group (vocabularies in the library domain): http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025
  • 16. INTERLINKING LINKED DATA
  • 17. Interlinking Data Sets
      •  It’s one of the Linked Data principles! (4. Include links to other URIs, so that users can discover more things.)
      •  Involves the creation of RDF links between two different RDF data sets:
          –  Links at instance level (rdfs:seeAlso, owl:sameAs)
          –  Links at schema level (RDFS subclass/subproperty, OWL equivalent class/property, SKOS mapping properties)
      •  Appropriate links are detected via link discovery.
  • 18. Interlinking Data Sets (2). Challenges for link discovery:
      •  Linked Data sets are heterogeneous in terms of vocabularies, formats and data representation.
      •  Large range of knowledge domains.
      •  Scalability: LD is composed of a large number of data sets and RDF triples, hence it is not possible to compare every possible entity pair.
      Source: Robert Isele. “LOD2 Webinar Series: Silk”
  • 19. Interlinking Data Sets (3). Challenges for link discovery:
      •  It corresponds to the entity resolution problem: deciding whether two entities correspond to the same object in the real world.
      •  Name ambiguities: typos, misspellings, different languages, homonyms.
      •  Structural ambiguities: the same concepts/entities with different structures. Resolving them requires ontology and schema matching techniques.
  • 20. Interlinking Data Sets (4). RDF data sets can be interlinked:
      Manually
      •  Involves the manual exploration of LD data sets and their RDF resources to identify linking targets.
      •  May not be feasible when the number of entities within the data set is very large.
      Automatically
      •  Using tools that perform link discovery based on linkage rules, for example: Silk, Limes and xCurator.
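To make the idea of a linkage rule concrete, here is a minimal sketch in plain Python. It is not how Silk, Limes or xCurator work internally; it only illustrates the two ingredients named on the previous slides: a similarity test on labels (to cope with name ambiguities) and a blocking step (so that not every entity pair is compared). The function names, the first-character blocking key and the 0.9 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def blocking_key(label: str) -> str:
    """Cheap blocking key: the first character of the normalized label."""
    return normalize(label)[:1]


def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Return (source URI, owl:sameAs, target URI) candidates.

    `source` and `target` map entity URIs to labels. Only entities that
    share a blocking key are compared with each other.
    """
    blocks = {}
    for uri, label in target.items():
        blocks.setdefault(blocking_key(label), []).append((uri, normalize(label)))

    links = []
    for s_uri, s_label in source.items():
        s_norm = normalize(s_label)
        for t_uri, t_norm in blocks.get(blocking_key(s_label), []):
            if SequenceMatcher(None, s_norm, t_norm).ratio() >= threshold:
                links.append((s_uri, OWL_SAME_AS, t_uri))
    return links


if __name__ == "__main__":
    # Hypothetical data sets, for illustration only.
    source = {"http://example.org/artist/1": "The Beatles"}
    target = {
        "http://example.com/band/x": "the  beatles",
        "http://example.com/band/y": "The Beach Boys",
    }
    for link in discover_links(source, target):
        print(link)
```

The first-character key is deliberately crude: it would miss matches whose labels differ in the first character, which is the usual trade-off between blocking and recall.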
  • 21. owl:sameAs & rdfs:seeAlso
      •  owl:sameAs
          –  Creates links between individuals
          –  States that two URIs refer to the same individual
      •  rdfs:seeAlso
          –  States that a resource may provide additional information about the subject resource
      •  Links in MusicBrainz:
          –  owl:sameAs is used for music artists
          –  rdfs:seeAlso is used for albums
  • 22. SKOS
      •  Simple Knowledge Organization System
          –  http://www.w3.org/TR/skos-reference/
      •  Data model for knowledge organization systems (thesauri, classification schemes, taxonomies)
      •  SKOS data is expressed as RDF triples
      •  Allows the creation of RDF links between different data sets with the usage of mapping properties
  • 23. SKOS: Mapping Properties. These properties are used to link SKOS concepts (particularly instances) in different schemes:
      •  skos:closeMatch: links two concepts that are sufficiently similar (sometimes they can be used interchangeably)
      •  skos:exactMatch: indicates that the two concepts can be used interchangeably
          –  Axiom: it is a transitive property
      •  skos:relatedMatch: states an associative mapping link between two concepts
  • 24. SKOS: Mapping Properties (2). Example of SKOS exact match:
      @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
      @prefix mo: <http://purl.org/ontology/mo/> .
      @prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
      @prefix schema: <http://schema.org/> .

      mo:MusicArtist skos:exactMatch dbpedia-ont:MusicalArtist .
      mo:MusicGroup skos:exactMatch schema:MusicGroup .
      mo:MusicGroup skos:exactMatch dbpedia-ont:Band .
  • 25. SKOS: Mapping Properties (3). Example of SKOS close match:
      @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
      @prefix mo: <http://purl.org/ontology/mo/> .
      @prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
      @prefix schema: <http://schema.org/> .

      mo:SignalGroup skos:closeMatch schema:MusicAlbum .
      mo:SignalGroup skos:closeMatch dbpedia-ont:Album .
  • 26. SKOS: Mapping Properties (4). Integrity conditions
      •  Guarantee consistency and avoid contradictions in the relationships between SKOS concepts.
      [Partial mapping-relation diagram with integrity conditions. Nodes: skos:mappingRelation, skos:closeMatch, skos:exactMatch, skos:relatedMatch. Edge labels: “Symmetric & Transitive”, “Symmetric”, “Disjoint with”.]
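The integrity conditions can be checked mechanically. The sketch below is an illustration in plain Python, not a SKOS validator: it assumes (following the SKOS Reference) that skos:exactMatch is symmetric and transitive, that skos:relatedMatch is symmetric, and that the two are disjoint. The function names are hypothetical, and the relatedMatch statement in the usage example is made up to trigger a violation.

```python
from itertools import combinations


def exact_match_closure(pairs):
    """Return the symmetric, transitive closure of skos:exactMatch pairs."""
    # Group concepts into clusters of mutually matching concepts.
    clusters = []
    for a, b in pairs:
        touching = [c for c in clusters if a in c or b in c]
        merged = {a, b}.union(*touching)
        clusters = [c for c in clusters if c not in touching] + [merged]

    # Every ordered pair of distinct concepts in a cluster is an exact match.
    closure = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            closure.add((a, b))
            closure.add((b, a))
    return closure


def integrity_violations(exact_pairs, related_pairs):
    """Return pairs asserted (or inferred) as both exactMatch and relatedMatch."""
    exact = exact_match_closure(exact_pairs)
    related = set()
    for a, b in related_pairs:
        # skos:relatedMatch is symmetric, so record both directions.
        related.add((a, b))
        related.add((b, a))
    return sorted(exact & related)


if __name__ == "__main__":
    exact = [
        ("mo:MusicGroup", "schema:MusicGroup"),
        ("mo:MusicGroup", "dbpedia-ont:Band"),
    ]
    # Hypothetical conflicting statement, for illustration only.
    related = [("dbpedia-ont:Band", "schema:MusicGroup")]
    print(integrity_violations(exact, related))
```

Here the conflict is only visible after inference: neither exactMatch statement mentions the pair directly, but transitivity and symmetry imply that `schema:MusicGroup` and `dbpedia-ont:Band` are exact matches.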
  • 27. PUBLISHING LINKED DATA
  • 28. Publishing Linked Data. Once the RDF data set has been created and interlinked, the publishing process involves the following tasks:
      1.  Metadata creation for describing the data set
      2.  Making the data set accessible
      3.  Exposing the data set in Linked Data repositories
      4.  Validating the data set
  • 29. Describing RDF Data Sets
      •  Consists of providing (machine-readable) metadata of RDF data sets which can be processed by engines.
      •  This information allows for:
          –  Efficient and effective search of data sets
          –  Selection of appropriate data sets (for consumption or interlinking)
          –  Obtaining general statistics of the data sets
  • 30. Describing RDF Data Sets (2)
      •  The common language for describing RDF data sets is VoID (Vocabulary of Interlinked Datasets).
      •  Defines an RDF data set with the class void:Dataset.
      •  Covers 4 types of metadata:
          –  General metadata
          –  Structural metadata
          –  Descriptions of linksets
          –  Access metadata
  • 31. VoID: General Metadata
      •  General metadata is used by users to identify appropriate data sets.
      •  Specifies information such as a description of the data set, the contact person/organization, the license of the data set, the data subject and some technical features.
      •  VoID (re)uses predicates from the Dublin Core Metadata [1] and FOAF [2] vocabularies.
      [1] http://dublincore.org/documents/2010/10/11/dcmi-terms/
      [2] http://xmlns.com/foaf/spec/
  • 32. VoID: General Metadata (2). General information: contains information about the creation of the data set. Source: http://www.w3.org/TR/void/#metadata
      Predicate            | Range        | Description
      dcterms:title        | Literal      | Name of the data set.
      dcterms:description  | Literal      | Description of the data set.
      dcterms:source       | RDF resource | Source from which the data set was derived.
      dcterms:creator      | RDF resource | Entity primarily responsible for creating the data set.
      dcterms:date         | xsd:date     | Time associated with an event in the life-cycle of the resource.
      dcterms:created      | xsd:date     | Date of creation of the data set.
      dcterms:issued       | xsd:date     | Date of publication of the data set.
      dcterms:modified     | xsd:date     | Date on which the data set was changed.
      foaf:homepage        | RDF resource | Homepage of the data set.
      dcterms:publisher    | RDF resource | Entity responsible for making the data set available.
      dcterms:contributor  | RDF resource | Entity responsible for making contributions to the data set.
  • 33. VoID: General Metadata (3). Other information:
      •  License of the data set: specifies the usage conditions of the data. The license can be pointed to with the property dcterms:license.
      •  Category of the data set: to specify the topics or domains covered by the data set, the property dcterms:subject can be used.
      •  Technical features: the property void:feature can be used to express technical properties of the data (e.g. RDF serialization formats).
  • 34. VoID: Structural Metadata
      •  Provides high-level information about the internal structure of the data set.
      •  This metadata is useful when exploring or querying the data set.
      •  Includes information about resources, vocabularies used in the data set, statistics and examples of resources in the data set.
  • 35. VoID: Structural Metadata (2). Information about resources:
      •  Example resources: allow users to get an impression of the kind of resources included in the data set. Examples can be shown with the property void:exampleResource.
          :MusicBrainz a void:Dataset ;
              void:exampleResource
                  <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d> .
      •  Pattern for resource URIs: the void:uriSpace property can be used to state that all the entity URIs in a data set start with a given string.
          :MusicBrainz a void:Dataset ;
              void:uriSpace "http://musicbrainz.org/" .
  • 36. VoID: Structural Metadata (3). Vocabularies used in the data set:
      •  The void:vocabulary property identifies the vocabulary or ontology that is used in a data set.
      •  Typically, only the most relevant vocabularies are listed.
      •  This property can only be used for entire vocabularies. It cannot be used to express that a subset of the vocabulary occurs in the data set.
          :MusicBrainz a void:Dataset ;
              void:vocabulary <http://purl.org/ontology/mo/> .
  • 37. VoID: Structural Metadata (4). Statistics about a data set: express numeric statistics about a data set. Source: http://www.w3.org/TR/void/#metadata
      Predicate              | Range  | Description
      void:triples           | Number | Total number of triples contained in the data set.
      void:entities          | Number | Total number of entities that are described in the data set. An entity must have a URI, and match the void:uriRegexPattern.
      void:classes           | Number | Total number of distinct classes in the data set.
      void:properties        | Number | Total number of distinct properties in the data set.
      void:distinctSubjects  | Number | Total number of distinct subjects in the data set.
      void:distinctObjects   | Number | Total number of distinct objects in the data set.
      void:documents         | Number | Total number of documents, in case the data set is published as a set of individual documents.
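Most of these statistics can be derived by simple counting over the triples of a data set. The sketch below is a hedged illustration in plain Python: triples are modelled as `(subject, predicate, object)` string tuples, the function name `void_statistics` is hypothetical, and `void:classes` is computed as the number of distinct objects of rdf:type triples, which is an interpretation assumed here. `void:entities` and `void:documents` are left out because they depend on the URI pattern and on how the data set is published.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_statistics(triples):
    """Compute basic VoID statistics for an iterable of (s, p, o) tuples."""
    triples = set(triples)  # an RDF graph is a set, so duplicates count once
    return {
        "void:triples": len(triples),
        # Assumption: a class is any resource used as the object of rdf:type.
        "void:classes": len({o for s, p, o in triples if p == RDF_TYPE}),
        "void:properties": len({p for s, p, o in triples}),
        "void:distinctSubjects": len({s for s, p, o in triples}),
        "void:distinctObjects": len({o for s, p, o in triples}),
    }


if __name__ == "__main__":
    # Tiny hypothetical data set, for illustration only.
    mo = "http://purl.org/ontology/mo/"
    ex = "http://example.org/"
    data = [
        (ex + "band", RDF_TYPE, mo + "MusicGroup"),
        (ex + "band", mo + "member", ex + "alice"),
        (ex + "band", mo + "member", ex + "bob"),
        (ex + "alice", RDF_TYPE, mo + "MusicArtist"),
    ]
    for predicate, value in void_statistics(data).items():
        print(predicate, value)
```

For large data sets the same numbers would normally be obtained with aggregate SPARQL queries rather than by loading all triples into memory.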
  • 38. VoID: Structural Metadata (5). Partitioned data sets:
      •  The void:subset property provides a description of parts of a data set.
          :MusicBrainz a void:Dataset ;
              void:subset :MusicBrainzArtists .
      •  Data sets can be partitioned based on classes or properties:
          –  void:classPartition contains only instances of a particular class
          –  void:propertyPartition contains only triples with a particular predicate
          :MusicBrainz a void:Dataset ;
              void:classPartition [ void:class mo:Release ] ;
              void:propertyPartition [ void:property mo:member ] .
  • 39. VoID: Describing Linksets
      •  Linkset: collection of RDF links between two RDF data sets.
      [Diagram: data sets :DS1 and :DS2 connected by linksets :LS1 and :LS2 via owl:sameAs. Image based on http://semanticweb.org/wiki/File:Void-linkset-conceptual.png]
          @prefix void: <http://rdfs.org/ns/void#> .
          @prefix owl: <http://www.w3.org/2002/07/owl#> .

          :DS1 a void:Dataset .
          :DS2 a void:Dataset .
          :DS1 void:subset :LS1 .
          :LS1 a void:Linkset ;
              void:linkPredicate owl:sameAs ;
              void:target :DS1, :DS2 .
  • 40. VoID: Describing Linksets (2). Example:
          @prefix void: <http://rdfs.org/ns/void#> .
          @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
          @prefix mo: <http://purl.org/ontology/mo/> .

          :MusicBrainz a void:Dataset .
          :DBpedia a void:Dataset .

          :MusicBrainz void:classPartition :MBArtists .
          :MBArtists void:class mo:MusicArtist .

          :MBArtists a void:Linkset ;
              void:linkPredicate skos:exactMatch ;
              void:target :MusicBrainz, :DBpedia .
  • 41. VoID: Access Metadata (CH 5). The access metadata describes the methods of accessing the actual RDF data set:
      Method               | Predicate              | Description
      URI look-up endpoint | void:uriLookupEndpoint | Specifies the URI of a service for accessing the data set (different from the SPARQL protocol).
      Root resource        | void:rootResource      | URI of the top concepts (only for data sets structured as trees).
      SPARQL endpoint      | void:sparqlEndpoint    | Provides access to the data set via the SPARQL protocol.*
      RDF data dumps       | void:dataDump          | Specifies the location of the dump file. If the data set is split into multiple files, then several values of this property are provided.
      * This assumes that the default graph of the SPARQL endpoint contains the data set. VoID cannot express that a data set is contained in a specific named graph. This can be specified with SPARQL 1.1 Service Description.
  • 42. Providing Access to the Data Set. The data set can be accessed via different mechanisms: RDFa, RDF dump, SPARQL endpoint, dereferencing HTTP URIs.
  • 43. Dereferencing HTTP URIs (CH 1)
      •  Allows for easily exploring certain resources contained in the data set.
      •  What to return for a URI?
          –  Immediate description: triples where the URI is the subject.
          –  Backlinks: triples where the URI is the object.
          –  Related descriptions: information of interest in typical usage scenarios.
          –  Metadata: information such as author and licensing information.
          –  Syntax: RDF descriptions as RDF/XML and human-readable formats.
      •  Applications (e.g. LD browsers) render the retrieved information so it can be perceived by a user.
      Source: How to Publish Linked Data on the Web - Chris Bizer, Richard Cyganiak, Tom Heath.
  • 44. Dereferencing HTTP URIs (2). Example: Dereferencing
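A client typically dereferences a URI by sending an HTTP GET with an `Accept` header that names the RDF syntax it wants. The sketch below only builds such a request with Python's standard library and does not send it; the function name `build_rdf_request` is hypothetical, and whether a given server honours the header (and which media types it offers) is an assumption that has to be checked per data set.

```python
from urllib.request import Request


def build_rdf_request(uri: str, accept: str = "application/rdf+xml") -> Request:
    """Build a GET request that asks for an RDF representation of `uri`.

    The fragment (the part after '#') identifies the thing itself and is
    never sent to the server, so it is stripped to obtain the document URI.
    """
    document_uri = uri.split("#", 1)[0]
    return Request(document_uri, headers={"Accept": accept})


if __name__ == "__main__":
    request = build_rdf_request(
        "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
    )
    print(request.get_method(), request.full_url)
    print("Accept:", request.get_header("Accept"))
    # urllib.request.urlopen(request) would perform the actual look-up.
```

Requesting the same URI with `Accept: text/html` is the usual way to compare the human-readable and the machine-readable representation of a resource.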
  • 45. RDFa
      •  RDFa = “RDF in attributes”
      •  Extension to HTML5 for embedding RDF within HTML pages:
          –  The HTML is processed by the browser; the (human) consumer doesn’t see the RDF data
          –  The RDF triples within the page are consumed by APIs to extract the (semi-)structured data
      •  It is considered the bridge between the Web of Data and the Web of Documents
      •  It is a complete serialization of RDF
  • 46. RDFa: Attributes
      Attribute role | Attribute      | Description
      Syntax         | prefix         | List of prefix-name/IRI pairs
      Syntax         | vocab          | IRI that specifies the vocabulary where the concept is defined
      Subject        | about          | Specifies the subject of the relationship
      Predicate      | property       | Expresses the relationship between the subject and the value
      Predicate      | rel            | Defines a relation between the subject and a URL
      Predicate      | rev            | Expresses reverse relationships between two resources
      Resource       | href           | Specifies an object URI for the rel and rev attributes
      Resource       | resource       | Same as href (used when href is not present)
      Resource       | src            | Specifies the object of a relationship when the resource is embedded
      Literal        | datatype       | Expresses the datatype of the object of the property attribute
      Literal        | content        | Supplies machine-readable content for a literal
      Literal        | xml:lang, lang | Specifies the language of the literal
      Macro          | typeof         | Indicates the RDF type(s) to associate with a subject
      Macro          | inlist         | An object is added to the list of a predicate
  • 47. RDFa: Example. Extracting RDF from HTML
      HTML (+RDFa):
          <div class="artistheader"
               about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_"
               typeof="http://purl.org/ontology/mo/MusicGroup">
            …
          </div>
      RDF:
          <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
  • 48. RDFa: Example. Extracting RDF from HTML (same HTML as on slide 47)
      RDF:
          <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
              <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
  • 49. RDFa: Example. Extracting RDF from HTML (same HTML as on slide 47)
      RDF:
          <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_>
              <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
              <http://purl.org/ontology/mo/MusicGroup> .
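The extraction step shown on these slides can be imitated for this one pattern with Python's standard `html.parser`. This is a deliberately narrow sketch, not an RDFa processor: it only handles elements that carry both `about` and `typeof` with full IRIs, and ignores prefixes, `vocab`, `property`, nesting and everything else in the RDFa processing rules. The class name `TypeofExtractor` is hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TypeofExtractor(HTMLParser):
    """Collect (subject, rdf:type, class) triples from about/typeof pairs."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        types = attributes.get("typeof")
        if subject and types:
            # typeof may list several whitespace-separated types.
            for rdf_class in types.split():
                self.triples.append((subject, RDF_TYPE, rdf_class))


if __name__ == "__main__":
    html = (
        '<div class="artistheader" '
        'about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_" '
        'typeof="http://purl.org/ontology/mo/MusicGroup">…</div>'
    )
    parser = TypeofExtractor()
    parser.feed(html)
    for s, p, o in parser.triples:
        print(f"<{s}> <{p}> <{o}> .")
```

For real pages a full RDFa distiller should be used instead, since most markup relies on prefixes and nested subjects that this sketch does not resolve.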
  • 50. RDFa: Example (2). Extracting RDF from MusicBrainz.org: http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
  • 51. RDFa: Example (2). Extracting RDF from MusicBrainz.org. Source: http://www.w3.org/2007/08/pyRdfa/
  • 52. RDFa: Example (2). Extracting RDF from MusicBrainz.org: http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d&format=nt
      Watch the EUCLID screencast: http://vimeo.com/euclidproject
  • 53. RDF Dump
      •  An RDF dump refers to a file which contains (part of) a data set specified in an RDF format (RDF/XML, N-Triples, N-Quads).
      •  The data set can be split into several RDF dumps.
      •  A list of data sets available as RDF dumps can be found at:
          –  http://www.w3.org/wiki/DataSetRDFDumps
  • 54. SPARQL Endpoint (CH 2)
      •  The SPARQL endpoint refers to the URI of the listener of the SPARQL protocol service, which handles requests for SPARQL protocol operations.
      •  The user submits SPARQL queries to the SPARQL endpoint in order to retrieve only a desired subset of the RDF data set.
      •  Lists of available SPARQL endpoints:
          –  http://www.w3.org/wiki/SparqlEndpoints
          –  http://labs.mondeca.com/sparqlEndpointsStatus/
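Under the SPARQL protocol, a query can be sent to an endpoint as an HTTP GET with the query text in a URL-encoded `query` parameter. The sketch below only constructs such a URL with the standard library; the endpoint `http://example.org/sparql` and the function name `sparql_get_url` are placeholders for illustration, and the sketch assumes the endpoint URI has no query string of its own.

```python
from urllib.parse import urlencode


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL that submits `query` to a SPARQL endpoint."""
    # Assumption: `endpoint` does not already contain a '?' query string.
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    query = (
        "SELECT ?artist WHERE { "
        "?artist a <http://purl.org/ontology/mo/MusicGroup> "
        "} LIMIT 10"
    )
    # Hypothetical endpoint URI, for illustration only.
    print(sparql_get_url("http://example.org/sparql", query))
```

The `LIMIT` clause reflects the point made on the slide: the client retrieves only the subset of the data set it actually needs instead of downloading a full dump.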
  • 55. Using Linked Data Catalogs
      •  Data catalogs, markets or repositories are platforms dedicated to providing access to a wide range of data sets from different domains.
      •  They allow data consumers to easily find and use the data.
      •  Usually the catalogs offer relevant metadata about the creation of the data set.
  • 56. Using Linked Data Catalogs (2). How to publish an RDF data set into a catalog?
      Create your own data catalog
      •  Recommended for big organizations/institutions aiming at providing a large number of data sets.
      •  Use a data management system, for example: [image]
      Upload your data set into an existing catalog
      •  Allows data consumers to easily find new data sets.
      •  Common LD catalogs are: [image]; The Linking Open Data Cloud
  • 57. Validating Data Sets There are different ways to validate the published RDF data set:
      Accessibility
      • Vapour - Performs two types of tests: without content negotiation and requesting RDF/XML content. http://validator.linkeddata.org/vapour
      • URI Debugger - Retrieves the HTTP responses of accessing a URI. http://linkeddata.informatik.hu-berlin.de/uridbg/
      Parsing & Syntax
      • RDF Triple-Checker - Dereferences namespaces associated with the resources used in the document. http://graphite.ecs.soton.ac.uk/checker/
      • W3C RDF/XML Validation Service - Evaluates the syntax of RDF/XML documents and displays the RDF triples in them. http://www.w3.org/RDF/Validator/
      • W3C Markup Validation Service - Checks syntactic correctness of web documents with RDFa markup. http://validator.w3.org/
      General validators
      • RDF:ALERTS - Validates syntax, undefined resources, datatype and other types of errors. http://swse.deri.org/RDFAlerts/
  • 58. Validating Data Sets (2) Example: Validating URIs with Vapour Source: http://idi.fundacionctic.org/vapour
  • 59. Validating Data Sets (3) Example: Validating URIs with Vapour Source: http://idi.fundacionctic.org/vapour
  • 60. Validating Data Sets (4) Example: Validating URIs with Vapour • HTML content: http://dbpedia.org/page/The_Beatles • RDF document: http://dbpedia.org/data/The_Beatles.xml Source: http://idi.fundacionctic.org/vapour
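The two tests Vapour runs correspond to asking for the same URI with different `Accept` headers. The sketch below only constructs such requests so the difference is visible; the resource URI `http://dbpedia.org/resource/The_Beatles` and the helper name are assumptions added for illustration (the slide shows only the HTML and RDF documents a client ends up at).

```python
from urllib.request import Request


def negotiated_request(uri: str, want_rdf: bool) -> Request:
    """Build a request that asks for RDF/XML or for HTML via the Accept header."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


# Assumed resource URI for the entity behind the two documents on the slide.
RESOURCE = "http://dbpedia.org/resource/The_Beatles"

html_request = negotiated_request(RESOURCE, want_rdf=False)
rdf_request = negotiated_request(RESOURCE, want_rdf=True)

for req in (html_request, rdf_request):
    print(req.full_url, "->", req.get_header("Accept"))
```

A server that implements content negotiation would be expected to answer the first request with the HTML page and the second with the RDF document.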
  • 61. PROVIDING LINKED DATA: CHECKLIST
  • 62. Providing Linked Data: Checklist (1) Creating Linked Data o Were all the relevant entities/concepts effectively extracted from the raw data? o Are all the created URIs dereferenceable? o Are you reusing terms from widely accepted vocabularies?
  • 63. Providing Linked Data: Checklist (2) Interlinking Linked Data o Is the data set linked to other RDF data sets? o Are the created vocabulary terms linked to other vocabularies?
  • 64. Providing Linked Data: Checklist (3) Publishing Linked Data o Do you provide data set metadata? o Do you provide information about licensing? o Do you provide additional access methods? o Is the data set available in LD catalogs? o Did the data set pass the validation tests?
  • 65. Summary In this chapter we studied: • The Linked Data lifecycle: 3 core tasks (creating, interlinking and publishing) • Creation of Linked Data: extracting relevant data, using URIs to name entities, selecting vocabularies and expressing the data using the RDF data model • Interlinking Linked Data: challenges of link discovery, using Silk to create links between two data sets and using SKOS links • Publishing Linked Data: creation of data set metadata; publishing the data set via RDF dumps, SPARQL endpoints or RDFa; using RDFa and schema.org to enrich search results; and uploading the data set to an LD catalog
  • 66. The Web & Linked Data • Linked Data catalogs • Applications
  • 67. CKAN • CKAN is an open source platform for developing data set catalogs • It implements useful tools for data publishers to support: data harvesting, creation of metadata, access mechanisms to the data set, updating the data set, and monitoring the access to the data set
  • 68. CKAN (2) Source: http://ckan.org
  • 69. CKAN (3) Source: http://ckan.org
  • 70. The Data Hub • The Data Hub is a community-run data catalog which contains more than 5,000 data sets (according to the information presented in the portal in March 2013) • "(…) is an openly editable open data catalogue, in the style of Wikipedia" (source: http://datahub.io/about) • It is implemented on top of the CKAN platform • Allows the creation of groups: the Linking Open Data Cloud group exclusively contains Linked Data sets
  • 71. The Data Hub (2) Source: http://datahub.io/
  • 72. The Data Hub (3) Source: http://datahub.io/
  • 73. The Linking Open Data Cloud September 2011 Source: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
  • 74. The Linking Open Data Cloud How to publish an RDF data set in this cloud? 1. The data set must follow the Linked Data principles 2. The data set must contain at least 1,000 RDF triples 3. The data set must contain at least 50 RDF links to a data set that is already in the diagram 4. Access to the data set must be provided Once these criteria are met, the data publisher must add the data set to the Data Hub catalog, and contact the administrators of the Linking Open Data Cloud group Source: http://lod-cloud.net/
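The four criteria lend themselves to a simple pre-submission check. The sketch below is illustrative only: the `DataSetStats` record and its field names are invented here, and how the counts are obtained (for example by querying the data set) is left open.

```python
from dataclasses import dataclass


@dataclass
class DataSetStats:
    """Hypothetical summary of a data set, filled in by the publisher."""
    follows_ld_principles: bool
    triple_count: int
    links_to_cloud_data_set: int  # RDF links to one data set already in the diagram
    is_accessible: bool


def unmet_lod_cloud_criteria(stats: DataSetStats) -> list[str]:
    """Return the criteria from the slide that the data set does not yet meet."""
    problems = []
    if not stats.follows_ld_principles:
        problems.append("must follow the Linked Data principles")
    if stats.triple_count < 1000:
        problems.append("must contain at least 1,000 RDF triples")
    if stats.links_to_cloud_data_set < 50:
        problems.append("must contain at least 50 RDF links to a data set in the diagram")
    if not stats.is_accessible:
        problems.append("access to the data set must be provided")
    return problems


candidate = DataSetStats(True, triple_count=25_000, links_to_cloud_data_set=12, is_accessible=True)
print(unmet_lod_cloud_criteria(candidate))
```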
  • 75. Linked Data & Search Engines • Search engines collect information about web resources in order to display richer search results • This is only possible if the search engines are able to understand the content within the web pages • The HTML pages must therefore be annotated with machine-readable descriptions of their content, which requires a markup format and a vocabulary
  • 76. RDFa for marking up data • RDFa is used to provide (semi-)structured Linked Data embedded in web content • Examples: some search engines use RDFa, e.g., Google, Yahoo! and Bing; Facebook's Open Graph is based on RDFa
  • 77. Google Rich Snippets • Embedding semantics via RDFa (or microformats/microdata) enhances search results
  • 78. Google Rich Snippets (2)
  • 79. Schema.org • Collection of schemas/vocabularies to mark up HTML pages • It is recognized by Bing, Google, Yahoo! and Yandex • Covers a wide range of knowledge domains • It also offers an extension mechanism in case the publisher is interested in adding new concepts to the vocabularies
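To make the combination of a markup format (RDFa) and a vocabulary (schema.org) concrete, the sketch below embeds a small RDFa Lite snippet and pulls out its `property` values with Python's standard HTML parser. The snippet is invented for illustration, and the collector is deliberately simplistic: it is not a conforming RDFa processor and only handles flat elements with text content.

```python
from html.parser import HTMLParser

# Invented example page fragment using RDFa Lite attributes with schema.org terms.
SNIPPET = """
<div vocab="http://schema.org/" typeof="MusicGroup">
  <span property="name">The Beatles</span>
  <span property="genre">Rock</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collects typeof values and (property, text) pairs from simple markup."""

    def __init__(self):
        super().__init__()
        self.types = []
        self.pairs = []
        self._current = None  # property of the element currently open, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "typeof" in attributes:
            self.types.append(attributes["typeof"])
        self._current = attributes.get("property")

    def handle_data(self, data):
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))

    def handle_endtag(self, tag):
        self._current = None


collector = PropertyCollector()
collector.feed(SNIPPET)
print(collector.types, collector.pairs)
```

A search engine's crawler does something far more complete, but the principle is the same: the annotations tell it that the page describes a music group with a given name and genre.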
  • 80. Schema.org (2) The vocabularies cover the following topics: Source: http://schema.org/docs/schemas.html "The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail." (Dan Brickley, schema.org)
  • 81. Schema.org (3) Integrates (/aligns) existing vocabularies where appropriate, e.g. rNews Source: http://schema.org/Article
  • 82. Google Knowledge Graph • Users can find answers to their queries without browsing pages • Provides detailed information
  • 83. Google Knowledge Graph (2) • Google Search results include structured data from Freebase • Might disambiguate search terms
  • 84. Freebase • Knowledge base of structured data • Data is stored as a graph • Describes data from different domains
  • 85. Bing Snapshot • Provides structured data related to the search term • Includes a significant number of entities from more domains • Connects data from LinkedIn • It is powered by the graph engine Trinity.RDF
  • 86. Bing Snapshot (2)
  • 87. Open Graph Protocol • It was originally created by Facebook • Allows describing web content as graph objects, establishing connections between people and objects • The descriptions are embedded in the web page as RDFa data • Supports description of several domains: basic metadata, music, video, articles, books, websites and user profiles Source: http://ogp.me/
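Open Graph descriptions are embedded as `<meta>` elements whose `property` attribute carries an `og:`-prefixed name. The following sketch generates such elements from a dictionary; the helper name, the example values and the `example.org` URLs are placeholders chosen for illustration.

```python
from html import escape


def og_meta_tags(properties: dict[str, str]) -> str:
    """Render Open Graph properties as <meta> elements, one per line."""
    lines = [
        f'<meta property="og:{escape(name)}" content="{escape(value)}" />'
        for name, value in properties.items()
    ]
    return "\n".join(lines)


# Placeholder values; title, type, image and url are the basic metadata properties.
album = {
    "title": "Abbey Road",
    "type": "music.album",
    "url": "http://example.org/albums/abbey-road",
    "image": "http://example.org/covers/abbey-road.jpg",
}
print(og_meta_tags(album))
```

The output is meant to be placed inside the `<head>` of the page that the `url` property points to.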
  • 88. Open Graph Protocol (2) Who is using Open Graph protocol? • Consumers: Facebook, Google, Mixi • Publishers: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME Source: http://ogp.me/
  • 89. Open Graph Protocol (3) • Facebook expands its vocabulary of relationships beyond "friendship" and "like": more actions! Source: https://developers.facebook.com/docs/opengraph/
  • 90. Open Graph Protocol & Facebook List of domains and actions:
      • General: Like, Recommend, Follow
      • Music: Listen, Create a playlist
      • Movies & TV: Watch, Rate, Wants to watch
      • Book: Rate, Read, Quote, Wants to read
      • Games: Achieve, High score
      • Fitness: Bike, Run, Walk
      How can we exploit these links and relationships?
      Source: https://developers.facebook.com/docs/opengraph/
  • 91. Facebook Graph Search • Focuses on people and their interests, exploiting how everything is related to each other • Queries are specified using natural language • Takes advantage of context and suggests possible queries • Allows for building more complex (expressive) queries that are not possible with normal search, for example "music liked by me and friends who live in my city"
  • 92. Facebook Graph Search (2) Context (information from profile): Graph search suggestions:
  • 93. Facebook Graph Search (3) Results
  • 94. Facebook Graph Search (4) Observations • Allows for conjunctive queries (applying a filter over intermediate results = "apply operator") • Disjunctive queries are not supported, for example: "My friends who like SemanticWeb.com OR ReadWrite" • Post search is not supported: it is not possible to search in post content submitted to the timeline • User privacy settings affect the results
  • 95. Tools for providing Linked Data • Extracting data from spreadsheets: OpenRefine • Extracting data from RDBMS: R2RML • Extracting data from text: Zemanta, OpenCalais, GATE • Interlinking data sets: Silk
  • 96. EXTRACTING DATA FROM SPREADSHEETS WITH OPENREFINE
  • 97. Integrate Chart Data • Task: Integrate the latest chart information into your RDF database. • Data may be available in non-RDF formats: plain text; CSV, TSV, separator-based files; HTML tables; spreadsheets (OpenDocument, Excel, …); XML; JSON; …
  • 98. Example Data Source: http://en.wikipedia.org/wiki/List_of_best-selling_music_artists
      CSV:
        The Beatles, 250 million
        Elvis Presley, 203.3 million
        Michael Jackson, 157.4 million
        Madonna, 160.1 million
        Led Zeppelin, 135.5 million
        Queen, 90.5 million
      HTML table:
        Artist | Country of origin | Period active | Release year of first charted record | Total certified units (from available markets)
        The Beatles | United Kingdom | 1960-1970 | 1962 | 250 million
        Elvis Presley | United States | 1954-1977 | 1954 | 203.3 million
        Michael Jackson | United States | 1964-2009 | 1971 | 157.4 million
        Madonna | United States | 1979-present | 1982 | 160.1 million
        Led Zeppelin | United Kingdom | 1968-1980 | 1969 | 135.5 million
        Queen | United Kingdom | 1971-present | 1973 | 90.5 million
      JSON:
        { "artist": { "class": "artist", "name": "The Beatles" }, "rank": 1, "value": 250 million }, …
  • 99. OpenRefine Quick Facts • Transforms and cleans messy input data sets. • Is an open-source successor of Google Refine. • Allows for entity reconciliation against SPARQL endpoints or RDF data. • Is extended with plugins that enhance its functionality, e.g. for RDF support.
  • 100. Use of OpenRefine
      1. Messy input data is imported, transformed into a table representation and cleaned.
      2. Entity reconciliation is applied to allow for interlinking with existing data sets.
      3. The structure of the RDF output is defined.
      4. The data is exported into some RDF syntax.
      CSV:
        The Beatles, 250 million
        Elvis Presley, 203.3 million
        Michael Jackson, 157.4 million
        Madonna, 160.1 million
        Led Zeppelin, 135.5 million
        Queen, 90.5 million
      RDF:
        musicbrainz:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d :totalSales "250000000"^^xsd:int .
        musicbrainz:01809552-4f87-45b0-afff-2c6f0730a3be :totalSales "203300000"^^xsd:int .
        musicbrainz:f27ec8db-af05-4f36-916e-3d57f91ecf5e :totalSales "157400000"^^xsd:int .
        musicbrainz:79239441-bfd5-4981-a70c-55c3f15c1287 :totalSales "160100000"^^xsd:int .
        musicbrainz:678d88b2-87b0-403b-b63d-5da7465aecc3 :totalSales "135500000"^^xsd:int .
        musicbrainz:0383dadf-2a4e-4d10-a46a-e9e041da8eb3 :totalSales "90500000"^^xsd:int .
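Outside OpenRefine, the same CSV-to-RDF step can be sketched in a few lines of Python. This is not how OpenRefine works internally; it is a hand-written approximation in which the reconciliation result is replaced by a fixed lookup table. The artist-to-identifier pairs are assumed from the row order of the slide, and `to_turtle` is a name made up for this example.

```python
import csv
import io
from decimal import Decimal

CSV_DATA = """The Beatles, 250 million
Elvis Presley, 203.3 million
Queen, 90.5 million
"""

# Stand-in for entity reconciliation: identifiers assumed from the slide's row order.
MBIDS = {
    "The Beatles": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d",
    "Elvis Presley": "01809552-4f87-45b0-afff-2c6f0730a3be",
    "Queen": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",
}


def to_turtle(csv_text: str) -> list[str]:
    """Turn 'name, N million' rows into Turtle statements with integer literals."""
    statements = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not row:
            continue
        name, amount = row[0], row[1].split()[0]
        # Decimal avoids binary floating-point rounding for values like 203.3.
        units = int(Decimal(amount) * 1_000_000)
        statements.append(f'musicbrainz:{MBIDS[name]} :totalSales "{units}"^^xsd:int .')
    return statements


for statement in to_turtle(CSV_DATA):
    print(statement)
```

The prefixes `musicbrainz:`, `:` and `xsd:` would still need to be declared before the statements form a complete Turtle document.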
  • 101. Data Transformation Typical steps: • Group and explore data items • Delete columns or rows based on a filter condition • Split columns into several columns based on a condition • Modify messy data items with GREL, a powerful expression language • Replay steps from a previous Refine project
  • 102. How to Generate RDF? • Additional problem: data needs to be interlinked with existing MusicBrainz data • This is the point where plugins come into play: RDF Refine, developed by DERI, is an extension of OpenRefine to support RDF
  • 103. Core Capabilities • Interlinking of data by entity reconciliation: against SPARQL endpoints and RDF dumps; discovery of relevant RDF data sets • RDF export with the help of RDF skeletons: define the vocabulary and graph structure of the RDF serialization; in Turtle or RDF/XML
  • 104. Entity Reconciliation Typical steps: • Define a reconciliation service • Select specific types to reconcile against • Start reconciling a column against the service
  • 105. Define RDF Skeletons • An RDF skeleton defines the structure of the RDF triples that are exported
  • 106. RDF Skeletons 03.09.13
  • 107. EXTRACTING DATA FROM RDBMS WITH R2RML
  • 108. W3C RDB2RDF • Task: Integrate data from relational DBMS with Linked Data • Approach: map from relational schema to semantic vocabulary with R2RML • Publishing: two alternatives: translate SPARQL into SQL on the fly, or batch transform data into RDF, index it and provide SPARQL access in a triplestore
  • 109. W3C RDB2RDF • In 2012, the W3C published two Recommendations for mapping between relational databases and RDF:
      - Direct Mapping exposes the relational data directly as RDF
        • No allowance for vocabulary mapping
        • No allowance for interlinking (unless URIs are used in the relational data)
        • Not appropriate for this topic
      - R2RML, the RDB to RDF mapping language
        • Allows vocabulary mapping (subject, predicate and object maps with class options)
        • Allows interlinking: URIs can be constructed
        • Means to provide MusicBrainz RDF/SPARQL itself
      http://www.w3.org/2001/sw/rdb2rdf/
  • 110. MusicBrainz Next Gen Schema • Artist: as pre-NGS, but with further attributes • Artist Credit: allows joint credit • Release Group: cf. 'album', versus: • Release • Medium • Track • Track List • Work • Recording Source: https://wiki.musicbrainz.org/Next_Generation_Schema
  • 111. Music Ontology • OWL ontology with the following core concepts (classes) and relationships (properties): Source: http://musicontology.com
  • 112. R2RML Class Mapping • Mapping tables to classes is 'easy':
      lb:Artist a rr:TriplesMap ;
        rr:logicalTable [rr:tableName "artist"] ;
        rr:subjectMap
          [rr:class mo:MusicArtist ;
           rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
        rr:predicateObjectMap
          [rr:predicate mo:musicbrainz_guid ;
           rr:objectMap [rr:column "gid" ;
                         rr:datatype xsd:string]] .
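To see what this mapping produces for one row of the `artist` table, the sketch below applies it by hand: the `rr:template` is expanded with the row's `gid`, `rr:class` yields an `rdf:type` triple, and the predicate-object map yields a typed literal. It is a simplified illustration, not an R2RML engine; the function names and the sample row are assumptions.

```python
import re
from urllib.parse import quote

TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"  # rr:template from the mapping


def expand_template(template: str, row: dict) -> str:
    """Replace each {column} with the row value, percent-encoding unsafe characters."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: quote(str(row[match.group(1)]), safe=""),
        template,
    )


def map_artist_row(row: dict) -> list[tuple[str, str, str]]:
    """Hand-apply the class mapping to one row and return the resulting triples."""
    subject = expand_template(TEMPLATE, row)
    return [
        (subject, "rdf:type", "mo:MusicArtist"),                          # from rr:class
        (subject, "mo:musicbrainz_guid", f'"{row["gid"]}"^^xsd:string'),  # from rr:column
    ]


# Sample row; only the gid column matters for this mapping.
sample_row = {"gid": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"}
for triple in map_artist_row(sample_row):
    print(triple)
```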
  • 113. R2RML Property Mapping • Mapping columns to properties can be easy:
      lb:artist_name a rr:TriplesMap ;
        rr:logicalTable [rr:sqlQuery
          """SELECT artist.gid, artist_name.name
             FROM artist
               INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
        rr:subjectMap lb:sm_artist ;
        rr:predicateObjectMap
          [rr:predicate foaf:name ;
           rr:objectMap [rr:column "name"]] .
  • 114. NGS Advanced Relations • Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist) • Each pairing of instances refers to a Link • Links have types (cf. RDF properties) and attributes Source: http://wiki.musicbrainz.org/Advanced_Relationship
  • 115. R2RML Advanced Mapping • Mapping advanced relationships (SQL joins):
      lb:artist_member a rr:TriplesMap ;
        rr:logicalTable [rr:sqlQuery
          """SELECT a1.gid, a2.gid AS band
             FROM artist a1
               INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
               INNER JOIN link ON l_artist_artist.link = link.id
               INNER JOIN link_type ON link.link_type = link_type.id
               INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
             WHERE link_type.gid='5be4c609-9afa-4ea0-910b-12ffb71e3821'
               AND link.ended=FALSE"""] ;
        rr:subjectMap lb:sm_artist ;
        rr:predicateObjectMap
          [rr:predicate mo:member_of ;
           rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
                         rr:termType rr:IRI]] .
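The join in this mapping can be tried out on a toy database. The sketch below uses an in-memory SQLite database with a heavily reduced version of the four tables; the column set, the placeholder identifiers `member-gid` and `band-gid`, and the assumption that `lb:sm_artist` uses the same artist URI template are all simplifications for illustration, not the actual MusicBrainz schema.

```python
import sqlite3

MEMBER_OF = "5be4c609-9afa-4ea0-910b-12ffb71e3821"  # link type gid from the mapping
ARTIST_URI = "http://musicbrainz.org/artist/{}#_"    # assumed template of lb:sm_artist

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER, ended INTEGER);
CREATE TABLE l_artist_artist (entity0 INTEGER, entity1 INTEGER, link INTEGER);
INSERT INTO artist VALUES (1, 'member-gid'), (2, 'band-gid');
INSERT INTO link_type VALUES (10, '5be4c609-9afa-4ea0-910b-12ffb71e3821');
INSERT INTO link VALUES (100, 10, 0);
INSERT INTO l_artist_artist VALUES (1, 2, 100);
""")

# Same join as in the mapping; SQLite stores the boolean "ended" as 0/1.
rows = conn.execute("""
SELECT a1.gid, a2.gid AS band
FROM artist a1
  INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
  INNER JOIN link ON l_artist_artist.link = link.id
  INNER JOIN link_type ON link.link_type = link_type.id
  INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
WHERE link_type.gid = ? AND link.ended = 0
""", (MEMBER_OF,)).fetchall()

# Each result row becomes one mo:member_of triple between two artist URIs.
triples = [(ARTIST_URI.format(gid), "mo:member_of", ARTIST_URI.format(band)) for gid, band in rows]
print(triples)
```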
  • 116. EXTRACTING DATA FROM TEXT
  • 117. OpenCalais • Not easily customised/extended • Domain-specific coverage varies Source: http://viewer.opencalais.com/
  • 118. DBpedia Spotlight • Not easily customised/extended • Is currently only available for English Source: http://dbpedia-spotlight.github.com/demo/ http://dbpedia.org/page/Slowcore http://dbpedia.org/page/Dorothy_Parker
  • 119. Zemanta • Common problem with general purpose, open-domain semantic annotation tools • Best results require bespoke customisation Source: http://www.zemanta.com/demo/
  • 120. GATE • General Architecture for Text Engineering • Free open-source (LGPL) framework and development environment • Started 1996, large developer community • Used worldwide by many organisations to build bespoke solutions; e.g. Press Association and the National Archive • Information Extraction in many languages http://www.gate.ac.uk/
  • 121. GATE Example - LODIE • Increases recall over DBpedia by deriving new lexicalisations for URIs from link anchor texts, disambiguation pages, and redirect pages
  • 122. Precision and Recall • Generic services typically have very low recall • Combination is one solution • The other solution is custom extraction
      System           | PER         | LOC         | ORG         | TOTAL
      DB Spotlight     | 0.97 / 0.40 | 0.82 / 0.46 | 0.86 / 0.31 | 0.85 / 0.39
      Zemanta          | 0.96 / 0.84 | 0.89 / 0.62 | 0.82 / 0.57 | 0.90 / 0.68
      LODIE            | 0.81 / 0.82 | 0.73 / 0.76 | 0.56 / 0.59 | 0.71 / 0.74
      Zemanta ∩ LODIE  | 1.00 / 0.74 | 0.95 / 0.45 | 0.97 / 0.42 | 0.97 / 0.54
      Zemanta U LODIE  | 0.94 / 0.93 | 0.77 / 0.76 | 0.72 / 0.71 | 0.82 / 0.81
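One way to compare the rows is to fold each pair into a single score. The sketch below computes the F1 score (the harmonic mean of precision and recall) for the TOTAL column, assuming that each cell reads "precision / recall"; that reading is inferred from the slide title and is not stated in the table itself.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# TOTAL column of the table, read as (precision, recall) by assumption.
TOTALS = {
    "DB Spotlight": (0.85, 0.39),
    "Zemanta": (0.90, 0.68),
    "LODIE": (0.71, 0.74),
    "Zemanta AND LODIE": (0.97, 0.54),  # intersection of the two annotators
    "Zemanta OR LODIE": (0.82, 0.81),   # union of the two annotators
}

scores = {system: f1(p, r) for system, (p, r) in TOTALS.items()}
best = max(scores, key=scores.get)
for system, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{system}: {score:.2f}")
```

Under that reading, intersecting two annotators trades recall for precision, while taking their union does the opposite, which is why combination is listed as one way to address low recall.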
  • 123. Custom GATE Gazetteer • Retrieve MusicBrainz entity/label/class with SPARQL query
  • 124. GATECloud • Custom (e.g. based around custom gazetteer) GATE pipelines can be executed on the cloud:
  • 125. INTERLINKING DATA SETS WITH SILK
  • 126. Interlinking with Silk • Task: Create links between the data set and external Linked Data sources. • Approach: Creation of specified links by querying the target data sets • Alternatives: manual creation of linkage rules by the user, or automatic learning of linkage rules by submitting predefined SPARQL queries
  • 127. Link Discovery with Silk • Open source tool for discovering RDF links between data items within different Linked Data sources • It is based on the Silk Link Specification Language (Silk-LSL) for expressing linkage rules • It accesses the target RDF data sets via SPARQL endpoints to generate RDF links Source: Robert Isele. "LOD2 Webinar Series: Silk"
  • 128. Silk Variants
      • Silk Single Machine
        - Generates RDF links on a single machine
        - Data sets can reside either locally or on remote machines
        - Provides multithreading and caching
      • Silk MapReduce
        - Uses a cluster composed of multiple machines
        - Based on Hadoop and designed to scale to big data sets
      • Silk Server
        - Used within applications that consume Linked Data from the Web while keeping track of known entities
        - Provides an HTTP API for matching entities from an incoming stream
      Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
  • 129. Silk Workflow
      • Select LD data sets: identify suitable data sets in LD catalogs*; select the two data sets to link
      • Specify LD data sets: specify the access method to the data set (RDF dump, SPARQL endpoint)*; specify the entity types to be linked
      • Write linkage rule: specifies how to compare the resources; uses Silk-LSL; the rules can also be learnt
      • Generate RDF links: output links can be stored in a file or a triple store; can discover SKOS links
      * See section "Publishing Linked Data"
      Source: Silk workflow is partially based on "LOD2 Webinar Series: Silk - (Simplified) Linking Workflow" by Robert Isele.
  • 130. Linkage Rule Components • Linkage rules define the conditions to create the links between the data sets. These rules are composed of:
      1. RDF Paths: describe the elements to be compared. Example: ?a/rdfs:label
      2. Transformations: apply transformations to the result set of an RDF path. Examples: LowerCase, Concatenate, Replace, …
      3. Comparators: compute the similarity of two inputs. Examples: string similarity metrics, date similarity, …
      4. Aggregations: compute an aggregated value from multiple comparators. Examples: Min, Max, Avg, various means, Euclidean distance, …
      Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
  • 131. Silk Workbench • Web application built on top of Silk, which allows the creation of projects to manage the creation of links between RDF data sets • The data sets can be stored locally or accessed remotely by specifying the SPARQL endpoint • The user is able to create customized linking tasks: the tool offers a graphical editor to create linkage rules by combining linkage rule components via drag & drop, and includes support for (automatic) learning of linkage rules
  • 132. Silk Workbench (2) Project configuration 1. Project: name and components (data sources, linking tasks and output tasks) 2. Data sources: specification of the data sets to be interlinked 3. Linking task: specification of the linkage rules and type of links to be created 4. Output task: mechanism to store the results from the linking process
  • 133. Silk Workbench (3) Editing a linking task 1. Linkage rule components 2. Graphical editor: the items from (1) are dragged & dropped in this area, and connected to compose the linkage rules 3. Generate links: based on the linkage rules defined in (2), the data sets are accessed to discover possible links 4. Learn: automatic learning of linkage rules
  • 134. Silk Workbench (4) Adding a linkage rule The previous linkage rule states: 1. Retrieve the foaf:name values from MusicBrainz and the rdfs:label values from DBpedia 2. Apply a lower case transformation to the output of (1) 3. Compare the outputs from (2) using the "Levenshtein distance" metric. If the resulting similarity is greater than 0.90, then create a link.
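The three steps of this rule can be imitated in plain Python to see how they interact. This is not Silk code and does not reproduce Silk's exact scoring; in particular, turning the edit distance into a similarity by dividing by the longer string's length is an assumption made here so that the 0.90 threshold has something to apply to.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def should_link(foaf_name: str, rdfs_label: str, threshold: float = 0.90) -> bool:
    """Steps 2 and 3 of the rule: lower-case both values, compare, apply the threshold."""
    return similarity(foaf_name.lower(), rdfs_label.lower()) > threshold


print(should_link("The Beatles", "the beatles"))
print(should_link("Queen", "Queens"))
```

Short strings are penalised heavily by this normalisation: a single extra character in a five-letter name already drops the score below the threshold, which is one reason real linkage rules often aggregate several comparators.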
  • 135. Silk Workbench (5) Generate Links
  • 136. Silk Workbench (6) Learn Rules
  • 137. For exercises, quizzes and further material, visit our website: http://www.euclid-project.eu Other channels: @euclid_project, euclidproject, euclidproject, eBook, Course