1
From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
Valentina Presutti
Andrea Giovanni Nuzzolese, Sergio Consoli,
Diego Reforgiato Recupero,
Aldo Gangemi
STLab-ISTC, CNR, Italy
Extends “Uncovering the Semantics of Wikipedia pagelinks”
Accepted as combined conference/journal paper @ EKAW 2014
• Some background
• Motivation and goal
• Introducing Open Knowledge Extraction (OKE)
• A method for extracting semantic relations
from hyperlinks
• Legalo and Legalo-Wikipedia: implementations
• Evaluation results
• Conclusion and future work
2
Outline
3
The Web as envisioned in 1989
by Tim Berners-Lee
URIs and links
having types
relationships amongst
named objects
4
The current Web vs the Semantic Web
(Marja-Riitta Koivunen and Eric Miller 2001)
resources (web documents)
and links
Resources and links
that can have explicit types
To populate the web with machine
understandable information so that artificial
intelligences (AI) can use it as background
knowledge for assisting humans in performing a
significant number of their daily tasks
5
The Semantic Web vision
• Standardised knowledge representation
and query languages: OWL, RDF, SPARQL
• Web of Data: huge amount of distributed,
interlinked and formally represented data
(linked data) -> related to the “open
data” initiative
6
today…
7
Semantic Web standard knowledge
representation languages
The RDF data model is based on triples
Each entity has a URI that identifies it in the global web space
subjects and predicates always have a URI
objects can also be literals
triples are connected when they share an entity
the result is an interconnected graph of triples, i.e. linked data
dbpedia:Paris*
dbpedia:France
“Paris”
dbpedia-owl:country
http://www.france.fr
foaf:homepage
rdfs:label
*dbpedia: <http://dbpedia.org/>
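A minimal sketch of the triple model described above, using plain Python tuples rather than an RDF library. The quoting convention for literals and the attachment of foaf:homepage to dbpedia:France are assumptions made for illustration; the slide only shows the labels of the graph.

```python
# Each triple is (subject, predicate, object); prefixed names are kept as strings.
triples = [
    ("dbpedia:Paris", "dbpedia-owl:country", "dbpedia:France"),
    ("dbpedia:Paris", "rdfs:label", '"Paris"'),  # object is a literal
    ("dbpedia:France", "foaf:homepage", "http://www.france.fr"),  # assumed subject
]


def is_literal(term):
    """Literals are written with surrounding double quotes (a convention of this sketch)."""
    return term.startswith('"') and term.endswith('"')


def entities(triple):
    """Return the non-literal subject and object terms of a triple."""
    subject, _, obj = triple
    return {term for term in (subject, obj) if not is_literal(term)}


def connected(t1, t2):
    """Two triples are connected when they share an entity."""
    return bool(entities(t1) & entities(t2))
```

Here the first two triples are connected through dbpedia:Paris, and the first and third through dbpedia:France, which is how separate statements join into one graph.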
8
May 2007
8
from 2 million (2007) to 31 billion (2011) triples
• Most of the Web of Data derived from
structured data (typically databases) or semi-
structured data (e.g. Wikipedia infoboxes)
• Web content is mostly natural language text
(web sites, news, forums, reviews, etc.)
• Such content is highly valuable for the
Semantic Web (question answering, opinion
mining, knowledge summarisation, etc.)
9
Limits and motivation
To extract as much relevant knowledge as
possible from web textual content and
publish it in the form of
Semantic Web triples
10
Overall Goal
• Most attention has gone to enriching the Web of Data
by learning standard relations: e.g.,
membership (rdf:type), class taxonomy
(rdfs:subClassOf), entity linking
(owl:sameAs)
• What about general factual relations?
• e.g. part, participation, causality,
location, friendship, etc.
11
Approaches to linked data and
ontology learning
• Distant supervision: maps textual sentences to linked data triples
• no support for n-ary relations
• bound to already defined relation types
• Open Information Extraction: creates resources of triples composed of text fragments
(subject, relational phrase, object)
• not directly reusable as linked data
Open Challenges:
-> new relation discovery (both facts and relation types)
-> methods for automatically formalising discovered knowledge
-> usable label generation for newly discovered relations
12
Relation extraction
“Paris is the capital city of France” dbpedia:Paris dbpedia-owl:country dbpedia:France
“John Stigall received a Bachelor of arts” ?
“John Stigall received a Bachelor of arts” (John Stigall, received, a Bachelor of arts)
Open Knowledge Extraction:
• unsupervised -> no need for huge annotated
corpora for training
• open-domain -> not bound to specific domain
• abstractive -> model the text, use external
knowledge resources, use summarisation and
language generation techniques
• producing formalised output -> directly
reusable as linked data
13
Contribution
14
OIE vs OKE
Subject | Predicate | Object | Approach
John Stigall | received | a Bachelor of arts | extractive
John Stigall | received from | the State University of New York at Cortland | extractive
dbpedia:John_Stigall | myprop:receivesAcademicDegree | dbpedia:Bachelor_of_Arts | abstractive
dbpedia:John_Stigall | myprop:receivesAcademicDegreeFrom | dbpedia:State_University_of_New_York | abstractive
Source sentence: “John Stigall received a Bachelor of arts from
the State University of New York at Cortland”
15
Hypothesis
Hyperlinks are pragmatic traces of semantic relations
between two entities
“McCarthy often commented on world
affairs on the Usenet forums”
Texts surrounding hyperlinks are linguistic traces of
semantic relations between two entities
Pragmatic trace: presence of a hyperlink in a web page
16
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
Linguistic trace: text surrounding a hyperlink
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John McCarthy
?
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Lisp (DBpedia)
dbpedia:John_McCarthy myont:developed dbpedia:Lisp (Legalo)
legalo:developProgrammingLanguage rdfs:subPropertyOf http://swrc.ontoware.org/ontology#develops
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
18
Hyperlinks
Hyperlinks can be either defined
by humans or automatically created by Knowledge Extraction (KE) tools
“McCarthy often commented on world
affairs on the Usenet forums”
19
What can we derive from a hyperlink?
“McCarthy often commented on world
affairs on the Usenet forums”
McCarthy and Usenet are (probably) related
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
The following entity pairs could be related:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The Wall Street Journal
The New York Times, The Wall Street Journal
dbpedia:Joey_Foster_Ellis dul:associatedWith dbpedia:The_New_York_Times
Linguistic traces provide us with the meaning of relevant relations
20
Natural language sentence s:
“Joey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.”
Set of entity pairs {(esubj, eobj)i}:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The Wall Street Journal
The New York Times, The Wall Street Journal
Relevant relations evidenced by s, φs(esubj, eobj)i:
φs(Joey Foster Ellis, The New York Times)
21
Relevant Relation Assessment
φs(esubj, eobj)i
φs(Joey Foster Ellis, The New York Times)
“Joey Foster Ellis has published on The New
York Times, and The Wall Street Journal.”
φs(The Wall Street Journal, The New York Times)
22
Usable predicate generation
λ ~ λi , λi ∈ Λ
Λ ≡ {λ1, ..., λn}
labels for φs(esubj, eobj)i
defined for a LOD vocabulary
publish on
φs(Joey Foster Ellis, The New York Times)
“Joey Foster Ellis has published
on The New York Times, and
The Wall Street Journal.”
(has) published on
(has) published on journal
(is) author for
writes for
23
OWL/RDF formalisation
dbpedia:Joey_Foster_Ellis
legalo:publishOn dbpedia:The_New_York_Times .
legalo:publishOn a owl:ObjectProperty ;
rdfs:range … ;
rdfs:domain … ;
… .
φs(Joey Foster Ellis, The New York Times)
“Joey Foster Ellis has published
on The New York Times, and
The Wall Street Journal.”
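The step from a generated label to an OWL/RDF property can be sketched as follows. This is a minimal illustration, not Legalo's implementation: the helper names `label_to_local_name` and `to_turtle` are hypothetical, the `rdfs:label` annotation is an assumption, and domain and range are omitted because the slide elides them.

```python
import re


def label_to_local_name(label):
    """Turn a label such as 'publish on' into a camelCase local name ('publishOn')."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError("label contains no usable words")
    head, *tail = words
    return head.lower() + "".join(word.capitalize() for word in tail)


def to_turtle(subject, label, obj, prefix="legalo"):
    """Emit a fact triple plus a property declaration as Turtle text."""
    prop = f"{prefix}:{label_to_local_name(label)}"
    return (
        f"{subject} {prop} {obj} .\n"
        f"{prop} a owl:ObjectProperty ;\n"
        f'    rdfs:label "{label}" .'  # assumed annotation, not shown on the slide
    )


print(to_turtle("dbpedia:Joey_Foster_Ellis", "publish on",
                "dbpedia:The_New_York_Times"))
```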
A novel Open Knowledge Extraction (OKE) approach that performs
unsupervised, open domain, and abstractive knowledge extraction
from text for producing directly usable machine readable data
Several implementations of OKE: Fred, Legalo, Legalo-Wikipedia
Their evaluation performed with the help of crowdsourcing and the
creation of a gold standard, all showing high values of precision,
recall, and accuracy
24
Contribution
http://wit.istc.cnr.it/stlab-tools/legalo
http://wit.istc.cnr.it/stlab-tools/legalo/wikipedia
http://wit.istc.cnr.it/kore-dev/legalo
Developing version
http://wit.istc.cnr.it/stlab-tools/fred
http://wit.istc.cnr.it/kore-dev/fred
25
Method and implementation
• Frame-based formal representation of s
• Relation assessment
• Label generation (extraction+abstraction)
• Formalisation
• Alignment to existing LOD vocabularies
Linguistic traces of Wikipedia pagelinks: subjects and objects are given by
DBpedia pagelinks
26
Let’s see how it works in detail…
27
Frame-based formal representation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
29
Relation assessment
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
30
Relation assessment
$G = (V, E)$, $V \equiv \{v_0, \dots, v_n\}$, $E \equiv \{edge_1, \dots, edge_n\}$
$G' = (V, E')$ is the undirected version of $G = (V, E)$
$v_i \in V$ is the node in $G$ representing the entity $e_i$ mentioned in $s$.
$P(v_{subj}, v_{obj}) = [v_0, edge_1, \dots, edge_n, v_n]$, $Pset_{subj,obj} \equiv \{edge_1, v_1, \dots, v_{n-1}, edge_n\}$
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
31
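Necessary condition: $\varphi_s(e_{subj}, e_{obj}) \Rightarrow \exists P(v_{subj}, v_{obj})$
Example: $\varphi_s(\text{Rico Lebrun}, \text{Michelangelo}) \Rightarrow \exists P(\text{fred:Rico\_Lebrun}, \text{fred:Michelangelo})$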
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
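The necessary condition asks whether any path connects the subject and object nodes once edge direction is ignored. A minimal sketch of that check with a breadth-first search is given here; the function name `find_path` and the toy graph are hypothetical and do not reproduce FRED's actual output for the sentence.

```python
from collections import deque


def find_path(edges, v_subj, v_obj):
    """Search the undirected version of a directed, labelled graph.

    edges: iterable of (source, label, target) tuples.
    Returns the alternating [node, label, node, ...] path from v_subj
    to v_obj, or None when the two nodes are not connected.
    """
    adjacency = {}
    for source, label, target in edges:
        adjacency.setdefault(source, []).append((label, target))
        adjacency.setdefault(target, []).append((label, source))  # drop direction
    queue = deque([[v_subj]])
    seen = {v_subj}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == v_obj:
            return path
        for label, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [label, neighbour])
    return None


# Hypothetical toy graph: node and edge names are illustrative only.
toy_edges = [
    ("fred:influence_1", "vn.role:Agent", "fred:Michelangelo"),
    ("fred:influence_1", "vn.role:Patient", "fred:Rico_Lebrun"),
    ("fred:teach_1", "vn.role:Agent", "fred:Rico_Lebrun"),
]
```

A returned path only shows that a relation may exist; when `find_path` returns None, the necessary condition fails and no relation is asserted for that pair.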
32
Let $f$ be a node of $G$ such that dul:Event($f$)
$Role \equiv \{\rho_1, \dots, \rho_n\}$, the set of possible roles participating in $f$
$AgRole \equiv \{\rho_{m_1}, \dots, \rho_{m_m}\}$, the set of VerbNet agentive roles
$AgRole \subseteq Role$
Sufficient condition:
$\varphi_s(e_{subj}, e_{obj}) \Leftarrow \exists P(v_{subj}, v_{obj})$ and $\exists f \in Pset_{subj,obj}$ such that dul:Event($f$) and $\rho(f, v_{subj}) \in AgRole$
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
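The sufficient condition can be read as a check over an already-found path: some event node on the path must have the subject in an agentive role. The sketch below assumes the path is given as an alternating node/edge list. The contents of `AG_ROLES` are a hypothetical subset, since the slides do not list the actual agentive roles, and all node names are illustrative.

```python
# Hypothetical subset of VerbNet agentive roles; the real AgRole set is not listed here.
AG_ROLES = {"vn.role:Agent", "vn.role:Actor1"}


def sufficient_condition(path, event_nodes, roles, ag_roles=AG_ROLES):
    """Check the sufficient condition on an existing path.

    path: alternating [v_subj, edge, node, ..., edge, v_obj] list.
    event_nodes: set of nodes f typed as dul:Event.
    roles: dict mapping (f, node) to the role label that node plays in f.
    """
    if not path or len(path) < 3:
        return False
    v_subj = path[0]
    inner = path[1:-1]  # Pset: everything strictly between subject and object
    return any(
        f in event_nodes and roles.get((f, v_subj)) in ag_roles
        for f in inner
    )


path = ["fred:Rico_Lebrun", "vn.role:Agent", "fred:teach_1",
        "vn.role:Topic", "fred:visual_arts"]
events = {"fred:teach_1"}
roles = {
    ("fred:teach_1", "fred:Rico_Lebrun"): "vn.role:Agent",
    ("fred:teach_1", "fred:visual_arts"): "vn.role:Topic",
}
```

With these inputs the condition holds for (Rico Lebrun, visual arts) but not for the reversed pair, because visual arts plays a Topic role rather than an agentive one.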
33
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Relation assessment
34
Semantic Web triples and properties generation
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
35
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
VerbNet role -> preposition:
vn.role:Actor1 -> “with”
vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Label built incrementally between Subject and Object: teach -> teach art -> teach art at
Generated property: legalo:teachArtAt
36
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
VerbNet role -> preposition:
vn.role:Actor1 -> “with”
vn.role:Actor2 -> “with”
vn.role:Beneficiary -> “for”
vn.role:Instrument -> “with”
vn.role:Destination -> “to”
vn.role:Topic -> “about”
vn.role:Source -> “from”
Label built incrementally between Subject and Object: teach -> teach about
Generated property: legalo:teachAbout
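The role-to-preposition table lends itself to a small lookup. The sketch below shows how a verb lemma and the object's VerbNet role could yield the label "teach about" and the property legalo:teachAbout. The function names are hypothetical, and the real label generation also draws on other parts of the graph (as in "teach art at"), which this sketch does not cover.

```python
ROLE_TO_PREPOSITION = {
    "vn.role:Actor1": "with",
    "vn.role:Actor2": "with",
    "vn.role:Beneficiary": "for",
    "vn.role:Instrument": "with",
    "vn.role:Destination": "to",
    "vn.role:Topic": "about",
    "vn.role:Source": "from",
}


def build_label(verb, object_role):
    """Append the preposition mapped from the object's role, if there is one."""
    preposition = ROLE_TO_PREPOSITION.get(object_role)
    return f"{verb} {preposition}" if preposition else verb


def to_property(label, prefix="legalo"):
    """Turn a space-separated label into a prefixed camelCase property name."""
    words = label.split()
    if not words:
        raise ValueError("empty label")
    head, *tail = words
    return f"{prefix}:{head.lower()}{''.join(w.capitalize() for w in tail)}"


label = build_label("teach", "vn.role:Topic")
prop = to_property(label)
```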
37
“Rico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.”
Semantic Web triples and properties generation
38
“Rico Lebrun taught visual arts at the
Chouinard Art Institute and at the
Disney Studios. He was influenced by
Michelangelo and maintained a lifelong
affinity for Goya and Picasso.”
Semantic Web triples and properties generation
dbpedia:Rico_Lebrun s:teachAbout dbpedia:Visual_arts .

# domain, range, subsumption
s:teachAbout a owl:ObjectProperty ;
    rdfs:subPropertyOf fred:Teach ;
    rdfs:domain wibi:Artist ;
    rdfs:range wibi:Art ;
    # linguistic and formal scope
    grounding:definedFromFormalRepresentation
        fred-graph:a6705cedbf9b53d10bbcdedaa3be9791da0a9e94 ;
    grounding:derivedFromLinguisticEvidence s:linguisticEvidence ;
    owl:propertyChainAxiom ( [ owl:inverseOf s:AgentTeach ] s:TopicTeach ) .

# alignment to existing LOD vocabularies
_:b2 a alignment:Cell ;
    alignment:entity1 s:teachAbout ;
    alignment:entity2 <http://purl.org/vocab/aiiso/schema#teaches> ;
    alignment:measure "0.846"^^xsd:float ;
    alignment:relation "equivalence" .
39
Evaluation
40
Evaluation sample
*https://code.google.com/p/relation-extraction-corpus/downloads/list
corpus for relation extraction developed at Google Research
Crel−extraction*: 5 datasets each dedicated to a specific relation
{"pred": "/people/person/education./education/education/institution",
 "sub": "/m/0g9zjv3",
 "obj": "/m/0ckrjh",
 "evidences": [
   {"url": "http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
    "snippet": "Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually creates scenic design and stage clothes for his plays."}],
 "judgments": [
   {"rater": "1264223381988340244", "judgment": "yes"},
   {"rater": "6968726908160095830", "judgment": "yes"},
   {"rater": "6063741883945424276", "judgment": "yes"},
   {"rater": "5346153820624061638", "judgment": "yes"},
   {"rater": "12022408018620867151", "judgment": "yes"}]}
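A record in this shape can be read with Python's standard `json` module. The sketch below parses a shortened copy of the record (snippet abbreviated, three of the five judgments kept) and computes the majority judgment. The helper `majority_judgment` is hypothetical and is not part of the corpus tooling.

```python
import json
from collections import Counter

# Shortened copy of the record above; the snippet text is abbreviated.
record_text = """
{"pred": "/people/person/education./education/education/institution",
 "sub": "/m/0g9zjv3",
 "obj": "/m/0ckrjh",
 "evidences": [{"url": "http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
                "snippet": "In 1993 he graduated from Russian Academy of Theatre Arts as stage director."}],
 "judgments": [{"rater": "1264223381988340244", "judgment": "yes"},
               {"rater": "6968726908160095830", "judgment": "yes"},
               {"rater": "6063741883945424276", "judgment": "yes"}]}
"""


def majority_judgment(record):
    """Return the most frequent rater judgment and the share of raters who gave it."""
    votes = Counter(item["judgment"] for item in record["judgments"])
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


record = json.loads(record_text)
print(record["pred"], majority_judgment(record))
```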
41
Evaluation sample
Cinstitution: random sample of 130 snippets (one entity pair each) from
Crel−extraction for “attending or graduating from an institution”
Legalo always produced an output, either p or “no relation”
Ceducation: random sample of 130 snippets (one entity pair each) from
Crel−extraction for “obtaining a degree of education”
Legalo always produced an output, either p or “no relation”
Cgeneral: random sample of 60 snippets from all Crel−extraction datasets,
then broken into 186 single sentences with at least one entity pair.
Legalo produced 867 results: 262 p and 605 “no relation”
42
Hypothesis 1:
Relevant relation assessment
Legalo is able to assess if, given a sentence s, a relevant relation exists
which holds between two entities, according to the content of s:
∃φ.φs (esubj , eobj )
φs(esubj, eobj)i
43
Evaluation results for
Relevant Relation Assessment
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
44
Hypothesis 2:
Usable predicate generation
Legalo is able to generate a usable predicate λ for a relevant relation φs between two
entities, expressed in a sentence s: given λ, a label generated by Legalo for φs, and
λi, a label generated by a human for φs, the following holds:
λ ~ λi , λi ∈ Λ
Λ ≡ {λ1, ..., λn}
labels for φs(esubj, eobj)i
defined for a LOD vocabulary
45
Evaluation results for
Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
• Similarity score between human created {λi} and Legalo
λ, for a φs(esubj, eobj)i
• Jaccard distance measure (string similarity)
• Semantic similarity measure based on the SimLibrary
framework*
46
Evaluation results for
Usable Predicate Generation
Task | Relation | Jaccard [0,1] | SimLibrary [0,1] | Confidence
5 | Any | 0.63 | 0.80 | 0.59
* G. Pirró and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC 2010
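To make the string-similarity column concrete, here is a minimal sketch of a Jaccard score between a generated label and human-created labels. It assumes word-level token sets and reports similarity (1 minus the Jaccard distance); the slides do not say whether the evaluation tokenised by word or by character, and the function names are hypothetical.

```python
def jaccard_similarity(label_a, label_b):
    """Jaccard similarity between the word sets of two labels (1.0 = same set)."""
    a = set(label_a.lower().split())
    b = set(label_b.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(generated, human_labels):
    """Highest similarity between a generated label and any human-created label."""
    return max(jaccard_similarity(generated, human) for human in human_labels)


score = best_match("publish on", ["writes for", "publish on journal"])
```

A purely lexical score gives "publish on" and "writes for" a similarity of zero even though they are close in meaning, which is why a semantic similarity measure is reported alongside it.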
47
Legalo-Wikipedia
(EKAW 2014)
3 raters with a computer science background, not familiar with Legalo, familiar with Linked Data
• Open Knowledge Extraction: unsupervised, open-
domain, abstractive, producing OWL/RDF output
• OKE for relation extraction and its
implementations
• An evaluation showing high performance for both
tasks: relevant relation assessment, usable
predicate generation
48
Conclusion
49
Where are we heading?
• Improving frequent-subgraph heuristics
• Strategies for alignment to existing LOD
vocabularies (or to ODPs?)
• Reconciliation of properties, out of sentence scope
• Using Legalo for abstractive summarisation tasks
• Using Legalo for language generation
• Identifying ways to directly compare to other
approaches
50
Thanks for your attention
http://wit.istc.cnr.it/stlab-tools/
54
Crowdsourcing tasks for
Relevant Relation Assessment
Task 1:
Assessing whether a sentence s is evidence of a referenced relation (i.e.
either “institution” or “education”) between two entities mentioned in s.
Based on data from Cinstitution and Ceducation, respectively
Task 2:
Assessing whether a sentence s is evidence of any relation between two given
entities mentioned in s.
Based on data from Cgeneral
At least 3 raters with t>0.70
Answer could be “yes” or “no”
55
Evaluation results for
Relevant Relation Assessment
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
56
Hypothesis 2:
Usable predicate generation
Legalo is able to generate a usable predicate $\lambda'$ for a relevant relation $\varphi_s$ between two
entities, expressed in a sentence s: given $\lambda'$, a label generated by Legalo for $\varphi_s$, and
$\lambda_i$, a label generated by a human for $\varphi_s$, the following holds:
$\lambda' \sim \lambda_i,\ \lambda_i \in \Lambda$
$\Lambda \equiv \{\lambda_1, \ldots, \lambda_n\}$:
labels for $\varphi_s(e_{subj}, e_{obj})_i$
defined for a LOD vocabulary
57
Crowdsourcing tasks for
Usable predicate generation
Task 3:
Judging whether $\lambda$, generated by a machine, is a good summarisation of a
specific relation (i.e. either “institution” or “education”) between two given
entities mentioned in s, according to the evidence provided by s.
Based on data from $C_{institution}$ and $C_{education}$, respectively.
Task 4:
Judging whether a predicate $\lambda$ is a good summarisation of a relation expressed by
the content of s, between two given entities mentioned in s, according to the
evidence provided by s.
Based on data from $C_{general}$.
At least 3 raters with trust score t > 0.70.
Answer could be Agree, Partly Agree, or Disagree.
58
Evaluation results for
Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value measures inter-rater agreement,
weighted by rater trust scores
59
Crowdsourcing tasks for
Usable predicate generation
Task 5:
Creating a phrase $\lambda$ that summarises the relation expressed by the content of
s, between two given entities mentioned in s, according to the evidence
provided by s.
Based on data from $C_{general}$.
At least 3 raters with trust score t > 0.60.
Answer was open.
• Similarity score between the human-created labels $\{\lambda_i\}$ and Legalo's
$\lambda$, for a relation $\varphi_s(e_{subj}, e_{obj})_i$
• Jaccard distance measure (string similarity)
• Semantic similarity measure based on the SimLibrary
framework*
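As an illustration of the string-similarity measure, the sketch below computes a Jaccard score over the word tokens of two predicate labels. The tokenisation (splitting on whitespace and camelCase boundaries, then lowercasing) and the human label are assumptions made for this example; the slides do not say how labels were tokenised, and the SimLibrary semantic measure is not reproduced here.

```python
import re


def tokenize(label: str) -> set:
    """Split a label on whitespace and camelCase boundaries, lowercased.

    This tokenisation is an assumption for illustration only.
    """
    words = re.findall(r"[A-Za-z][a-z]*|\d+", label)
    return {w.lower() for w in words}


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token intersection divided by the size of the token union."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty labels are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Label of the kind Legalo generates vs. a hypothetical human-written label.
legalo_label = "developProgrammingLanguage"
human_label = "developed programming language"

similarity = jaccard_similarity(legalo_label, human_label)
distance = 1.0 - similarity  # Jaccard distance is the complement
print(similarity, distance)
```

Because the comparison is purely lexical, "develop" and "developed" count as different tokens here, which is one reason a semantic similarity measure is reported alongside the Jaccard score.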
60
Evaluation results for
Usable Predicate Generation
Task   Relation   Jaccard [0,1]   SimLibrary [0,1]   Confidence
5      Any        0.63            0.80               0.59
* G. Pirrò and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC 2010
61
Legalo-Wikipedia
(EKAW 2014)
3 raters with a computer science background, unfamiliar with Legalo but familiar with Linked Data
62
Confidence score
Given a task unit u,
a set of possible judgements $\{j_i\}$, with $i = 1, \ldots, n$,
and a set of trust scores $\{t_k\}$, each representing a rater, with $k = 1, \ldots, m$:
$t_{sum} = \sum_{k=1}^{m} t_k$
is the sum of the trust scores of the raters
giving a judgement on u
$\mathit{trust}(j_i)$
is the sum of the $t_k$ values of the raters
that choose judgement $j_i$
The confidence for judgement $j_i$
on the task unit u is
$\mathit{confidence}(j_i, u) = \dfrac{\mathit{trust}(j_i)}{t_{sum}}$
Example:
confidence(“yes”, 582275117) = (0.95 + 0.98) / 2.82 = 0.68
confidence(“no”, 582275117) = 0.89 / 2.82 = 0.31
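The worked example above is consistent with computing confidence(ji, u) as trust(ji) divided by tsum. The sketch below implements that reading in Python. The rater names `r1` to `r3` and the assignment of the three trust scores to individual raters are assumptions, since the slide only gives the scores and the sums. With these rounded trust scores the "no" value comes out at about 0.316, which the slide reports as 0.31.

```python
def confidence(judgments: dict, trust: dict) -> dict:
    """Trust-weighted share of each judgment on one task unit.

    judgments maps rater -> chosen judgment; trust maps rater -> trust score.
    Returns judgment -> trust(judgment) / t_sum.
    """
    # t_sum: total trust of the raters who judged this unit
    t_sum = sum(trust[rater] for rater in judgments)
    trust_per_judgment = {}
    for rater, judgment in judgments.items():
        trust_per_judgment[judgment] = trust_per_judgment.get(judgment, 0.0) + trust[rater]
    return {j: t / t_sum for j, t in trust_per_judgment.items()}


# Trust scores taken from the example; rater names are hypothetical.
trust = {"r1": 0.95, "r2": 0.98, "r3": 0.89}
judgments = {"r1": "yes", "r2": "yes", "r3": "no"}

result = confidence(judgments, trust)
for judgment, value in result.items():
    print(judgment, round(value, 3))
```

Since every rater's trust score is counted exactly once, the confidence values for all judgments on a unit sum to 1.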

From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction

  • 1. 1 From Hyperlinks to Semantic Web Properties 
 using Open Knowledge Extraction Valentina Presutti Andrea Giovanni Nuzzolese, Sergio Consoli, Diego Reforgiato Recupero, Aldo Gangemi STLab-ISTC, CNR, Italy Extends “Uncovering the Semantics of Wikipedia pagelinks” Accepted as combined conference/journal paper @ EKAW 2014
  • 2. • Some background • Motivation and goal • Introducing Open Knowledge Extraction (OKE) • A method for extracting semantic relations from hyperlinks • Legalo and Legalo-Wikipedia: implementations • Evaluation results • Conclusion and future work 2 Outline
  • 3. 3 The Web as envisioned in 1989 by Tim Berners-Lee URIs and links having types relationships amongst named objects
  • 4. 4 The current Web vs the Semantic Web (Marja-Riitta Koivunen and Eric Miller 2001) resources (web documents) and links Resources and links that can have explicit types
  • 5. To populate the web with machine understandable information so that artificial intelligences (AI) can use it as background knowledge for assisting humans in performing a significant number of their daily tasks 5 The Semantic Web vision
  • 6. • Standardised knowledge representation and query languages: OWL, RDF, SPARQL • Web of Data: huge amount of distributed, interlinked and formally represented data (linked data) -> related to the “open data” initiative 6 today…
  • 7. 7 Semantic Web standard knowledge representation languages The RDF data model is based on triples Each entity has a URI that identifies it in the global web space subjects and predicates always have a URI objects can also be literals triples are connected when they share an entity the result is an interconnected graph of triples, i.e. linked data dbpedia:Paris* dbpedia:France “Paris” dbpedia-owl:country rdfs:label *dbpedia: <http://dbpedia.org/>
  • 8. 7 Semantic Web standard knowledge representation languages The RDF data model is based on triples Each entity has a URI that identifies it in the global web space subjects and predicates always have a URI objects can also be literals triples are connected when they share an entity the result is an interconnected graph of triples, i.e. linked data dbpedia:Paris* dbpedia:France “Paris” dbpedia-owl:country http://www.france.fr foaf:homepage rdfs:label *dbpedia: <http://dbpedia.org/>
  • 10. 8 from 2 million (2007) to 31 billion (2011) triples
  • 11. • Most of the Web of Data derived from structured data (typically databases) or semi- structured data (e.g. Wikipedia infoboxes) • Web content is mostly natural language text (web sites, news, forums, reviews, etc.) • Such content is highly valuable for the Semantic Web (question answering, opinion mining, knowledge summarisation, etc.) 9 Limits and motivation
  • 12. To extract as much relevant knowledge as possible from web textual content and publish it in the form of Semantic Web triples 10 Overall Goal
  • 13. • Most attention to enrich the Web of Data by learning standard relations: e.g., membership (rdf:type), class taxonomy (rdfs:subClassOf), entity linking (owl:sameAs) • What about general factual relations? • e.g. part, participation, causality, location, friendship, etc. 11 Approaches to linked data and ontology learning
  • 14. • Distant supervision: maps textual sentences to linked data triples • no support for n-ary relation • bound to already defined relation types • Open Information Extraction: creates resources of triples composed of text fragments (subject, relational phrase, object) • not directly reusable as linked data Open Challenges: -> new relation discovery (both facts and relation types) -> methods for automatically formalising discovered knowledge -> usable label generation for new discovered relations 12 Relation extraction “Paris is the capital city of France” dbpedia:Paris dbpedia-owl:country dbpedia:France “John Stigall received a Bachelor of arts” ? “John Stigall received a Bachelor of arts” (John Stigall, received, a Bachelor of arts)
  • 15. Open Knowledge Extraction: • unsupervised -> no need of huge annotated corpora for training • open-domain -> not bound to specific domain • abstractive -> model the text, use external knowledge resources, use summarisation and language generation techniques • producing formalised output -> directly reusable as linked data 13 Contribution
  • 16. 14 OIE vs OKE Subject Predicate Object Approach John Stigall received a Bachelor of arts extractive John Stigall received from the State University of New York aat Cortland extractive dbpedia:John_Sti gall myprop:receives AcademicDegree dbpedia:Bachelor_of_ Arts abstractive dbpedia:John_Sti gall myprop:receives AcademicDegree From dbpedia:State_Univers ity_of_New_York abstractive “John Stigall received a Bachelor of arts from the State University of New York at Cortland “
  • 17. 15 Hypothesis Hyperlinks are pragmatic traces of semantic relations between two entities “McCarthy often commented on world affairs on the Usenet forums” Texts surrounding hyperlinks are linguistic traces of semantic relations between two entities
  • 18. Pragmatic trace: presence of a hyperlink in a web page 16 He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.
  • 21. 17 Linguistic trace: text surrounding a hyperlink. John McCarthy: “He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.” DBpedia: dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Lisp
  • 23. 17 Linguistic trace: text surrounding a hyperlink. John McCarthy: “He coined the term "artificial intelligence" (AI), developed the Lisp programming language family, significantly influenced the design of the ALGOL programming language, popularized timesharing, and was very influential in the early development of AI.” Legalo: dbpedia:John_McCarthy myont:developed dbpedia:Lisp ; legalo:developProgrammingLanguage rdfs:subPropertyOf http://swrc.ontoware.org/ontology#develops
  • 24. “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” 18 Hyperlinks Hyperlinks can be either defined by humans or automatically created by Knowledge Extraction (KE) tools “McCarthy often commented on world affairs on the Usenet forums”
  • 32. 19 What can we derive from a hyperlink? “McCarthy often commented on world affairs on the Usenet forums” -> McCarthy and Usenet are (probably) related: dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Usenet. “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” -> The following entity pairs could be related: (Joey Foster Ellis, The New York Times), (Joey Foster Ellis, The Wall Street Journal), (The New York Times, The Wall Street Journal), e.g. dbpedia:Joey_Foster_Ellis dul:associatedWith dbpedia:The_New_York_Times. Linguistic traces provide us with the meaning of relevant relations
  • 33. 20 Natural language sentence $s$: “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” Set of entity pairs $\{(e_{subj}, e_{obj})_i\}$: (Joey Foster Ellis, The New York Times), (Joey Foster Ellis, The Wall Street Journal), (The New York Times, The Wall Street Journal). Relevant relations evidenced by $s$: $\varphi_s(e_{subj}, e_{obj})_i$, e.g. $\varphi_s(\text{Joey Foster Ellis}, \text{The New York Times})$
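The set of candidate entity pairs for a sentence can be sketched as follows. This is a minimal illustration, assuming the linked entities of the sentence are already available as a list of names and that pairs are unordered; the function name `candidate_pairs` is hypothetical and not part of Legalo.

```python
from itertools import combinations


def candidate_pairs(entities):
    """Return every unordered pair of distinct entities mentioned in a sentence."""
    # Deduplicate while preserving first-mention order.
    unique = list(dict.fromkeys(entities))
    return list(combinations(unique, 2))


# Entities hyperlinked in the example sentence from the slide.
linked = ["Joey Foster Ellis", "The New York Times", "The Wall Street Journal"]
pairs = candidate_pairs(linked)
for subj, obj in pairs:
    print(f"({subj}, {obj})")
```

Each pair is only a candidate: whether a relevant relation actually holds between its members is decided by the relation assessment step.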
  • 34. 21 Relevant Relation Assessment: $\varphi_s(e_{subj}, e_{obj})_i$. “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” $\varphi_s(\text{Joey Foster Ellis}, \text{The New York Times})$; $\varphi_s(\text{The Wall Street Journal}, \text{The New York Times})$
  • 35. 22 Usable predicate generation: $\lambda \sim \lambda_i,\ \lambda_i \in \Lambda$, where $\Lambda \equiv \{\lambda_1, \dots, \lambda_n\}$ are labels for $\varphi_s(e_{subj}, e_{obj})_i$ defined for a LOD vocabulary. “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.” Labels for $\varphi_s(\text{Joey Foster Ellis}, \text{The New York Times})$: publish on; (has) published on; (has) published on journal; (is) author for; writes for
  • 36. 23 OWL/RDF formalisation of $\varphi_s(\text{Joey Foster Ellis}, \text{The New York Times})$ for “Joey Foster Ellis has published on The New York Times, and The Wall Street Journal.”
      dbpedia:Joey_Foster_Ellis legalo:publishOn dbpedia:The_New_York_Times .
      legalo:publishOn a owl:ObjectProperty ;
          rdfs:range … ;
          rdfs:domain … ;
          … .
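The formalisation step can be illustrated with a small sketch that turns a generated label into a property name and prints Turtle similar to the slide. The helper names (`to_camel_case`, `formalise`) are hypothetical, the `rdfs:label` annotation is an assumption added for illustration, and domain and range are left out because the slide elides them.

```python
def to_camel_case(label):
    """Turn a label such as 'publish on' into a local name such as 'publishOn'."""
    words = label.lower().split()
    return words[0] + "".join(w.capitalize() for w in words[1:])


def formalise(subject, label, obj, prefix="legalo"):
    """Return Turtle lines for the asserted triple and its property declaration."""
    prop = f"{prefix}:{to_camel_case(label)}"
    return "\n".join([
        f"{subject} {prop} {obj} .",
        f"{prop} a owl:ObjectProperty ;",
        # Assumption: the human-readable label is kept as an rdfs:label.
        f'    rdfs:label "{label}"@en .',
    ])


turtle = formalise("dbpedia:Joey_Foster_Ellis", "publish on",
                   "dbpedia:The_New_York_Times")
print(turtle)
```

Plain string formatting is used here only to keep the sketch dependency-free; an RDF library would be the more robust choice for real output.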
  • 37. 24 Contribution: a novel Open Knowledge Extraction (OKE) approach that performs unsupervised, open-domain, and abstractive knowledge extraction from text to produce directly usable machine-readable data. Several implementations of OKE: Fred, Legalo, Legalo-Wikipedia. Their evaluation, performed with the help of crowdsourcing and the creation of a gold standard, shows high values of precision, recall, and accuracy. Legalo: http://wit.istc.cnr.it/stlab-tools/legalo ; Legalo-Wikipedia: http://wit.istc.cnr.it/stlab-tools/legalo/wikipedia ; Fred: http://wit.istc.cnr.it/stlab-tools/fred ; developing versions: http://wit.istc.cnr.it/kore-dev/legalo , http://wit.istc.cnr.it/kore-dev/fred
  • 44. 25 Method and implementation: (1) Frame-based formal representation of s (2) Relation assessment (3) Label generation (extraction + abstraction) (4) Formalisation (5) Alignment to existing LOD vocabularies. Linguistic traces of Wikipedia pagelinks: subjects and objects are given by DBpedia pagelinks
  • 45. 26 Let’s see how it works in detail…
  • 46. 27 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 48. 28 Frame-based formal representation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 54. 29 Relation assessment “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 60. 30 Relation assessment. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” $G = (V, E)$, with $V \equiv \{v_0, \dots, v_n\}$ and $E \equiv \{edge_1, \dots, edge_n\}$. $G' = (V, E')$ is the undirected version of $G = (V, E)$. $v_i \in V$ is the node in $G$ representing the entity $e_i$ mentioned in $s$. $P(v_{subj}, v_{obj}) = [v_0, edge_1, \dots, edge_n, v_n]$, $Pset_{subj,obj} \equiv \{edge_1, v_1, \dots, v_{n-1}, edge_n\}$
  • 63. 31 Relation assessment. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Necessary condition: $\varphi_s(e_{subj}, e_{obj}) \Rightarrow \exists P(v_{subj}, v_{obj})$, e.g. $\varphi_s(\text{Rico Lebrun}, \text{Michelangelo}) \Rightarrow \exists P(\text{fred:Rico\_Lebrun}, \text{fred:Michelangelo})$
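The necessary condition amounts to a reachability test on the undirected version of the graph. The sketch below assumes the graph is given as (source, edge label, target) triples; the node names `fred:teach_1` and `fred:influence_1`, the edge labels, and the isolated `ex:` nodes are hypothetical stand-ins, not actual FRED output.

```python
from collections import deque


def undirected(edges):
    """Build adjacency sets that ignore edge direction (the graph G')."""
    adj = {}
    for src, _label, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    return adj


def path_exists(edges, v_subj, v_obj):
    """Breadth-first search for a path P(v_subj, v_obj) in G'."""
    adj = undirected(edges)
    if v_subj not in adj or v_obj not in adj:
        return False
    seen, queue = {v_subj}, deque([v_subj])
    while queue:
        node = queue.popleft()
        if node == v_obj:
            return True
        for nxt in adj[node] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


# Toy graph loosely modelled on the Rico Lebrun sentence (hypothetical labels).
EDGES = [
    ("fred:teach_1", "role_a", "fred:Rico_Lebrun"),
    ("fred:teach_1", "role_b", "fred:Visual_arts"),
    ("fred:influence_1", "role_a", "fred:Michelangelo"),
    ("fred:influence_1", "role_b", "fred:Rico_Lebrun"),
    ("ex:other_1", "role_a", "ex:Isolated"),
]
print(path_exists(EDGES, "fred:Rico_Lebrun", "fred:Michelangelo"))
print(path_exists(EDGES, "fred:Rico_Lebrun", "ex:Isolated"))
```

A failed test rules a relation out; a successful one only keeps the pair as a candidate, since the condition is necessary but not sufficient.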
  • 68. 32 Relation assessment. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Let $f$ be a node of $G$ such that dul:Event($f$). $Role \equiv \{\rho_1, \dots, \rho_n\}$ is the set of possible roles participating in $f$. $AgRole \equiv \{\rho_{m_1}, \dots, \rho_{m_m}\}$ is the set of VerbNet agentive roles, $AgRole \subseteq Role$. Sufficient condition: $\varphi_s(e_{subj}, e_{obj}) \Leftarrow \exists P(v_{subj}, v_{obj})$ and $\exists f \in Pset_{subj,obj}$ such that dul:Event($f$) and $\rho(f, v_{subj}) \in AgRole$
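A possible reading of the sufficient condition is sketched below. It assumes role edges point from the event node to the participant, so an event f with an agentive role for the subject is adjacent to it, and a simple path through f exists exactly when the object is reachable from f without passing through the subject again. The contents of `AG_ROLES`, the role assignments, and the node names are assumptions for illustration; the slide does not list the agentive roles.

```python
from collections import deque

# Assumption: a small stand-in for the set of VerbNet agentive roles.
AG_ROLES = {"vn.role:Agent", "vn.role:Actor1"}


def reachable(adj, start, goal, blocked):
    """Breadth-first search from start to goal that never enters blocked nodes."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj.get(node, set()) - seen - blocked:
            seen.add(nxt)
            queue.append(nxt)
    return False


def sufficient_condition(edges, events, v_subj, v_obj):
    """True if some event f on a path to v_obj has v_subj in an agentive role."""
    adj = {}
    for src, _role, dst in edges:
        adj.setdefault(src, set()).add(dst)
        adj.setdefault(dst, set()).add(src)
    for f, role, target in edges:
        if f in events and target == v_subj and role in AG_ROLES:
            if reachable(adj, f, v_obj, blocked={v_subj}):
                return True
    return False


# Hypothetical role assignments for the Rico Lebrun sentence.
EDGES = [
    ("fred:teach_1", "vn.role:Agent", "fred:Rico_Lebrun"),
    ("fred:teach_1", "vn.role:Topic", "fred:Visual_arts"),
    ("fred:influence_1", "vn.role:Patient", "fred:Rico_Lebrun"),
    ("fred:influence_1", "vn.role:Agent", "fred:Michelangelo"),
]
EVENTS = {"fred:teach_1", "fred:influence_1"}
print(sufficient_condition(EDGES, EVENTS, "fred:Rico_Lebrun", "fred:Visual_arts"))
print(sufficient_condition(EDGES, EVENTS, "fred:Rico_Lebrun", "fred:Michelangelo"))
```

In this toy graph the pair (Rico Lebrun, Michelangelo) satisfies the necessary condition but not the sufficient one, because Rico Lebrun is not the agent of the influence event.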
  • 69. 33 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Relation assessment
  • 70. 33 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Subject Object Relation assessment
  • 74. 34 Semantic Web triples and properties generation “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
  • 79. 35 Semantic Web triples and properties generation. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” (Subject, Object)
      Role-to-preposition mapping:
          vn.role:Actor1 -> “with”
          vn.role:Actor2 -> “with”
          vn.role:Beneficiary -> “for”
          vn.role:Instrument -> “with”
          vn.role:Destination -> “to”
          vn.role:Topic -> “about”
          vn.role:Source -> “from”
      Label: teach art at; property: legalo:teachArtAt
  • 82. 36 Semantic Web triples and properties generation. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” (Subject, Object)
      Role-to-preposition mapping:
          vn.role:Actor1 -> “with”
          vn.role:Actor2 -> “with”
          vn.role:Beneficiary -> “for”
          vn.role:Instrument -> “with”
          vn.role:Destination -> “to”
          vn.role:Topic -> “about”
          vn.role:Source -> “from”
      Label: teach about; property: legalo:teachAbout
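The role-to-preposition mapping can be read as a lookup that appends a preposition to the verb lemma. The sketch below covers only that simple case (it does not reproduce labels such as "teach art at", which draw on more of the graph path); the function names are hypothetical, and only the mapping table is taken from the slide.

```python
# Mapping from VerbNet roles to prepositions, as listed on the slide.
ROLE_PREPOSITION = {
    "vn.role:Actor1": "with",
    "vn.role:Actor2": "with",
    "vn.role:Beneficiary": "for",
    "vn.role:Instrument": "with",
    "vn.role:Destination": "to",
    "vn.role:Topic": "about",
    "vn.role:Source": "from",
}


def generate_label(verb_lemma, object_role):
    """Join the verb lemma with the preposition for the object's role, if any."""
    preposition = ROLE_PREPOSITION.get(object_role)
    words = [verb_lemma] + ([preposition] if preposition else [])
    return " ".join(words)


def property_name(label, prefix="legalo"):
    """Camel-case a label into a prefixed property name."""
    head, *rest = label.split()
    return f"{prefix}:{head}" + "".join(w.capitalize() for w in rest)


label = generate_label("teach", "vn.role:Topic")
print(label, "->", property_name(label))
```

Roles without an entry in the table leave the verb lemma unchanged, which is a design choice of this sketch rather than documented Legalo behaviour.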
  • 83. 37 “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.” Semantic Web triples and properties generation
  • 84. 38 Semantic Web triples and properties generation. “Rico Lebrun taught visual arts at the Chouinard Art Institute and at the Disney Studios. He was influenced by Michelangelo and maintained a lifelong affinity for Goya and Picasso.”
      dbpedia:Rico_Lebrun s:teachAbout dbpedia:Visual_arts .
      # domain, range, subsumption
      s:teachAbout a owl:ObjectProperty ;
          rdfs:subPropertyOf fred:Teach ;
          rdfs:domain wibi:Artist ;
          rdfs:range wibi:Art ;
          # linguistic and formal scope
          grounding:definedFromFormalRepresentation fred-graph:a6705cedbf9b53d10bbcdedaa3be9791da0a9e94 ;
          grounding:derivedFromLinguisticEvidence s:linguisticEvidence ;
          owl:propertyChainAxiom ( [ owl:inverseOf s:AgentTeach ] s:TopicTeach ) .
      # alignment to existing LOD vocabularies
      _:b2 a alignment:Cell ;
          alignment:entity1 s:teachAbout ;
          alignment:entity2 <http://purl.org/vocab/aiiso/schema#teaches> ;
          alignment:measure "0.846"^^xsd:float ;
          alignment:relation "equivalence" .
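The property chain axiom on s:teachAbout says that the inverse of s:AgentTeach followed by s:TopicTeach implies s:teachAbout. A minimal sketch of that inference over plain triples is given below; the event node name `fred:teach_1` is hypothetical, and `apply_chain` is an illustrative helper, not an OWL reasoner.

```python
def apply_chain(triples, inverse_first, second, derived):
    """Derive (x, derived, z) whenever (f, inverse_first, x) and (f, second, z) hold."""
    inferred = set()
    for f1, p1, x in triples:
        if p1 != inverse_first:
            continue
        for f2, p2, z in triples:
            if f2 == f1 and p2 == second:
                inferred.add((x, derived, z))
    return inferred


# Event-centred triples for the teaching event (node name is an assumption).
TRIPLES = {
    ("fred:teach_1", "s:AgentTeach", "dbpedia:Rico_Lebrun"),
    ("fred:teach_1", "s:TopicTeach", "dbpedia:Visual_arts"),
}
inferred = apply_chain(TRIPLES, "s:AgentTeach", "s:TopicTeach", "s:teachAbout")
print(inferred)
```

The chain is what links the binary property back to the n-ary, event-based representation it was derived from.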
  • 86. 40 Evaluation sample. Crel−extraction*: 5 datasets, each dedicated to a specific relation
      {"pred": "/people/person/education./education/education/institution",
       "sub": "/m/0g9zjv3",
       "obj": "/m/0ckrjh",
       "evidences": [{"url": "http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
                      "snippet": "Dmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually creates scenic design and stage clothes for his plays."}],
       "judgments": [{"rater": "1264223381988340244", "judgment": "yes"},
                     {"rater": "6968726908160095830", "judgment": "yes"},
                     {"rater": "6063741883945424276", "judgment": "yes"},
                     {"rater": "5346153820624061638", "judgment": "yes"},
                     {"rater": "12022408018620867151", "judgment": "yes"}]}
      * Corpus for relation extraction developed at Google Research: https://code.google.com/p/relation-extraction-corpus/downloads/list
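A record of this shape can be read with a standard JSON parser. The sketch below uses the field names from the sample, with the snippet shortened to one sentence for brevity; the majority-vote summary is an assumption added for illustration and is not described on the slides.

```python
import json
from collections import Counter

# The sample record from the slide, with the snippet shortened to one sentence.
RECORD = """
{"pred": "/people/person/education./education/education/institution",
 "sub": "/m/0g9zjv3",
 "obj": "/m/0ckrjh",
 "evidences": [{"url": "http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
                "snippet": "In 1993 he graduated from Russian Academy of Theatre Arts as stage director."}],
 "judgments": [{"rater": "1264223381988340244", "judgment": "yes"},
               {"rater": "6968726908160095830", "judgment": "yes"},
               {"rater": "6063741883945424276", "judgment": "yes"},
               {"rater": "5346153820624061638", "judgment": "yes"},
               {"rater": "12022408018620867151", "judgment": "yes"}]}
"""


def majority_judgment(record):
    """Return the most common judgment and the share of raters who gave it."""
    votes = Counter(j["judgment"] for j in record["judgments"])
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())


record = json.loads(RECORD)
label, share = majority_judgment(record)
print(record["pred"], label, share)
```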
  • 87. 41 Evaluation sample
      • Cinstitution: random sample of 130 snippets (one entity pair) from Crel−extraction for “attending or graduating from an institution”. Legalo always produced an output, either p or “no relation”
      • Ceducation: random sample of 130 snippets (one entity pair) from Crel−extraction for “obtaining a degree of education”. Legalo always produced an output, either p or “no relation”
      • Cgeneral: random sample of 60 snippets from all Crel−extraction datasets, then broken into 186 single sentences with at least one entity pair. Legalo produced 867 results: 262 p and 605 “no relation”
  • 88. 42 Hypothesis 1: Relevant relation assessment. Legalo is able to assess whether, given a sentence $s$, a relevant relation holds between two entities according to the content of $s$: $\exists \varphi.\ \varphi_s(e_{subj}, e_{obj})$
  • 89. 43 Evaluation results for Relevant Relation Assessment. The confidence value expresses inter-rater agreement weighted by rater trust scores
  • 90. 44 Hypothesis 2: Usable predicate generation. Legalo is able to generate a usable predicate $\lambda$ for a relevant relation $\varphi_s$ between two entities, expressed in a sentence $s$: given $\lambda$, a label generated by Legalo for $\varphi_s$, and $\lambda_i$, a label generated by a human for $\varphi_s$, the following holds: $\lambda \sim \lambda_i,\ \lambda_i \in \Lambda$, where $\Lambda \equiv \{\lambda_1, \dots, \lambda_n\}$ are labels for $\varphi_s(e_{subj}, e_{obj})_i$ defined for a LOD vocabulary
  • 91. 45 Evaluation results for Usable Predicate Generation. Scoring: Agree = 1, Partly Agree = 0.5, Disagree = 0. The confidence value expresses inter-rater agreement weighted by rater trust scores
  • 92. 46 Evaluation results for Usable Predicate Generation
      • Similarity score between human-created $\{\lambda_i\}$ and Legalo $\lambda$, for a $\varphi_s(e_{subj}, e_{obj})_i$
      • Jaccard distance measure (string similarity)
      • Semantic similarity measure based on the SimLibrary framework*
      Task | Relation | Jaccard [0,1] | SimLibrary [0,1] | Confidence
      5 | Any | 0.63 | 0.80 | 0.59
      * G. Pirrò and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC 2010
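A string-level Jaccard score between a generated label and a human label can be sketched as below. The slides do not say how labels were tokenised, so word-level tokens on lowercased text are an assumption here, and the function names are hypothetical; the Jaccard distance is one minus this similarity.

```python
import re


def tokens(label):
    """Lowercase a label and split it into a set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", label.lower()))


def jaccard_similarity(a, b):
    """Size of the token intersection divided by the size of the token union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty labels are treated as identical
    return len(ta & tb) / len(ta | tb)


generated = "publish on"
for human in ["(has) published on", "writes for", "publish on"]:
    print(human, jaccard_similarity(generated, human))
```

Because "publish" and "published" count as different tokens, a purely string-based score understates their closeness, which is one motivation for pairing it with a semantic similarity measure.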
  • 93. 47 Legalo-Wikipedia (EKAW 2014) 3 raters, computer science, non-familiar with Legalo, familiar with Linked Data
  • 94. • Open Knowledge Extraction: unsupervised, open-domain, abstractive, producing OWL/RDF output • OKE for relation extraction and its implementations • An evaluation showing high performance for both tasks: relevant relation assessment, usable predicate generation 48 Conclusion
  • 95. 49 Where are we heading? • Improving frequent-subgraph heuristics • Strategies for alignment to existing LOD vocabularies (or to ODPs?) • Reconciliation of properties, out of sentence scope • Using Legalo for abstractive summarisation tasks • Using Legalo for language generation • Identifying ways to directly compare to other approaches
  • 96. 50 Thanks for your attention http://wit.istc.cnr.it/stlab-tools/
  • 100. 54 Crowdsourcing tasks for Relevant Relation Assessment. Task 1: assessing whether a sentence s is evidence of a referenced relation (i.e. either “institution” or “education”) between two entities mentioned in s. Based on data from Cinstitution and Ceducation, respectively. Task 2: assessing whether a sentence s is evidence of any relation between two given entities mentioned in s. Based on data from Cgeneral. At least 3 raters with t > 0.70. The answer could be “yes” or “no”
  • 103. 57 Crowdsourcing tasks for Usable predicate generation. Task 3: judging whether $\lambda$, generated by a machine, is a good summarisation of a specific relation (i.e. either “institution” or “education”) between two given entities mentioned in s, according to the evidence provided by s. Based on data from Cinstitution and Ceducation, respectively. Task 4: judging whether a predicate $\lambda$ is a good summarisation of a relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s. Based on data from Cgeneral. At least 3 raters with t > 0.70. The answer could be Agree, Partly Agree, or Disagree
  • 105. 59 Crowdsourcing tasks for Usable predicate generation. Task 5: creating a phrase $\lambda$ that summarises the relation expressed by the content of s, between two given entities mentioned in s, according to the evidence provided by s. Based on data from Cgeneral. At least 3 raters with t > 0.60. The answer was open
  • 108. 62 Confidence score
      Given a task unit $u$, a set of possible judgements $\{j_i\}$ with $i = 1, \dots, n$, and a set of trust scores $\{t_k\}$, each representing a rater, with $k = 1, \dots, m$:
      $t_{sum} = \sum_{k=1}^{m} t_k$ is the sum of the trust scores of the raters giving a judgement on $u$
      $trust(j_i)$ is the sum of the $t_k$ values of the raters that choose judgement $j_i$
      The confidence for judgement $j_i$ on the task unit $u$ is $confidence(j_i, u) = trust(j_i) / t_{sum}$
      Example: confidence(“yes”, 582275117) = (0.95 + 0.98) / 2.82 = 0.68; confidence(“no”, 582275117) = 0.89 / 2.82 = 0.31
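The confidence score can be computed directly from (judgement, trust) pairs. The sketch below uses the trust scores from the slide's example, where two raters (0.95 and 0.98) answered "yes" and one (0.89) answered "no"; the function name and the input layout are assumptions for illustration.

```python
def confidence(judgements):
    """Map each judgement to its trust-weighted share of the total trust.

    judgements is a list of (judgement, trust_score) pairs, one per rater.
    """
    t_sum = sum(trust for _, trust in judgements)
    totals = {}
    for judgement, trust in judgements:
        totals[judgement] = totals.get(judgement, 0.0) + trust
    return {judgement: total / t_sum for judgement, total in totals.items()}


# Task unit 582275117 from the slide: trust scores sum to 2.82.
unit = [("yes", 0.95), ("yes", 0.98), ("no", 0.89)]
scores = confidence(unit)
for judgement, value in scores.items():
    print(judgement, value)
```

By construction the confidence values for all judgements on a unit sum to 1, so a high value for one answer means that the more trusted raters largely agreed on it.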