The vision of the Semantic Web is to populate the web with machine understandable information so that artificial intelligences (AI) can use it as background knowledge for assisting humans in performing a significant number of their daily tasks. Research in this field produced a standardised knowledge representation format (namely, linked data) and huge amount of machine-readable data available on the web (namely, the web of data), mostly derived from structured data (typically databases) or semi-structured data (e.g. Wikipedia infoboxes). However, most of the web consists of natural language text containing valuable knowledge for enriching the web of data. Hence, a main challenge is to extract as much relevant knowledge as possible from this content, and publish them in the form of linked data. Open Information Extraction (OIE) has been developed recently as an approach to extract information from unstructured data, mostly of a textual nature.
However, the information extracted is typically in the form of triples of strings (subject, relational phrase, object). OIE approaches are useful but insufficient alone for populating the web with machine readable information as their results are not directly linkable to, and immediately reusable from, other linked data sources. In this seminar, after giving a brief introduction to background concepts and notions, I will describe a work that proposes a novel Open Knowledge Extraction approach that performs unsupervised, open domain, and abstractive knowledge extraction from text for producing directly usable machine readable information. In particular I will discuss an approach based on the hypothesis that hyperlinks (either created by humans or knowledge extraction tools) provide a pragmatic trace of such semantic relations between two entities, and that such semantic relations, their subjects and objects, can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as linked data and ontology axioms. Experimental evaluations conducted with the help of crowdsourcing confirm this hypothesis showing very high performances. A demo of Open Knowledge Extraction at http://wit.istc.cnr.it/stlab-tools/legalo.
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
Â
From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
1. 1
From Hyperlinks to Semantic Web Properties â¨
using Open Knowledge Extraction
Valentina Presutti
Andrea Giovanni Nuzzolese, Sergio Consoli,
Diego Reforgiato Recupero,
Aldo Gangemi
STLab-ISTC, CNR, Italy
Extends âUncovering the Semantics of Wikipedia pagelinksâ
Accepted as combined conference/journal paper @ EKAW 2014
2. ⢠Some background
⢠Motivation and goal
⢠Introducing Open Knowledge Extraction (OKE)
⢠A method for extracting semantic relations
from hyperlinks
⢠Legalo and Legalo-Wikipedia: implementations
⢠Evaluation results
⢠Conclusion and future work
2
Outline
3. 3
The Web as envisioned in 1989
by Tim Berners-Lee
URIs and links
having types
relationships amongst
named objects
4. 4
The current Web vs the Semantic Web
(Marja-Riitta Koivunen and Eric Miller 2001)
resources (web documents)
and links
Resources and links
that can have explicit types
5. To populate the web with machine
understandable information so that artificial
intelligences (AI) can use it as background
knowledge for assisting humans in performing a
significant number of their daily tasks
5
The Semantic Web vision
6. ⢠Standardised knowledge representation
and query languages: OWL, RDF, SPARQL
⢠Web of Data: huge amount of distributed,
interlinked and formally represented data
(linked data) -> related to the âopen
dataâ initiative
6
todayâŚ
7. 7
Semantic Web standard knowledge
representation languages
The RDF data model is based on triples
Each entity has a URI that identifies it in the global web space
subjects and predicates always have a URI
objects can also be literals
triples are connected when they share an entity
the result is an interconnected graph of triples, i.e. linked data
dbpedia:Paris*
dbpedia:France
âParisâ
dbpedia-owl:country
rdfs:label
*dbpedia: <http://dbpedia.org/>
8. 7
Semantic Web standard knowledge
representation languages
The RDF data model is based on triples
Each entity has a URI that identifies it in the global web space
subjects and predicates always have a URI
objects can also be literals
triples are connected when they share an entity
the result is an interconnected graph of triples, i.e. linked data
dbpedia:Paris*
dbpedia:France
âParisâ
dbpedia-owl:country
http://www.france.fr
foaf:homepage
rdfs:label
*dbpedia: <http://dbpedia.org/>
11. ⢠Most of the Web of Data derived from
structured data (typically databases) or semi-
structured data (e.g. Wikipedia infoboxes)
⢠Web content is mostly natural language text
(web sites, news, forums, reviews, etc.)
⢠Such content is highly valuable for the
Semantic Web (question answering, opinion
mining, knowledge summarisation, etc.)
9
Limits and motivation
12. To extract as much relevant knowledge as
possible from web textual content and
publish it in the form of
Semantic Web triples
10
Overall Goal
13. ⢠Most attention to enrich the Web of Data
by learning standard relations: e.g.,
membership (rdf:type), class taxonomy
(rdfs:subClassOf), entity linking
(owl:sameAs)
⢠What about general factual relations?
⢠e.g. part, participation, causality,
location, friendship, etc.
11
Approaches to linked data and
ontology learning
14. ⢠Distant supervision: maps textual sentences to linked data triples
⢠no support for n-ary relation
⢠bound to already defined relation types
⢠Open Information Extraction: creates resources of triples composed of text fragments
(subject, relational phrase, object)
⢠not directly reusable as linked data
Open Challenges:
-> new relation discovery (both facts and relation types)
-> methods for automatically formalising discovered knowledge
-> usable label generation for new discovered relations
12
Relation extraction
âParis is the capital city of Franceâ dbpedia:Paris dbpedia-owl:country dbpedia:France
âJohn Stigall received a Bachelor of artsâ ?
âJohn Stigall received a Bachelor of artsâ (John Stigall, received, a Bachelor of arts)
15. Open Knowledge Extraction:
⢠unsupervised -> no need of huge annotated
corpora for training
⢠open-domain -> not bound to specific domain
⢠abstractive -> model the text, use external
knowledge resources, use summarisation and
language generation techniques
⢠producing formalised output -> directly
reusable as linked data
13
Contribution
16. 14
OIE vs OKE
Subject Predicate Object Approach
John Stigall received a Bachelor of arts extractive
John Stigall received from the State
University of New York
aat Cortland
extractive
dbpedia:John_Sti
gall
myprop:receives
AcademicDegree
dbpedia:Bachelor_of_
Arts
abstractive
dbpedia:John_Sti
gall
myprop:receives
AcademicDegree
From
dbpedia:State_Univers
ity_of_New_York
abstractive
âJohn Stigall received a Bachelor of arts from
the State University of New York at Cortland â
17. 15
Hypothesis
Hyperlinks are pragmatic traces of semantic relations
between two entities
âMcCarthy often commented on world
affairs on the Usenet forumsâ
Texts surrounding hyperlinks are linguistic traces of
semantic relations between two entities
18. Pragmatic trace: presence of a hyperlink in a web page
16
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
19. Pragmatic trace: presence of a hyperlink in a web page
16
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
?
20. Linguistic trace: text surrounding a hyperlink
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
21. Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy dbpo:wikiPageWikiLink dbpedia:Lisp
(DBpedia)
dbpo:wikiPageWikiLink
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
22. Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy myont:developed dbpedia:Lisp
(Legalo)
legalo:developProgrammingLanguage
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
23. Linguistic trace: text surrounding a hyperlink
dbpedia:John_McCarthy myont:developed dbpedia:Lisp
(Legalo)
legalo:developProgrammingLanguage
rdfs:subPropertyOf http://swrc.ontoware.org/ontology#develops
17
He coined the term "artificial intelligence" (AI), developed the
Lisp programming language family, significantly influenced the
design of the ALGOL programming language, popularized
timesharing, and was very influential in the early development
of AI.
John MCCarthy
?
24. âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
18
Hyperlinks
Hyperlinks can be either defined
by humans or automatically created by Knowledge Extraction (KE) tools
âMcCarthy often commented on world
affairs on the Usenet forumsâ
25. âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
18
Hyperlinks
Hyperlinks can be either defined
by humans or automatically created by Knowledge Extraction (KE) tools
âMcCarthy often commented on world
affairs on the Usenet forumsâ
âMcCarthy often commented on world
affairs on the Usenet forumsâ
26. âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
18
Hyperlinks
Hyperlinks can be either defined
by humans or automatically created by Knowledge Extraction (KE) tools
âMcCarthy often commented on world
affairs on the Usenet forumsâ
âMcCarthy often commented on world
affairs on the Usenet forumsâ
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
27. 19
What can we derive from a hyperlink?
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
âMcCarthy often commented on world
affairs on the Usenet forumsâ
28. 19
What can we derive from a hyperlink?
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
âMcCarthy often commented on world
affairs on the Usenet forumsâ
McCarthy and Usenet are
(probably) related
29. 19
What can we derive from a hyperlink?
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
âMcCarthy often commented on world
affairs on the Usenet forumsâ
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
30. 19
What can we derive from a hyperlink?
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
âMcCarthy often commented on world
affairs on the Usenet forumsâ
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
31. 19
What can we derive from a hyperlink?
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
âMcCarthy often commented on world
affairs on the Usenet forumsâ
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
dbpedia:Joey_Foster_Ellis
dul:associateWith
dbpedia:The_New_York_Times
32. 19
What can we derive from a hyperlink?
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.ââMcCarthy often commented on world
affairs on the Usenet forumsâ
dbpedia:John_McCarthy
dbpo:wikiPageWikiLink dbpedia:Usenet
McCarthy and Usenet are
(probably) related
The following entity pairs could be related:
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
dbpedia:Joey_Foster_Ellis
dul:associateWith
dbpedia:The_New_York_Times
âMcCarthy often commented on world
affairs on the Usenet forumsâ
Linguistic traces provides us the meaning of relevant relations
33. 20
âJoey Foster Ellis has published on
The New York Times, and The Wall
Street Journal.â
s
{(esubj, eobj)i}
Joey Foster Ellis, The New York Times
Joey Foster Ellis, The New York Times
The New York Times, The Wall Street Journal
Ďs(esubj, eobj)i Ďs(Joey Foster Ellis, The New York Times)
Natural language
sentence
Set of entity
pairs
Relevant relations
evidenced by s
34. 21
Relevant Relation Assessment
Ďs(esubj, eobj)i
Ďs(Joey Foster Ellis, The New York Times)
âJoey Foster Ellis has published on The New
York Times, and The Wall Street Journal.â
Ďs(The Wall Street Journal, The New York Times)
35. 22
Usable predicate generation
Îť ~ Îťi
, Îťi
â Î
Π⥠{Ν1, ..., Νn}
labels for Ďs(esubj, eobj)i
defined for a LOD vocabulary
publish on
Ďs(Joey Foster Ellis, The New York Times)
âJoey Foster Ellis has published
on The New York Times, and
The Wall Street Journal.â
(has) published on
(has) published on journal
(is) author for
writes for
37. A novel Open Knowledge Extraction (OKE) approach that performs
unsupervised, open domain, and abstractive knowledge extraction
from text for producing directly usable machine readable data
Several implementations of OKE: Fred, Legalo, Legalo-Wikipedia
Their evaluation performed with the help of crowdsourcing and the
creation of a gold standard, all showing high values of precision,
recall, and accuracy
24
Contribution
http://wit.istc.cnr.it/stlab-tools/legalo
http://wit.istc.cnr.it/stlab-tools/legalo/wikipedia
http://wit.istc.cnr.it/kore-dev/legalo
Developing version
http://wit.istc.cnr.it/stlab-tools/fred
http://wit.istc.cnr.it/kore-dev/fred
43. 25
Method and implementation
Frame-based formal
representation of s
Label generation (extraction+abstraction)
Formalisation
Alignment to existing
LOD vocabularies
Relation assessment
44. 25
Method and implementation
Frame-based formal
representation of s
Label generation (extraction+abstraction)
Formalisation
Alignment to existing
LOD vocabularies
Relation assessment
Linguistic traces of
Wikipedia pagelinks
subjects and objects are given by
DBpedia pagelinks
46. 27
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
47. 27
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
48. 28
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
49. 28
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
50. 28
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
51. 28
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
52. 28
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
53. 28
Frame-based formal representation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
54. 29
Relation assessment
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
55. 29
Relation assessment
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
56. 30
Relation assessment
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
57. 30
Relation assessment
G=(V,E), V ⥠{v0 , ..., vn }, E ⥠{edge1, ..., edgen}
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
58. 30
Relation assessment
Gâ=(V,Eâ) is the undirected version of G=(V,E)
G=(V,E), V ⥠{v0 , ..., vn }, E ⥠{edge1, ..., edgen}
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
59. 30
Relation assessment
Gâ=(V,Eâ) is the undirected version of G=(V,E)
G=(V,E), V ⥠{v0 , ..., vn }, E ⥠{edge1, ..., edgen}
vi â V is the node in G representing the entity ei mentioned in s.
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
60. 30
Relation assessment
P(vsubj,vobj)=[v0,edge1,âŚ,edgen,vn], Psetsubj,obj ⥠{edge1, v1âŚ, vn-1, edgen}
Gâ=(V,Eâ) is the undirected version of G=(V,E)
G=(V,E), V ⥠{v0 , ..., vn }, E ⥠{edge1, ..., edgen}
vi â V is the node in G representing the entity ei mentioned in s.
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
61. 31
Necessary condition: Ďs(esubj , eobj ) âP(vsubj , vobj )
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
62. 31
Necessary condition: Ďs(esubj , eobj ) âP(vsubj , vobj )
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
63. 31
Necessary condition: Ďs(esubj , eobj ) âP(vsubj , vobj )
Ďs(Rico Lebrun , Michelangelo ) âP(fred:Rico_Lebrun, fred:Michelangelo)
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
64. 32
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
65. 32
Let f be a node of G such that dul:Event(f)
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
66. 32
Let f be a node of G such that dul:Event(f)
Role ⥠{Ď1, ..., Ďn}, the set of possible roles participating in f
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
67. 32
Let f be a node of G such that dul:Event(f)
Role ⥠{Ď1, ..., Ďn}, the set of possible roles participating in f
AgRole ⥠{Ďm1 , ..., Ďmm }, the set of VerbNet agentive roles
AgRole â Role
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
68. 32
Let f be a node of G such that dul:Event(f)
Role ⥠{Ď1, ..., Ďn}, the set of possible roles participating in f
AgRole ⥠{Ďm1 , ..., Ďmm }, the set of VerbNet agentive roles
AgRole â Role
Sufficient condition:
Ďs(esubj , eobj ) â
âP (vsubj , vobj ) and âf â Psetsubj,obj such that dul:Event(f) and Ď(f,vsubj)âAgRole
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
69. 33
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Relation assessment
70. 33
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Subject
Object
Relation assessment
71. 33
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Subject
Object
Relation assessment
72. 33
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Subject Object
Relation assessment
73. 33
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Subject Object
Relation assessment
74. 34
Semantic Web triples and properties generation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
75. 34
Semantic Web triples and properties generation
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
76. 35
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
vn.role:Actor1 -> âwithââ¨
vn.role:Actor2 -> âwithâ
vn.role:Beneficiary -> âforâ
vn.role:Instrument -> âwithâ
vn.role:Destination -> âtoâ
vn.role:Topic -> âaboutâ
vn.role:Source -> âfromâ
Subject
Object
77. 35
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
vn.role:Actor1 -> âwithââ¨
vn.role:Actor2 -> âwithâ
vn.role:Beneficiary -> âforâ
vn.role:Instrument -> âwithâ
vn.role:Destination -> âtoâ
vn.role:Topic -> âaboutâ
vn.role:Source -> âfromâ
Subject
Object
teach
78. 35
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
vn.role:Actor1 -> âwithââ¨
vn.role:Actor2 -> âwithâ
vn.role:Beneficiary -> âforâ
vn.role:Instrument -> âwithâ
vn.role:Destination -> âtoâ
vn.role:Topic -> âaboutâ
vn.role:Source -> âfromâ
Subject
Object
teach art
79. 35
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
vn.role:Actor1 -> âwithââ¨
vn.role:Actor2 -> âwithâ
vn.role:Beneficiary -> âforâ
vn.role:Instrument -> âwithâ
vn.role:Destination -> âtoâ
vn.role:Topic -> âaboutâ
vn.role:Source -> âfromâ
Subject
Object
legalo:teachArtAt
teach art at
80. 36
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
vn.role:Actor1 -> âwithââ¨
vn.role:Actor2 -> âwithâ
vn.role:Beneficiary -> âforâ
vn.role:Instrument -> âwithâ
vn.role:Destination -> âtoâ
vn.role:Topic -> âaboutâ
vn.role:Source -> âfromâ
Subject
Object
81. 36
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
vn.role:Actor1 -> âwithââ¨
vn.role:Actor2 -> âwithâ
vn.role:Beneficiary -> âforâ
vn.role:Instrument -> âwithâ
vn.role:Destination -> âtoâ
vn.role:Topic -> âaboutâ
vn.role:Source -> âfromâ
Subject
Object
teach
82. 36
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
vn.role:Actor1 -> âwithââ¨
vn.role:Actor2 -> âwithâ
vn.role:Beneficiary -> âforâ
vn.role:Instrument -> âwithâ
vn.role:Destination -> âtoâ
vn.role:Topic -> âaboutâ
vn.role:Source -> âfromâ
Subject
Object
legalo:teachAbout
teach about
83. 37
âRico Lebrun taught visual arts at the Chouinard Art Institute and at the
Disney Studios. He was influenced by Michelangelo and maintained a
lifelong affinity for Goya and Picasso.â
Semantic Web triples and properties generation
84. 38
âRico Lebrun taught visual arts at the
Chouinard Art Institute and at the
Disney Studios. He was influenced by
Michelangelo and maintained a lifelong
affinity for Goya and Picasso.â
Semantic Web triples and properties generation
dbpedia:Rico_Lebrun s:teachAbout dbpedia:Visual_arts .
s:teachAbout a owl:ObjectProperty ;
rdfs:subPropertyOf fred:Teach;
rdfs:domain wibi:Artist ;
rdfs:range wibi:Art ;
grounding:definedFromFormalRepresentation
fred-graph:a6705cedbf9b53d10bbcdedaa3be9791da0a9e94 ;
grounding:derivedFromLinguisticEvidence s:linguisticEvidence ;
owl:propertyChainAxiom([ owl:inverseOf s:AgentTeach ] s:TopicTeach) .
_:b2 a alignment:Cell ;
alignment:entity1 s:teachAbout ;
alignment:entity2 <http://purl.org/vocab/aiiso/schema#teaches> ;
alignment:measure "0.846"^xsd:float ;
alignment:relation "equivalence" .
domain, range, subsumption
linguistic and formal scope
alignment to existing LOD vocabularies
86. 40
Evaluation sample
*https://code.google.com/p/relation-extraction-corpus/downloads/list
corpus for relation extraction developed at Google Research
Crelâextraction*: 5 datasets each dedicated to a specific relation
{âpredâ:
â/people/person/education./education/education/institution",
âsub":"/m/0g9zjv3",
âobj":"/m/0ckrjh",
âevidencesâ:
[{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
âsnippet":
âDmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He
started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in
many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually
creates scenic design and stage clothes for his plays.â}],
âjudgments":
[{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"},
{"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"},
{"rater":"12022408018620867151","judgment":"yes"}]}
87. 41
Evaluation sample
Cinstitution: random sample of 130 snippets (one entity pair) from
Crelâextraction for âattending or graduating from an institutionâ
Legalo produced always an output, either p or âno relationâ
Ceducation: random sample of 130 snippets (one entity pair) from
Crelâextraction for âobtaining a degree of educationâ
Legalo produced always an output, either p or âno relationâ
Cgeneral: random sample of 60 snippets from all Crelâextraction datasets
then broken into 186 single sentences with at least one entity pair.
Legalo produced 867 results, 262 p and 605 âno relationâ
88. 42
Hypothesis 1:
Relevant relation assessment
Legalo is able to assess if, given a sentence s, a relevant relation exists
which holds between two entities, according to the content of s:
âĎ.Ďs (esubj , eobj )
Ďs(esubj, eobj)i
89. 43
Evaluation results for
Relevant Relation Assessment
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
90. 44
Hypothesis 2:
Usable predicate generation
Legalo is able to generate a usable predicate Îť
â˛
for a relevant relation Ďs between to
entities, expressed in a sentence Ⲡs: given Îť, a label generated by Legalo for Ďs, and
Îťi a label generated by a human for Ďs the following holds:
Îť ~ Îťi , Îťi â Î
Îť ~ Îťi
, Îťi
â Î
Π⥠{Ν1, ..., Νn}
labels for Ďs(esubj, eobj)i
defined for a LOD vocabulary
91. 45
Evaluation results for
Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
92. ⢠Similarity score between human created {Νi} and Legalo
Îť, for a Ďs(esubj, eobj)i
⢠Jaccard distance measure (string similarity)
⢠Semantic similarity measure based on the SimLibrary
framework*
46
Evaluation results for
Usable Predicate Generation
Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence
5 Any 0.63 0.80 0.59
* G. PirrĂł and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
94. ⢠Open Knowledge Extraction: unsupervised, open-
domain, abstractive, producing OWL/RDF output
⢠OKE for relation extraction and its
implementations
⢠An evaluation showing high performance for both
tasks: relevant relation assessment, usable
predicate generation
48
Conclusion
95. 49
Where are we heading?
⢠Improving frequent-subgraph heuristics
⢠Strategies for alignment to existing LOD
vocabularies (or to ODPs?)
⢠Reconciliation of properties, out of sentence scope
⢠Using Legalo for abstractive summarisation tasks
⢠Using Legalo for language generation
⢠Identifying ways to directly compare to other
approaches
97. 51
Evaluation sample
*https://code.google.com/p/relation-extraction-corpus/downloads/list
corpus for relation extraction developed at Google Research
Crelâextraction*: 5 datasets each dedicated to a specific relation
{âpredâ:
â/people/person/education./education/education/institution",
âsub":"/m/0g9zjv3",
âobj":"/m/0ckrjh",
âevidencesâ:
[{"url":"http://en.wikipedia.org/wiki/Dmitry_Chernyakov",
âsnippet":
âDmitry was born in Moscow. In 1993 he graduated from Russian Academy of Theatre Arts as stage director. He
started his career in the Russian Drama Theatre of Lithuania in Vilnius. Then he directed opera and drama in
many major Russian cities: Moscow, Saint Petersburg, Novosibirsk, Omsk, Samara, Kazan and others. He usually
creates scenic design and stage clothes for his plays.â}],
âjudgments":
[{"rater":"1264223381988340244","judgment":"yes"},{"rater":"6968726908160095830","judgment":"yes"},
{"rater":"6063741883945424276","judgment":"yes"},{"rater":"5346153820624061638","judgment":"yes"},
{"rater":"12022408018620867151","judgment":"yes"}]}
98. 52
Evaluation sample
Cinstitution: random sample of 130 snippets (one entity pair) from
Crelâextraction for âattending or graduating from an institutionâ
Legalo produced always an output, either p or âno relationâ
Ceducation: random sample of 130 snippets (one entity pair) from
Crelâextraction for âobtaining a degree of educationâ
Legalo produced always an output, either p or âno relationâ
Cgeneral: random sample of 60 snippets from all Crelâextraction datasets
then broken into 186 single sentences with at least one entity pair.
Legalo produced 867 results, 262 p and 605 âno relationâ
99. 53
Hypothesis 1:
Relevant relation assessment
Legalo is able to assess if, given a sentence s, a relevant relation exists
which holds between two entities, according to the content of s:
âĎ.Ďs (esubj , eobj )
Ďs(esubj, eobj)i
100. 54
Crowdsourcing tasks for
Relevant Relation Assessment
Task 1:
Assessing if a sentence s is an evidence of a referenced relation (i.e.
either âinstitutionâ or âeducationâ) between two entities, mentioned in s.
Based on data from Cinstitution and Ceducation, respectively
Task 2:
Assessing if a sentence s is an evidence of any relation between two given
entities mentioned in s.
Based on data from Cgeneral
At least 3 raters with t>0.70 Answer could be âyesâ or ânoâ
101. 55
Evaluation results for
Relevant Relation Assessment
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
102. 56
Hypothesis 2:
Usable predicate generation
Legalo is able to generate a usable predicate Îť
â˛
for a relevant relation Ďs between to
entities, expressed in a sentence Ⲡs: given Îť, a label generated by Legalo for Ďs, and
Îťi a label generated by a human for Ďs the following holds:
Îť ~ Îťi , Îťi â Î
Îť ~ Îťi
, Îťi
â Î
Π⥠{Ν1, ..., Νn}
labels for Ďs(esubj, eobj)i
defined for a LOD vocabulary
103. 57
Crowdsourcing tasks for
Usable predicate generation
Task 3:
Judging if Îť generated by a machine expresses is a good summarisation of a
specific relation (i.e. either âinstitutionâ or âeducationâ) between two given
entities mentioned in s, according to the evidence provided by s.
Based on data from Cinstitution and Ceducation, respectively
Task 4:
judging if a predicate Îť is a good summarisation of a relation expressed by
the content of s, between two given entities mentioned in s, according to the
evidence provided by s.
Based on data from Cgeneral
At least 3 raters with t>0.70 Answer could be Agree, Partly Agree, Disagree
104. 58
Evaluation results for
Usable Predicate Generation
Agree = 1, Partly Agree = 0.5, Disagree = 0
The confidence value expresses a weighted value
for inter-rater agreement, by rater trust scores
105. 59
Crowdsourcing tasks for
Usable predicate generation
Task 5:
Creating a phrase Îť that summarises the relation expressed by the content of
s, between two given entities mentioned in s, according to the evidence
provided by s.
Based on data from Cgeneral
At least 3 raters with t>0.60 Answer was open
106. ⢠Similarity score between human created {Νi} and Legalo
Îť, for a Ďs(esubj, eobj)i
⢠Jaccard distance measure (string similarity)
⢠Semantic similarity measure based on the SimLibrary
framework*
60
Evaluation results for
Usable Predicate Generation
Task Relation Jaccard [0,1] SimLibrary [0,1] Confidence
5 Any 0.63 0.80 0.59
* G. PirrĂł and J. Euzenat. A feature and information theoretic framework for semantic similarity and relatedness. ISWC2010
108. 62
Confidence score
Given a task unit u,
a set of possible judgements {ji},
with i = 1, ...n,
{tk} a set of trust scores
each representing a rater,
with k=1,âŚ,m
tsum=sumk=1,âŚm(tk)
is the sum of trust scores of raters
giving judgement on u
trust(ji)
is the sum of tk values of raters
that choose judgement ji
confidence(ji, u) for judgement ji
on the task unit u is
confidence(âyesâ, 582275117) =
= (0.95+0.98)/2.82 = 0.68
confidence(ânoâ, 582275117) =
= (0.89/2.82 = 0.31