André Freitas and Christina Unger
Tutorial @ ESWC 2014
 Understand the changes in the database landscape
in the direction of more heterogeneous data
scenarios, especially the Semantic Web.
 Understand how Question Answering (QA) fits into
this new scenario.
 Give you the fundamental pointers to develop
your own QA system from the state of the art.
 Focus on coverage.
2
 Motivation & Context
 Big Data, Linked Data & QA
 Challenges for QA over LD
 The Anatomy of a QA System
 QA over Linked Data (Case Studies)
 Evaluation of QA over Linked Data
 Do-it-yourself (DIY): Core Resources
 Trends
 Take-away Message
4
5
Motivation &
Context
 Humans come with built-in
natural language
communication
capabilities.
 Very natural way for
humans to
communicate
information needs.
 The archetypal AI
system.
6
 A research field on its own.
 Empirical bias: Focus on the development and
evaluation of approaches and systems to
answer questions over a knowledge base.
 Multidisciplinary:
◦ Natural Language Processing
◦ Information Retrieval
◦ Knowledge Representation
◦ Databases
◦ Linguistics
◦ Artificial Intelligence
◦ Software Engineering
◦ ...
7
8
QA
System
Knowledge
Bases
Question: Who is the
daughter of Bill Clinton
married to?
Answer: Marc
Mezvinsky
 From the QA expert perspective
◦ QA depends on mastering different semantic computing
techniques.
9
 Keyword Search:
◦ The user still carries most of the effort in interpreting the data.
◦ Satisfying information needs may depend on multiple
search operations.
◦ Answer-driven information access.
◦ Input: Keyword search
 Typically specification of simpler information needs.
◦ Output: documents, structured data.
 QA:
◦ Delegates more ‘interpretation effort’ to the machines.
◦ Query-driven information access.
◦ Input: natural language query
 Specification of complex information needs.
◦ Output: direct answer.
10
 Structured Queries:
◦ A priori user effort in understanding the schemas behind
databases.
◦ Effort in mastering the syntax of a query language.
◦ Satisfying information needs may depend on multiple
querying operations.
◦ Input: Structured query
◦ Output: data records, aggregations, etc
 QA:
◦ Delegates more ‘semantic interpretation effort’ to the
machine.
◦ Input: natural language query
◦ Output: direct natural language answer
11
 Keyword search:
◦ Simple information needs.
◦ Predictable search behavior.
◦ Vocabulary redundancy (large document collections,
Web).
 Structured queries:
◦ Demand for absolute precision/recall guarantees.
◦ Small & centralized schemas.
◦ More data volume/less semantic heterogeneity.
 QA:
◦ Specification of complex information needs.
◦ Heterogeneous and schema-less data.
◦ More automated semantic interpretation.
12
13
14
15
16
 QA is usually associated with the delegation of
more of the ‘interpretation effort’ to the
machines.
 QA supports the specification of more complex
information needs.
 QA, Information Retrieval and Databases are
complementary perspectives.
18
19
Big Data,
Linked Data &
QA
 Big Data: More complete data-based picture
of the world.
20
 Heterogeneous, complex and large-scale databases.
 Very-large and dynamic “schemas”.
21
circa 2000: 10s-100s of attributes → circa 2013: 1,000s-1,000,000s of attributes
 Multiple perspectives (conceptualizations) of the reality.
 Ambiguity, vagueness, inconsistency.
22
 Franklin et al. (2005): From Databases to
Dataspaces.
 Helland (2011): If You Have Too Much Data,
then “Good Enough” Is Good Enough.
 Fundamental trends:
◦ Co-existence of heterogeneous data.
◦ Semantic best-effort behavior.
◦ Pay-as-you-go data integration.
◦ Co-existing query/search services.
23
24
Gantz & Reinsel, The Digital Universe in 2020 (2012).
25
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
 Decentralization of content generation.
26
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
 Volume
 Velocity
 Variety
27
Variety: the most interesting but usually neglected dimension.
 The Semantic Web Vision:
Software which is able to
understand meaning
(intelligent, flexible).
 QA as a way to support this
vision.
28
From which university did the wife of
Barack Obama graduate?
 With Linked Data we are still in the DB world
29
 With Linked Data we are still in the DB world
 (but slightly worse)
30
 Addresses practical problems of data
accessibility in a data heterogeneity
scenario.
 A fundamental part of the original
Semantic Web vision.
31
32
Challenges for QA
over Linked Data
33
Example: What is the currency of the Czech Republic?
SELECT DISTINCT ?uri WHERE {
res:Czech_Republic dbo:currency ?uri .
}
Main challenges:
 Mapping natural language expressions to
vocabulary elements (accounting for lexical and
structural differences).
 Handling meaning variations (e.g. ambiguous or
vague expressions, anaphoric expressions).
34
 URIs are language independent identifiers.
 Their only actual connection to natural language is by
the labels that are attached to them.
dbo:spouse rdfs:label “spouse”@en , “echtgenoot”@nl .
 Labels, however, do not capture lexical variation:
wife of
husband of
married to
...
35
Which Greek cities have more than 1 million inhabitants?
SELECT DISTINCT ?uri
WHERE {
?uri rdf:type dbo:City .
?uri dbo:country res:Greece .
?uri dbo:populationTotal ?p .
FILTER (?p > 1000000)
}
36
 Often the conceptual granularity of language does
not coincide with that of the data schema.
When did Germany join the EU?
SELECT DISTINCT ?date
WHERE {
res:Germany dbp:accessioneudate ?date .
}
Who are the grandchildren of Bruce Lee?
SELECT DISTINCT ?uri
WHERE {
res:Bruce_Lee dbo:child ?c .
?c dbo:child ?uri .
}
37
 In addition, there are expressions with a fixed,
dataset-independent meaning.
Who produced the most films?
SELECT DISTINCT ?uri
WHERE {
?x rdf:type dbo:Film .
?x dbo:producer ?uri .
}
ORDER BY DESC(COUNT(?x))
OFFSET 0 LIMIT 1
38
 Different datasets usually follow different schemas,
thus provide different ways of answering an
information need.
Example:
39
 The meaning of expressions like the verbs to be, to
have, and prepositions of, with, etc. strongly
depends on the linguistic context.
Which museum has the most paintings?
?museum dbo:exhibits ?painting .
Which country has the most caves?
?cave dbo:location ?country .
40
 The number of non-English actors on the web is
growing substantially.
◦ Accessing data.
◦ Creating and publishing data.
Semantic Web:
 In principle very well suited for multilinguality, as URIs are
language-independent.
 But adding multilingual labels is not common practice (less
than a quarter of the RDF literals have language tags, and
most of those tags are in English).
41
 Requirement: Completeness and accuracy
(Wrong answers are worse than no answers)
In the context of the Semantic Web:
 QA systems need to deal with heterogeneous
and imperfect data.
◦ Datasets are often incomplete.
◦ Different datasets sometimes contain duplicate information,
often using different vocabularies even when talking about
the same things.
◦ Datasets can also contain conflicting information and
inconsistencies.
42
 Data is distributed among a large collection
of interconnected datasets.
Example: What are side effects of drugs used for the
treatment of Tuberculosis?
SELECT DISTINCT ?x
WHERE {
disease:1154 diseasome:possibleDrug ?d1.
?d1 a drugbank:drugs .
?d1 owl:sameAs ?d2.
?d2 sider:sideEffect ?x.
}
43
 Requirement: Real-time answers, i.e. low
processing time.
In the context of the Semantic Web:
 Datasets are huge.
◦ There are a lot of distributed datasets that might be
relevant for answering the question.
◦ Reported performance of current QA systems
amounts to ~20-30 seconds per question (on one
dataset).
44
 Bridge the gap between natural languages and
data.
 Deal with incomplete, noisy and heterogeneous
datasets.
 Scale to a large number of huge datasets.
 Use distributed and interlinked datasets.
 Integrate structured and unstructured data.
 Low maintainability costs (easily adaptable to
new datasets and domains).
45
The Anatomy of a
QA System
46
 Natural Language Interfaces (NLI)
◦ Input: Natural language queries
◦ Output: Either processed or unprocessed answers
 Processed: Direct answers (QA).
 Unprocessed: Database records, text snippets,
documents.
47
NLI
QA
 Categorization of questions and answers.
 Important for:
◦ Understanding the challenges before attacking the
problem.
◦ Scoping the QA system.
 Based on:
◦ Chin-Yew Lin: Question Answering.
◦ Farah Benamara: Question Answering Systems: State of
the Art and Future Directions.
48
 The part of the question that says what is
being asked:
◦ Wh-words:
 who, what, which, when, where, why, and how
◦ Wh-words + nouns, adjectives or adverbs:
 “which party …”, “which actress …”, “how long …”, “how tall
…”.
49
 Useful for distinguishing different processing
strategies
◦ FACTOID:
 PREDICATIVE QUESTIONS:
 “Who was the first man in space?”
 “What is the highest mountain in Korea?”
 “How far is Earth from Mars?”
 “When did the Jurassic Period end?”
 “Where is Taj Mahal?”
 LIST:
 “Give me all cities in Germany.”
 SUPERLATIVE:
 “What is the highest mountain?”
 YES-NO:
 “Was Margaret Thatcher a chemist?”
50
 Useful for distinguishing different processing
strategies
◦ OPINION:
 “What do most Americans think of gun control?”
◦ CAUSE & EFFECT:
 “What is the most frequent cause for lung cancer?”
◦ PROCESS:
 “How do I make a cheese cake?”
◦ EXPLANATION & JUSTIFICATION:
 “Why did the revenue of IBM drop?”
◦ ASSOCIATION QUESTION:
 “What is the connection between Barack Obama and
Indonesia?”
◦ EVALUATIVE OR COMPARATIVE QUESTIONS:
 “What is the difference between impressionism and
expressionism?”
51
The class of object sought by the question:
 Abbreviation
 Entity: event, color, animal, plant, …
 Description, Explanation & Justification: definition, manner, reason, … (“How …”, “Why …”)
 Human: group, individual, … (“Who …”)
 Location: city, country, mountain, … (“Where …”)
 Numeric: count, distance, size, … (“How many …”, “How far …”, “How long …”)
 Temporal: date, time, … (“When …”)
52
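A minimal sketch of how such an expected answer type can be assigned with hand-written rules; the keyword lists and type labels below are illustrative and do not reproduce the taxonomy of any particular system.

def expected_answer_type(question):
    # Map question prefixes/keywords to coarse answer types (illustrative rules).
    q = question.lower().strip()
    if q.startswith("who"):
        return "HUMAN"
    if q.startswith("where"):
        return "LOCATION"
    if q.startswith("when"):
        return "TEMPORAL"
    if q.startswith(("how many", "how much", "how far", "how long", "how tall")):
        return "NUMERIC"
    if q.startswith(("why", "how do", "how does")):
        return "DESCRIPTION"
    if q.startswith(("is ", "was ", "are ", "were ", "does ", "did ")):
        return "YES-NO"
    return "ENTITY"

print(expected_answer_type("Who was the first man in space?"))   # HUMAN
print(expected_answer_type("How far is Earth from Mars?"))       # NUMERIC
print(expected_answer_type("Was Margaret Thatcher a chemist?"))  # YES-NO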
 Question focus is the property or entity that is
being sought by the question
◦ “In which city was Barack Obama born?”
◦ “What is the population of Galway?”
 Question topic: what the question is generally about:
◦ “What is the height of Mount Everest?”
 (geography, mountains)
◦ “Which organ is affected by the Meniere’s disease?”
 (medicine)
53
 Structure level:
◦ Structured data (databases).
◦ Semi-structured data (e.g. XML).
◦ Unstructured data.
 Data source:
◦ Single dataset (centralized).
◦ Enumerated list of multiple, distributed datasets.
◦ Web-scale.
54
 Domain Scope:
◦ Open Domain
◦ Domain specific
 Data Type:
◦ Text
◦ Image
◦ Sound
◦ Video
 Multi-modal QA
55
 Long answers
◦ Definition/justification based.
 Short answers
◦ Phrases.
 Exact answers
◦ Named entities, numbers, aggregate, yes/no.
56
 Relevance: The degree to which the answer addresses
the user's information needs.
 Correctness: The degree to which the answer is
factually correct.
 Conciseness: The answer should not contain
irrelevant information.
 Completeness: The answer should be complete.
 Simplicity: The answer should be easy to interpret.
 Justification: Sufficient context should be provided
to support the data consumer in determining the
correctness of the answer.
57
 Right: The answer is correct and complete.
 Inexact: The answer is incomplete or incorrect.
 Unsupported: The answer does not have an
appropriate evidence/justification.
 Wrong: The answer is not appropriate for the
question.
58
 Simple Extraction: Direct extraction of snippets
from the original document(s) / data records.
 Combination: Combines excerpts from multiple
sentences, documents / multiple data records,
databases.
 Summarization: Synthesis from large texts /
data collections.
 Operational/functional: Depends on the
application of functional operators.
 Reasoning: Depends on the application of an
inference process over the original data.
59
 Semantic Tractability (Popescu et al., 2003):
Vocabulary overlap/distance between the query
and the answer.
 Answer Locality (Webber et al., 2003): Whether
answer fragments are distributed across different
document fragments / documents or
datasets/dataset records.
 Derivability (Webber et al., 2003): Whether the
answer is explicit or implicit, i.e. the level of
reasoning required to derive it.
 Semantic Complexity: Level of ambiguity and
discourse/data heterogeneity.
60
61
 Data pre-processing: Pre-processes the database data (includes
indexing, data cleaning, feature extraction).
 Question Analysis: Performs syntactic analysis and detects/extracts
the core features of the question (NER, answer type, etc).
 Data Matching: Matches terms in the question to entities in the data.
 Query Construction: Generates structured query candidates
considering the question-data mappings and the syntactic
constraints in the query and in the database.
 Scoring: Data matching and the query construction components
output several candidates that need to be scored and ranked
according to certain criteria.
 Answer Retrieval & Extraction: Executes the query and extracts the
natural language answer from the result set.
62
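As a rough sketch, the stages above can be wired together as follows. This is a hypothetical skeleton; all class and method names are illustrative and do not correspond to any concrete system discussed in this tutorial.

class QAPipeline:
    """Illustrative skeleton of the generic QA architecture described above."""

    def __init__(self, index):
        self.index = index  # pre-processed data, e.g. an inverted index over labels

    def question_analysis(self, question):
        # POS tagging, NER, answer type detection, parsing ...
        return {"question": question, "answer_type": None, "entities": []}

    def data_matching(self, features):
        # Match question terms to entities/properties in the indexed data.
        return [self.index.get(term, []) for term in features["question"].split()]

    def query_construction(self, features, matches):
        # Combine matches and syntactic constraints into candidate structured queries.
        return [{"query": m, "features": features} for m in matches if m]

    def scoring(self, candidates):
        # Rank candidate queries (string similarity, prominence, type checks ...).
        return sorted(candidates, key=lambda c: len(c["query"]), reverse=True)

    def answer(self, question):
        features = self.question_analysis(question)
        matches = self.data_matching(features)
        candidates = self.query_construction(features, matches)
        ranked = self.scoring(candidates)
        return ranked[0] if ranked else None  # answer retrieval & extraction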
64
QA over Linked Data
(Case Studies)
 ORAKEL & Pythia (Cimiano et al, 2007; Unger &
Cimiano, 2011)
◦ Ontology-specific question answering
 TBSL (Unger et al., 2012)
◦ Template-based question answering
 Aqualog & PowerAqua (Lopez et al. 2006)
◦ Querying on a Semantic Web scale
 Treo (Freitas et al. 2011, 2014)
◦ Schema-agnostic querying using distributional
semantics
 IBM Watson (Ferrucci et al., 2010)
◦ Large-scale evidence-based model for QA
65
 QuestIO & Freya (Damljanovic et al. 2010)
 QAKIS (Cabrio et al. 2012)
 Yahya et al., 2013
66
67
ORAKEL (Cimiano et al, 2007) & Pythia
(Unger & Cimiano, 2011)
Ontology-specific question answering
 Key contributions:
◦ Using ontologies to interpret user questions
◦ Relies on a deep linguistic analysis that returns
semantic representations aligned to the
ontology vocabulary and structure
 Evaluation: Geobase
68
69
 Ontology-based QA:
◦ Ontologies play a central role in interpreting user questions
◦ Output is a meaning representation that is aligned to the
ontology underlying the dataset that is queried
◦ ontological knowledge is used for drawing inferences, e.g. for
resolving ambiguities
 Grammar-based QA:
◦ Rely on linguistic grammars that assign a syntactic and
semantic representation to lexical units
◦ Advantage: can deal with questions of arbitrary complexity
◦ Drawback: brittleness (fail if question cannot be parsed
because expressions or constructs are not covered by the
grammar)
70
 Ontology-independent entries
◦ mostly function words
 quantifiers (some, every, two)
 wh-words (who, when, where, which, how many)
 negation (not)
◦ manually specified and re-usable for all domains
 Ontology-specific entries
◦ content words and phrases corresponding to
concepts and properties in the ontology
◦ automatically generated from an ontology lexicon
 Aim: capture rich and structured linguistic
information about how ontology elements are
lexicalized in a particular language
lemon (Lexicon Model for Ontologies)
http://lemon-model.net
◦ meta-model for describing ontology lexica with
RDF
◦ declarative (abstracting from specific syntactic
and semantic theories)
◦ separation of lexicon and ontology
71
 Semantics by reference:
◦ The meaning of lexical entries is specified by
pointing to elements in the ontology.
 Example:
72
73
 Which cities have more than three universities?
SELECT DISTINCT ?x WHERE {
?x rdf:type dbo:City .
?y rdf:type dbo:University .
?y dbo:city ?x .
}
GROUP BY ?x
HAVING (COUNT(?y) > 3)
74
75
TBSL (Unger et al., 2012)
Template-based question answering
 Key contributions:
◦ Constructs a query template that directly
mirrors the linguistic structure of the question
◦ Instantiates the template by matching natural
language expressions with ontology concepts
 Evaluation: QALD 2012
76
 In order to understand a user question, we
need to understand:
The words (dataset-specific)
Abraham Lincoln → res:Abraham_Lincoln
died in → dbo:deathPlace
The semantic structure (dataset-independent)
who → SELECT ?x WHERE { … }
the most N → ORDER BY DESC(COUNT(?N)) LIMIT 1
more than i N → HAVING COUNT(?N) > i
77
 Goal: An approach that combines both an
analysis of the semantic structure and a mapping
of words to URIs.
 Two-step approach:
◦ 1. Template generation
 Parse question to produce a SPARQL template that directly
mirrors the structure of the question, including filters and
aggregation operations.
◦ 2. Template instantiation
 Instantiate SPARQL template by matching natural language
expressions with ontology concepts using statistical entity
identification and predicate detection.
78
SPARQL template:
SELECT DISTINCT ?x WHERE {
?y rdf:type ?c .
?y ?p ?x .
}
ORDER BY DESC(COUNT(?y))
OFFSET 0 LIMIT 1
?c CLASS [films]
?p PROPERTY [produced]
Instantiations:
?c = <http://dbpedia.org/ontology/Film>
?p = <http://dbpedia.org/ontology/producer>
79
80
81
 1. Natural language question is tagged with part-of-speech
information.
 2. Based on POS tags, grammar entries are built on the fly.
◦ Grammar entries are pairs of:
 tree structures (Lexicalized Tree Adjoining Grammar)
 semantic representations (ext. Discourse Representation
Structures)
 3. These lexical entries, together with domain-independent
lexical entries, are used for parsing the question (cf. Pythia).
 4. The resulting semantic representation is translated into a
SPARQL template.
82
 Domain-independent: who, the most
 Domain-dependent: produced/VBD, films/NNS
SPARQL template 1:
SELECT DISTINCT ?x WHERE {
?x ?p ?y .
?y rdf:type ?c .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?c CLASS [films]
?p PROPERTY [produced]
SPARQL template 2:
SELECT DISTINCT ?x WHERE {
?x ?p ?y .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?p PROPERTY [films]
83
84
 1. For resources and classes, a generic approach to
entity detection is applied:
◦ Identify synonyms of the label using WordNet.
◦ Retrieve entities with a label similar to the slot label based
on string similarities (trigram, Levenshtein and substring
similarity).
 2. For property labels, the label is additionally
compared to natural language expressions stored in
the BOA pattern library.
 3. The highest ranking entities are returned as
candidates for filling the query slots.
85
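A sketch of the string-similarity part of this step, using a simple character-trigram (Dice) similarity to rank candidate URIs by label. The candidate label-to-URI map is made up for illustration and only approximates the combination of measures used by TBSL.

def trigrams(s):
    s = "  " + s.lower() + "  "
    return {s[i:i+3] for i in range(len(s) - 2)}

def trigram_similarity(a, b):
    # Dice coefficient over character trigrams.
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))

candidates = {  # hypothetical label -> URI map
    "Film": "http://dbpedia.org/ontology/Film",
    "FilmFestival": "http://dbpedia.org/ontology/FilmFestival",
    "wine produced": "http://dbpedia.org/ontology/wineProduced",
}

slot = "films"
ranked = sorted(candidates.items(),
                key=lambda kv: trigram_similarity(slot, kv[0]),
                reverse=True)
for label, uri in ranked:
    print(f"{trigram_similarity(slot, label):.2f}  {uri}")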
?c CLASS [films]
<http://dbpedia.org/ontology/Film>
<http://dbpedia.org/ontology/FilmFestival>
...
?p PROPERTY [produced]
<http://dbpedia.org/ontology/producer>
<http://dbpedia.org/property/producer>
<http://dbpedia.org/ontology/wineProduced>
86
87
 1. Every entity receives a score considering
string similarity and prominence.
 2. The score of a query is then computed as
the average of the scores of the entities used
to fill its slots.
 3. In addition, type checks are performed:
◦ For all triples ?x rdf:type <class>, all query triples ?x p e
and e p ?x are checked w.r.t. whether domain/range of p is
consistent with <class>.
 4. Of the remaining queries, the one with
highest score that returns a result is chosen.
88
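A minimal sketch of this scoring scheme: each slot filler gets a score that interpolates string similarity and prominence, and the score of a query is the average over its fillers. The weights and numbers are illustrative only.

def entity_score(string_sim, prominence, alpha=0.6):
    # Interpolate string similarity and (normalised) prominence.
    return alpha * string_sim + (1 - alpha) * prominence

def query_score(slot_fillers):
    # slot_fillers: list of (string_similarity, prominence) pairs for one query.
    scores = [entity_score(s, p) for s, p in slot_fillers]
    return sum(scores) / len(scores)

# 'Who produced the most films?' -- two candidate instantiations:
q1 = [(0.9, 0.8), (0.95, 0.6)]   # dbo:producer + dbo:Film
q2 = [(0.9, 0.8), (0.55, 0.2)]   # dbo:producer + dbo:FilmFestival
print(round(query_score(q1), 2), round(query_score(q2), 2))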
SELECT DISTINCT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/Film> .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.76
SELECT DISTINCT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/FilmFestival>.
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.60
89
 The created template structure does not
always coincide with how the data is actually
modelled.
 Considering all possibilities of how the data
could be modelled leads to a large number of
templates (and even more queries) for a single
question.
90
91
Aqualog & PowerAqua (Lopez et al. 2006)
Querying on a Semantic Web scale
 Key contributions:
◦ Pioneering work on QA over Semantic Web data.
◦ Semantic similarity mapping.
 Terminological Matching:
◦ WordNet-based
◦ Ontology Based
◦ String similarity
◦ Sense-based similarity matcher
 Evaluation: QALD (2011).
 Extends the AquaLog system.
92
93
 Two words are strongly similar if any of the
following holds:
◦ 1. They have a synset in common (e.g. “human” and
“person”)
◦ 2. A word is a hypernym/hyponym in the taxonomy of
the other word.
◦ 3. If there exists an allowable “is-a” path connecting a
synset associated with each word.
◦ 4. If any of the previous cases is true and the
definition (gloss) of one of the synsets of the word (or
its direct hypernyms/hyponyms) includes the other
word as one of its synonyms, they are said to be
highly similar.
94
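Conditions 1 and 2 can be checked directly against WordNet, e.g. via NLTK. A minimal sketch, assuming the WordNet data has been downloaded with nltk.download('wordnet'); the example word pairs are only illustrative.

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def share_synset(w1, w2):
    # Condition 1: the two words have a synset in common (e.g. 'car'/'automobile').
    return bool(set(wn.synsets(w1)) & set(wn.synsets(w2)))

def direct_hypernym_or_hyponym(w1, w2):
    # Condition 2: a synset of w1 is a direct hypernym/hyponym of a synset of w2.
    for s1 in wn.synsets(w1):
        if (set(s1.hypernyms()) | set(s1.hyponyms())) & set(wn.synsets(w2)):
            return True
    return False

print(share_synset("car", "automobile"))             # expected: True
print(direct_hypernym_or_hyponym("dog", "canine"))   # expected: True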
Lopez et al. 2006
95
96
Treo (Freitas et al. 2011, 2014)
Schema-agnostic querying using
distributional semantics
 Key contributions:
◦ Distributional semantic relatedness matching
model.
◦ Compositional-distributional model for QA.
 Terminological Matching:
◦ Explicit Semantic Analysis (ESA)
◦ String similarity + node cardinality
 Evaluation: QALD (2011)
97
Treo (Irish): Direction
98
99
100
101
102
• Most semantic models have dealt with particular types of
constructions, and have been carried out under very
simplifying assumptions, in true lab conditions.
• If these idealizations are removed it is not clear at all that
modern semantics can give a full account of all but the
simplest models/statements.
Sahlgren, 2013
Formal World Real World
103
Baroni et al. 2013
“Words occurring in similar (linguistic) contexts
are semantically related.”
 If we can equate meaning with context, we can
simply record the contexts in which a word occurs
in a collection of texts (a corpus).
 This can then be used as a surrogate of its
semantic representation.
104
[Figure: co-occurrence vectors for 'child', 'husband' and 'spouse' over context dimensions c1, c2, ..., cn; each coordinate is a function of the number of times the word occurs in that context (e.g. 0.7, 0.5). Commonsense is here.]
105
[Figure: the angle θ between the word vectors in the context space works as a semantic ranking function.]
106
[Figure: Treo architecture. A Query goes through Query Analysis into Query Features, which the Query Planner turns into a Query Plan; the plan is executed as core semantic approximation & composition operations over the Ƭ-Space, a distributional semantics model that brings commonsense knowledge from large-scale unstructured data into the database.]
107
[Figure: the same architecture instantiated: Wikipedia as the commonsense corpus, Explicit Semantic Analysis as the distributional model, and RDF as the database.]
108
109
The vector space is
segmented by the
instances
110
Query
111
Search &
Composition
Operations
Query
112
 Instance search
◦ Proper nouns
◦ String similarity + node cardinality
 Class (unary predicate) search
◦ Nouns, adjectives and adverbs
◦ String similarity + Distributional semantic relatedness
 Property (binary predicate) search
◦ Nouns, adjectives, verbs and adverbs
◦ Distributional semantic relatedness
 Navigation
 Extensional expansion
◦ Expands the instances associated with a class.
 Operator application
◦ Aggregations, conditionals, ordering, position
 Disjunction & Conjunction
 Disambiguation dialog (instance, predicate)
113
 Minimize the impact of Ambiguity, Vagueness,
Synonymy.
 Address the simplest matchings first (heuristics).
 Semantic Relatedness as a primitive operation.
 Distributional semantics as commonsense
knowledge.
114
 Transform natural language queries into
triple patterns.
“Who is the daughter of Bill Clinton married to?”
115
 Step 1: POS Tagging
◦ Who/WP
◦ is/VBZ
◦ the/DT
◦ daughter/NN
◦ of/IN
◦ Bill/NNP
◦ Clinton/NNP
◦ married/VBN
◦ to/TO
◦ ?/.
116
 Step 2: Core Entity Recognition
◦ Rules-based: POS Tag + TF/IDF
Who is the daughter of Bill Clinton married to?
(PROBABLY AN INSTANCE)
117
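A sketch of these first two steps with off-the-shelf NLTK tools: POS-tag the question and take the longest run of proper nouns (NNP) as the core entity. This simplifies the POS + TF/IDF rules of the actual system and assumes the relevant NLTK tokenizer and tagger models are installed.

import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' models

def pos_tags(question):
    return nltk.pos_tag(nltk.word_tokenize(question))

def core_entity(tagged):
    # Longest run of proper nouns (NNP/NNPS) is taken as the core (pivot) entity.
    runs, current = [], []
    for token, tag in tagged:
        if tag.startswith("NNP"):
            current.append(token)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return " ".join(max(runs, key=len)) if runs else None

tagged = pos_tags("Who is the daughter of Bill Clinton married to?")
print(tagged)
print(core_entity(tagged))  # expected: 'Bill Clinton'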
 Step 3: Determine answer type
◦ Rules-based.
Who is the daughter of Bill Clinton married to?
(PERSON)
118
 Step 4: Dependency parsing
◦ dep(married-8, Who-1)
◦ auxpass(married-8, is-2)
◦ det(daughter-4, the-3)
◦ nsubjpass(married-8, daughter-4)
◦ prep(daughter-4, of-5)
◦ nn(Clinton-7, Bill-6)
◦ pobj(of-5, Clinton-7)
◦ root(ROOT-0, married-8)
◦ xcomp(married-8, to-9)
119
 Step 5: Determine Partial Ordered Dependency
Structure (PODS)
◦ Rules based.
 Remove stop words.
 Merge words into entities.
 Reorder structure from core entity position.
Bill Clinton daughter married to
(INSTANCE)
Person
ANSWER
TYPE
QUESTION FOCUS
120
 Step 5: Determine Partial Ordered Dependency
Structure (PODS)
◦ Rules based.
 Remove stop words.
 Merge words into entities.
 Reorder structure from core entity position.
Bill Clinton daughter married to
(INSTANCE)
Person
(PREDICATE) (PREDICATE) Query Features
121
 Map query features into a query plan.
 A query plan contains a sequence of:
◦ Search operations.
◦ Navigation operations.
(INSTANCE) (PREDICATE) (PREDICATE) Query Features
 (1) INSTANCE SEARCH (Bill Clinton)
 (2) DISAMBIGUATE ENTITY TYPE
 (3) GENERATE ENTITY FACETS
 (4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)
 (5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)
 (6) p2 <- SEARCH RELATED PREDICATE (e1, married to)
 (7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)
 (8) POST PROCESS (Bill Clinton, e1, p1, e2, p2)
Query Plan
122
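One way to make such a plan concrete is as a small sequence of operations interpreted over the graph. The sketch below is purely illustrative: the operation names, the relatedness scores and the tiny graph are made up and do not reproduce Treo's actual implementation.

# A toy graph: entity -> list of (property_label, value)
graph = {
    "Bill Clinton": [("child", "Chelsea Clinton"),
                     ("religion", "Baptists"),
                     ("alma mater", "Yale Law School")],
    "Chelsea Clinton": [("spouse", "Marc Mezvinsky")],
}

def sem_rel(a, b):
    # Stand-in for a distributional relatedness measure (hand-set scores).
    return {("daughter", "child"): 0.054, ("married to", "spouse"): 0.062}.get((a, b), 0.001)

def search_related_predicate(entity, query_term):
    # Pick the property of `entity` most related to the query term.
    return max(graph[entity], key=lambda pv: sem_rel(query_term, pv[0]))

# Query plan for 'Who is the daughter of Bill Clinton married to?'
pivot = "Bill Clinton"                                   # (1) instance search
p1, e1 = search_related_predicate(pivot, "daughter")     # (4)-(5)
p2, e2 = search_related_predicate(e1, "married to")      # (6)-(7)
print(e2)  # expected: 'Marc Mezvinsky'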
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked
Data:
Entity Search
123
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked
Data:
:Chelsea_Clinton
:child
:Baptists
:religion
:Yale_Law_School
:almaMater
...
(PIVOT ENTITY)
(ASSOCIATED
TRIPLES)
124
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked
Data:
:Chelsea_Clinton
:child
:Baptists
:religion
:Yale_Law_School
:almaMater
...
sem_rel(daughter,child)=0.054
sem_rel(daughter,religion)=0.004
sem_rel(daughter,alma mater)=0.001
Which properties are semantically related to ‘daughter’?
125
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked
Data:
:Chelsea_Clinton
:child
126
 Computation of a measure of “semantic
proximity” between two terms.
 Allows a semantic approximate matching
between query terms and dataset terms.
 It supports a commonsense reasoning-like
behavior based on the knowledge embedded
in the corpus.
127
 Use distributional semantics to semantically match
query terms to predicates and classes.
 Distributional principle: Words that co-occur
together tend to have related meaning.
◦ Allows the creation of a comprehensive semantic model
from unstructured text.
◦ Based on statistical patterns over large amounts of text.
◦ No human annotations.
 Distributional semantics can be used to compute a
semantic relatedness measure between two words.
128
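A minimal sketch of the idea: build co-occurrence vectors from a toy corpus and use cosine similarity as the relatedness measure. The corpus, stopword list and context window are illustrative; real models such as ESA are built from Wikipedia-scale text.

from collections import Counter
from math import sqrt

corpus = [
    "the daughter of the president married her husband",
    "his child married a lawyer",
    "the spouse of the senator is a doctor",
    "her husband and her child live abroad",
]
STOP = {"the", "of", "a", "is", "and", "her", "his"}

def cooccurrence_vector(word, sentences):
    # Context = the other (non-stopword) words of every sentence containing `word`.
    ctx = Counter()
    for s in sentences:
        tokens = [t for t in s.split() if t not in STOP]
        if word in tokens:
            ctx.update(t for t in tokens if t != word)
    return ctx

def cosine(v1, v2):
    dot = sum(v1[k] * v2[k] for k in v1)
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

v = {w: cooccurrence_vector(w, corpus) for w in ["daughter", "child", "spouse"]}
# 'daughter' ends up closer to 'child' than to 'spouse' in this toy corpus.
print(cosine(v["daughter"], v["child"]), cosine(v["daughter"], v["spouse"]))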
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked
Data:
:Chelsea_Clinton
:child
(PIVOT ENTITY)
129
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked
Data:
:Chelsea_Clinton
:child
:Marc_Mezvinsky
:spouse
130
133
134
135
136
What is the highest mountain?
(CLASS) (OPERATOR) Query Features
mountain - highest PODS
137
Mountain highest
:Mountain
Query:
Linked
Data:
:typeOf
(PIVOT ENTITY)
138
Mountain highest
:Mountain
Query:
Linked
Data:
:Everest
:typeOf
(PIVOT ENTITY)
:K2
:typeOf
...
139
Mountain highest
:Mountain
Query:
Linked
Data:
:Everest
:typeOf
(PIVOT ENTITY)
:K2
:typeOf
...
:elevation
:location
...
:deathPlaceOf
140
Mountain highest
:Mountain
Query:
Linked
Data:
:Everest
:typeOf
(PIVOT ENTITY)
:K2
:typeOf
...
:elevation
:elevation
8848 m
8611 m
141
Mountain highest
:Mountain
Query:
Linked
Data:
:Everest
:typeOf
(PIVOT ENTITY)
:K2
:typeOf
...
:elevation
:elevation
8848 m
8611 m
SORT
TOP_MOST
142
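The operator application itself is ordinary post-processing over the retrieved values. A quick sketch with made-up data:

# (mountain, elevation in metres) pairs retrieved via :typeOf and :elevation
mountains = [("Everest", 8848), ("K2", 8611), ("Kangchenjunga", 8586)]

def top_most(pairs, key=lambda p: p[1]):
    # SORT + TOP_MOST: order by the numeric attribute and keep the first result.
    return sorted(pairs, key=key, reverse=True)[0]

print(top_most(mountains)[0])  # expected: 'Everest'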
143
 Semantic approximation in databases (as in
any IR system): semantic best-effort.
 Need some level of user disambiguation,
refinement and feedback.
 As we move in the direction of semantic
systems we should expect the need for
principled dialog mechanisms (like in human
communication).
 Pull the user interaction back into the system.
144
145
146
147
IBM Watson (Ferrucci et al., 2010)
Large-scale evidence-based model for QA
 Key contributions:
◦ Evidence-based QA system.
◦ Complex and high performance QA Pipeline.
◦ The system uses more than 50 scoring
components that produce scores which range
from probabilities and counts to categorical
features.
◦ Major cultural impact:
 Before: QA as AI vision, academic exercise.
 After: QA as an attainable software architecture in
the short term.
 Evaluation: Jeopardy! Challenge
148
Ferrucci et al. 2010
149
 Question analysis: includes shallow and deep
parsing, extraction of logical forms, semantic role
labelling, coreference resolution, relations
extraction, named entity recognition, among
others.
 Question decomposition: decomposition of the
question into separate phrases, which will generate
constraints that need to be satisfied by evidence
from the data.
150
Ferrucci et al. 2010
Question
Question
& Topic
Analysis
Question
Decomposition
151
 Hypothesis generation:
◦ Primary search
 Document and passage retrieval
 SPARQL queries are used over triple stores.
◦ Candidate answer generation (maximizing recall).
 Information extraction techniques are applied to the
search results to generate candidate answers.
 Soft filtering: Application of lightweight (less
resource intensive) scoring algorithms to a larger
set of initial candidates to prune the list of
candidates before the more intensive scoring
components.
152
Ferrucci et al. 2010
[Figure: Question & Topic Analysis and Question Decomposition feed Hypothesis Generation, which performs Primary Search and Candidate Answer Generation over the Answer Sources, followed by Hypothesis and Evidence Scoring.]
153
 Hypothesis and evidence scoring:
◦ Supporting evidence retrieval
 Seeks additional evidence for each candidate answer from
the data sources while the deep evidence scoring step
determines the degree of certainty that the retrieved
evidence supports the candidate answers.
◦ Deep evidence scoring
 Scores are then combined into an overall evidence profile
which groups individual features into aggregate evidence
dimensions.
154
Ferrucci et al. 2010
[Figure: the pipeline extended with Hypothesis and Evidence Scoring: Evidence Retrieval and Deep Evidence Scoring run over the Evidence Sources and feed Answer Scoring.]
155
 Answer merging: is a step that merges answer
candidates (hypotheses) with different surface
forms but with related content, combining their
scores.
 Ranking and confidence estimation: ranks the
hypotheses and estimate their confidence based on
the scores, using machine learning approaches
over a training set. Multiple trained models cover
different question types.
156
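A sketch of the final step under strong simplifications: a single logistic-regression model trained on per-candidate feature scores (using scikit-learn). The features and training data are invented for illustration; Watson actually combines many specialised trained models per question type.

from sklearn.linear_model import LogisticRegression

# Each row: feature scores for one candidate answer (e.g. passage support,
# type coherence, popularity); label 1 = correct answer, 0 = incorrect.
X_train = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.4], [0.8, 0.6, 0.9], [0.3, 0.2, 0.1]]
y_train = [1, 0, 1, 0]

model = LogisticRegression().fit(X_train, y_train)

candidates = {"Marc Mezvinsky": [0.85, 0.7, 0.8], "Bill Clinton": [0.4, 0.3, 0.2]}
ranked = sorted(candidates.items(),
                key=lambda kv: model.predict_proba([kv[1]])[0][1],
                reverse=True)
for answer, feats in ranked:
    print(answer, round(model.predict_proba([feats])[0][1], 2))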
Ferrucci et al. 2010
Farrell, 2011
[Figure: the complete DeepQA architecture: Hypothesis and Evidence Scoring is followed by Synthesis and Final Confidence Merging & Ranking, producing the Answer & Confidence; learned models help combine and weigh the evidence.]
157
www.medbiq.org/sites/default/files/presentations/2011/Farrell.pp
[Figure (Ferrucci et al. 2010; Farrell, 2011): one question yields multiple interpretations, 100s of possible answers, 1000s of pieces of evidence and 100,000s of scores from many simultaneous text analysis algorithms over 100s of sources, which the pipeline merges into a final Answer & Confidence.]
158 www.medbiq.org/sites/default/files/presentations/2011/Farrell.pp
UIMA for interoperability
UIMA-AS for scale-out and speed
Ferrucci et al. 2010
159
160
Evaluation of QA over
Linked Data
 Test Collection
◦ Questions
◦ Datasets
◦ Answers (Gold-standard)
 Evaluation Measures
161
 Measures how complete the answer set is.
 The fraction of relevant instances that are
retrieved.
 Which are the Jovian planets in the Solar System?
◦ Returned Answers:
 Mercury
 Jupiter
 Saturn
 Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
Recall = 2/4 = 0.5
162
 Measures how accurate the answer set is.
 The fraction of retrieved instances that are
relevant.
 Which are the Jovian planets in the Solar System?
◦ Returned Answers:
 Mercury
 Jupiter
 Saturn
 Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
Precision = 2/3 = 0.67
163
 Measures the ranking quality.
 The Reciprocal Rank of a query is 1/r, where r is the rank at which the
system returns the first relevant result.
 Which are the Jovian planets in the Solar System?
 Returned Answers:
– Mercury
– Jupiter
– Saturn
 Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
rr = 1/2 = 0.5
164
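The three measures for the Jovian-planets example, as a quick sketch:

gold = {"Jupiter", "Saturn", "Neptune", "Uranus"}
returned = ["Mercury", "Jupiter", "Saturn"]

precision = len(set(returned) & gold) / len(returned)
recall = len(set(returned) & gold) / len(gold)
# Reciprocal rank: 1/r for the first relevant result (here Jupiter at rank 2).
rr = next((1 / (i + 1) for i, a in enumerate(returned) if a in gold), 0.0)

print(precision, recall, rr)  # 0.666..., 0.5, 0.5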
 Query execution time
 Indexing time
 Index size
 Dataset adaptation effort (Indexing time)
 Semantic enrichment/disambiguation
◦ # of operations/time
 Question Answering over Linked Data
(QALD-CLEF)
 INEX Linked Data Track
 BioASQ
 SemSearch
168
169
 QALD is a series of evaluation campaigns on
question answering over linked data.
◦ QALD-1 (ESWC 2011)
◦ QALD-2 as part of the workshop Interacting with Linked Data (ESWC 2012)
◦ QALD-3 (CLEF 2013)
◦ QALD-4 (CLEF 2014)
 It is aimed at all kinds of systems that mediate
between a user, expressing his or her
information need in natural language, and
semantic data.
170
 QALD-4 is part of the Question Answering
track at CLEF 2014:
http://nlp.uned.es/clef-qa/
 Tasks:
◦ 1. Multilingual question answering over DBpedia
◦ 2. Biomedical question answering on interlinked
data
◦ 3. Hybrid question answering
171
Task:
 Given a natural language question or keywords,
either retrieve the correct answer(s) from a given
RDF repository, or provide a SPARQL query that
retrieves these answer(s).
◦ Dataset: DBpedia 3.9 (with multilingual labels)
◦ Questions: 200 training + 50 test
◦ Seven languages:
 English, Spanish, German, Italian, French, Dutch,
Romanian
172
<question id = "36" answertype = "resource"
aggregation = "false"
onlydbo = "true" >
Through which countries does the Yenisei river flow?
Durch welche Länder fließt der Yenisei?
¿Por qué países fluye el río Yenisei?
...
PREFIX res: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?uri WHERE {
res:Yenisei_River dbo:country ?uri .
}
173
 Datasets: SIDER, Diseasome, Drugbank
 Questions: 25 training + 25 test
 require integration of information from different
datasets
 Example: What are the side effects of drugs used for
Tuberculosis?
SELECT DISTINCT ?x WHERE {
disease:1154 diseasome:possibleDrug ?v2 .
?v2 a drugbank:drugs .
?v3 owl:sameAs ?v2 .
?v3 sider:sideEffect ?x .
}
174
 Dataset: DBpedia 3.9 (with English abstracts)
 Questions: 25 training + 10 test
 require both structured data and free text from the
abstract to be answered
 Example: Give me the currencies of all G8 countries.
SELECT DISTINCT ?uri WHERE {
?x text:"member of" text:"G8" .
?x dbo:currency ?uri .
}
175
 Focuses on the combination of textual and structured data.
 Datasets:
◦ English Wikipedia (MediaWiki XML Format)
◦ DBpedia 3.8 & YAGO2 (RDF)
◦ Links among the Wikipedia, DBpedia 3.8, and YAGO2 URI's.
 Tasks:
◦ Ad-hoc Task: return a ranked list of results in response to a search
topic that is formulated as a keyword query (144 search topics).
◦ Jeopardy Task: Investigate retrieval techniques over a set of natural-
language Jeopardy clues (105 search topics – 74 (2012) + 31 (2013)).
https://inex.mmci.uni-saarland.de/tracks/lod/
181
182
 Focuses on entity search over Linked Datasets.
 Datasets:
◦ Sample of Linked Data crawled from publicly available
sources (based on the Billion Triple Challenge 2009).
 Tasks:
◦ Entity Search: Queries that refer to one particular
entity. A tiny sample of the Yahoo! search query log.
◦ List Search: The goal of this track is to select objects that
match particular criteria. These queries have been
hand-written by the organizing committee.
http://semsearch.yahoo.com/datasets.php#
183
 List Search queries:
◦ republics of the former Yugoslavia
◦ ten ancient Greek city
◦ kingdoms of Cyprus
◦ the four of the companions of the prophet
◦ Japanese-born players who have played in MLB where
the British monarch is also head of state
◦ nations where Portuguese is an official language
◦ bishops who sat in the House of Lords
◦ Apollo astronauts who walked on the Moon
184
 Entity Search queries:
◦ 1978 cj5 jeep
◦ employment agencies w. 14th street
◦ nyc zip code
◦ waterville Maine
◦ LOS ANGELES CALIFORNIA
◦ ibm
◦ KARL BENZ
◦ MIT
185
Balog & Neumayer, A Test Collection for Entity Search in DBpedia (2013).
186
 Datasets:
◦ PubMed documents
 Tasks:
◦ 1a: Large-Scale Online Biomedical Semantic
Indexing
 Automatic annotation of PubMed documents.
 Training data is provided.
◦ 1b: Introductory Biomedical Semantic QA
 300 questions and related material (concepts, triples
and golden answers).
187
 Metrics, Statistics, Tests - Tetsuya Sakai (IR)
◦ http://www.promise-noe.eu/documents/10156/26e7f254-1feb-4169-9204-1c53cc1fd2d7
 Building test Collections (IR Evaluation - Ian
Soboroff)
◦ http://www.promise-noe.eu/documents/10156/951b6dfb-a404-46ce-b3bd-4bbe6b290bfd
188
189
Do-it-yourself (DIY):
Core Resources
 DBpedia
◦ http://dbpedia.org/
 YAGO
◦ http://www.mpi-inf.mpg.de/yago-naga/yago/
 Freebase
◦ http://www.freebase.com/
 Wikipedia dumps
◦ http://dumps.wikimedia.org/
 ConceptNet
◦ http://conceptnet5.media.mit.edu/
 Common Crawl
◦ http://commoncrawl.org/
 Where to use:
◦ As a commonsense KB or as a data source
190
 High domain coverage:
◦ ~95% of Jeopardy! Answers.
◦ ~98% of TREC answers.
 Wikipedia is entity-centric.
 Curated link structure.
 Complementary tools:
◦ Wikipedia Miner
 Where to use:
◦ Construction of distributional semantic models.
◦ As a commonsense KB
191
 WordNet
◦ http://wordnet.princeton.edu/
 Wiktionary
◦ http://www.wiktionary.org/
◦ API: https://www.mediawiki.org/wiki/API:Main_page
 FrameNet
◦ https://framenet.icsi.berkeley.edu/fndrupal/
 VerbNet
◦ http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
 English lexicon for DBpedia 3.8 (in the lemon format)
◦ http://lemon-model.net/lexica/dbpedia_en/
 PATTY (collection of semantically-typed relational patterns)
◦ http://www.mpi-inf.mpg.de/yago-naga/patty/
 BabelNet
◦ http://babelnet.org/
 Where to use:
◦ Query expansion
◦ Semantic similarity
◦ Semantic relatedness
◦ Word sense disambiguation
192
 Lucene & Solr
◦ http://lucene.apache.org/
 Terrier
◦ http://terrier.org/
 Where to use:
◦ Answer Retrieval
◦ Scoring
◦ Query-Data matching
193
 GATE (General Architecture for Text Engineering)
◦ http://gate.ac.uk/
 NLTK (Natural Language Toolkit)
◦ http://nltk.org/
 Stanford NLP
◦ http://www-nlp.stanford.edu/software/index.shtml
 LingPipe
◦ http://alias-i.com/lingpipe/index.html
 Where to use:
◦ Question Analysis
194
 MALT
◦ http://www.maltparser.org/
◦ Languages (pre-trained): English, French, Swedish
 Stanford parser
◦ http://nlp.stanford.edu/software/lex-parser.shtml
◦ Languages: English, German, Chinese, and others
 CHAOS
◦ http://art.uniroma2.it/external/chaosproject/
◦ Languages: English, Italian
 C&C Parser
◦ http://svn.ask.it.usyd.edu.au/trac/candc
 Where to Use:
◦ Question Analysis
195
 NERD (Named Entity Recognition and
Disambiguation)
◦ http://nerd.eurecom.fr/
 Stanford Named Entity Recognizer
◦ http://nlp.stanford.edu/software/CRF-NER.shtml
 FOX (Federated Knowledge Extraction Framework)
◦ http://fox.aksw.org
 DBpedia Spotlight
◦ http://spotlight.dbpedia.org
 Where to use:
◦ Question Analysis
◦ Query-Data Matching
196
 Wikipedia Miner
◦ http://wikipedia-miner.cms.waikato.ac.nz/
 WS4J (Java API for several semantic relatedness algorithms)
◦ https://code.google.com/p/ws4j/
 SecondString (string matching)
◦ http://secondstring.sourceforge.net
 EasyESA (distributional semantic relatedness)
◦ http://treo.deri.ie/easyESA
 Sspace (distributional semantics framework)
◦ https://github.com/fozziethebeat/S-Space
 Where to use:
◦ Query-Data matching
◦ Semantic relatedness & similarity
◦ Word Sense Disambiguation
197
 DIRT
◦ Paraphrase Collection:
 http://aclweb.org/aclwiki/index.php?title=DIRT_Paraphrase_Collection
 Demo:
 http://demo.patrickpantel.com/demos/lexsem/paraphrase.htm
 PPDB (The Paraphrase Database)
◦ http://www.cis.upenn.edu/~ccb/ppdb/
 Where to use:
◦ Query-Data matching
198
 Apache UIMA
◦ http://uima.apache.org/
 Open Advancement of Question Answering
Systems (OAQA)
◦ http://oaqa.github.io/
 OKBQA
◦ http://www.okbqa.org/documentation
https://github.com/okbqa
 Where to use:
◦ Components integration
199
200
Trends
 Querying distributed linked data
 Integration of structured and unstructured data
 User interaction and context mechanisms
 Integration of reasoning (deductive, inductive,
counterfactual, abductive ...) on QA approaches
and test collections
 Measuring confidence and answer uncertainty
 Multilinguality
 Machine Learning
 Reproducibility in QA research
201
 Big/Linked Data demand new principled semantic
approaches to cope with the scale and heterogeneity of
data.
 Part of the Semantic Web/AI vision can be addressed today
with a multi-disciplinary perspective:
◦ Linked Data, IR and NLP
 The multidisciplinarity of the QA problem can serve to show
what semantic computing has achieved and how it can be
transported to other types of information systems.
 Challenges are moving from the construction of basic QA
systems to more sophisticated semantic functionalities.
 Very active research area.
202
 Unger, C., Freitas, A., Cimiano, P.,
Introduction to Question Answering for
Linked Data, Reasoning Web Summer School,
2014.
203
[1] Eifrem, A NOSQL Overview And The Benefits Of Graph Database, 2009
[2] Idehen, Understanding Linked Data via EAV Model, 2010.
[3] Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for
Casual End-users?, 2007
[4] Chin-Yew Lin, Question Answering.
[5] Farah Benamara, Question Answering Systems: State of the Art and Future Directions.
[6] Yahya et al Robust Question Answering over the Web of Linked Data, CIKM, 2013.
[7] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches
and Trends, 2012.
[8] Freitas et al., Answering Natural Language Queries over Linked Data Graphs: A Distributional
Semantics Approach,, 2014.
[9] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches
and Trends, 2012.
[10] Freitas & Curry, Natural Language Queries over Heterogeneous Linked Data Graphs: A
Distributional-Compositional Semantics Approach, IUI, 2014.
[11] Cimiano et al., Towards portable natural language interfaces to knowledge bases, 2008.
[12] Lopez et al., PowerAqua: fishing the semantic web, 2006.
[13] Damljanovic et al., Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and
Ontology-based Lookup through the User Interaction, 2010
[14] Unger et al. Template-based Question Answering over RDF Data, 2012.
[15] Cabrio et al., QAKiS: an Open Domain QA System based on Relational Patterns, 2012.
[16] How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users?, 2007.
[17] Popescu et al.,Towards a theory of natural language interfaces to databases., 2003.
[18] Farrell, IBM Watson: A Brief Overview and Thoughts for Healthcare Education and Performance
Improvement.
204
Más contenido relacionado

La actualidad más candente

ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"Fabien Gandon
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge GraphsJeff Z. Pan
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationGong Cheng
 
ESWC2015 opening ceremony
ESWC2015 opening ceremonyESWC2015 opening ceremony
ESWC2015 opening ceremonyFabien Gandon
 
RDF: Resource Description Failures?
RDF: Resource Description Failures?RDF: Resource Description Failures?
RDF: Resource Description Failures?Robert Sanderson
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalMauro Dragoni
 
Development of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemDevelopment of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemNIT Durgapur
 
Linked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsLinked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsVito Ostuni
 
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Cataldo Musto
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebStefan Dietze
 
Knowledge graphs on the Web
Knowledge graphs on the WebKnowledge graphs on the Web
Knowledge graphs on the WebArmin Haller
 
Wimmics Overview 2021
Wimmics Overview 2021Wimmics Overview 2021
Wimmics Overview 2021Fabien Gandon
 
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeTraian Rebedea
 
Towards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebTowards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebJie Bao
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word CloudsMarina Santini
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked dataLaura Po
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Andres Garcia-Silva
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerOpenSource Connections
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesLaura Po
 

La actualidad más candente (20)

ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"ESWC 2015 Closing and "General Chair's minute of Madness"
ESWC 2015 Closing and "General Chair's minute of Madness"
 
Introduction of Knowledge Graphs
Introduction of Knowledge GraphsIntroduction of Knowledge Graphs
Introduction of Knowledge Graphs
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
ESWC2015 opening ceremony
ESWC2015 opening ceremonyESWC2015 opening ceremony
ESWC2015 opening ceremony
 
RDF: Resource Description Failures?
RDF: Resource Description Failures?RDF: Resource Description Failures?
RDF: Resource Description Failures?
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
Development of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management SystemDevelopment of Semantic Web based Disaster Management System
Development of Semantic Web based Disaster Management System
 
Linked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender SystemsLinked Open Data to support content based Recommender Systems
Linked Open Data to support content based Recommender Systems
 
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
Tuning Personalized PageRank for Semantics-aware Recommendations based on Lin...
 
Analysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the WebAnalysing & Improving Learning Resources Markup on the Web
Analysing & Improving Learning Resources Markup on the Web
 
Knowledge graphs on the Web
Knowledge graphs on the WebKnowledge graphs on the Web
Knowledge graphs on the Web
 
Wimmics Overview 2021
Wimmics Overview 2021Wimmics Overview 2021
Wimmics Overview 2021
 
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning ChallengeAn Evolution of Deep Learning Models for AI2 Reasoning Challenge
An Evolution of Deep Learning Models for AI2 Reasoning Challenge
 
Towards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic WebTowards Linked Ontologies and Data on the Semantic Web
Towards Linked Ontologies and Data on the Semantic Web
 
Lecture: Semantic Word Clouds
Lecture: Semantic Word CloudsLecture: Semantic Word Clouds
Lecture: Semantic Word Clouds
 
Introduction to linked data
Introduction to linked dataIntroduction to linked data
Introduction to linked data
 
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
Social Tags and Linked Data for Ontology Development: A Case Study in the Fin...
 
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey GraingerHaystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger
 
Exploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sourcesExploration, visualization and querying of linked open data sources
Exploration, visualization and querying of linked open data sources
 

Destacado

Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question AnsweringSujit Pal
 
Question Answering - Application and Challenges
Question Answering - Application and ChallengesQuestion Answering - Application and Challenges
Question Answering - Application and ChallengesJens Lehmann
 
Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.Presentation of Domain Specific Question Answering System Using N-gram Approach.
Presentation of Domain Specific Question Answering System Using N-gram Approach.Tasnim Ara Islam
 
From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?From TREC to Watson: is open domain question answering a solved problem?
From TREC to Watson: is open domain question answering a solved problem?Constantin Orasan
 
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...
Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributio...Andre Freitas
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutorial @ ESWC 2014)

  • 1. André Freitas and Christina Unger Tutorial @ ESWC 2014
  • 2.  Understand the changes on the database landscape in the direction of more heterogeneous data scenarios, especially the Semantic Web.  Understand how Question Answering (QA) fits into this new scenario.  Give the fundamental pointers for you to develop your own QA system from the state-of-the-art.  Focus on coverage. 2
  • 3.  Motivation & Context  Big Data, Linked Data & QA  Challenges for QA over LD  The Anatomy of a QA System  QA over Linked Data (Case Studies)  Evaluation of QA over Linked Data  Do-it-yourself (DIY): Core Resources  Trends  Take-away Message 4
  • 5.  Humans are built-in with natural language communication capabilities.  Very natural way for humans to communicate information needs.  The archetypal AI system. 6
  • 6.  A research field on its own.  Empirical bias: Focus on the development and evaluation of approaches and systems to answer questions over a knowledge base.  Multidisciplinary: ◦ Natural Language Processing ◦ Information Retrieval ◦ Knowledge Representation ◦ Databases ◦ Linguistics ◦ Artificial Intelligence ◦ Software Engineering ◦ ... 7
  • 7. 8 QA System Knowledge Bases Question: Who is the daughter of Bill Clinton married to? Answer: Marc Mezvinsky
  • 8.  From the QA expert perspective ◦ QA depends on mastering different semantic computing techniques. 9
  • 9.  Keyword Search: ◦ User still carries the major efforts in interpreting the data. ◦ Satisfying information needs may depend on multiple search operations. ◦ Answer-driven information access. ◦ Input: Keyword search  Typically specification of simpler information needs. ◦ Output: documents, structured data.  QA: ◦ Delegates more ‘interpretation effort’ to the machines. ◦ Query-driven information access. ◦ Input: natural language query  Specification of complex information needs. ◦ Output: direct answer. 10
  • 10.  Structured Queries: ◦ A priori user effort in understanding the schemas behind databases. ◦ Effort in mastering the syntax of a query language. ◦ Satisfying information needs may depend on multiple querying operations. ◦ Input: Structured query ◦ Output: data records, aggregations, etc  QA: ◦ Delegates more ‘semantic interpretation effort’ to the machine. ◦ Input: natural language query ◦ Output: direct natural language answer 11
  • 11.  Keyword search: ◦ Simple information needs. ◦ Predictable search behavior. ◦ Vocabulary redundancy (large document collections, Web).  Structured queries: ◦ Demand for absolute precision/recall guarantees. ◦ Small & centralized schemas. ◦ More data volume/less semantic heterogeneity.  QA: ◦ Specification of complex information needs. ◦ Heterogeneous and schema-less data. ◦ More automated semantic interpretation. 12
  • 12. 13
  • 13. 14
  • 14. 15
  • 15. 16
  • 16.
  • 17.  QA is usually associated with the delegation of more of the ‘interpretation effort’ to the machines.  QA supports the specification of more complex information needs.  QA, Information Retrieval and Databases are complementary perspectives. 18
  • 19.  Big Data: More complete data-based picture of the world. 20
  • 20.  Heterogeneous, complex and large-scale databases.  Very-large and dynamic “schemas”. 21 10s-100s attributes 1,000s-1,000,000s attributes circa 2000 circa 2013
  • 21.  Multiple perspectives (conceptualizations) of the reality.  Ambiguity, vagueness, inconsistency. 22
  • 22.  Franklin et al. (2005): From Databases to Dataspaces.  Helland (2011): If You Have Too Much Data, then “Good Enough” Is Good Enough.  Fundamental trends: ◦ Co-existence of heterogeneous data. ◦ Semantic best-effort behavior. ◦ Pay-as-you go data integration. ◦ Co-existing query/search services. 23
  • 23. 24 Gantz & Reinsel, The Digital Universe in 2020 (2012).
  • 24. 25 Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
  • 25.  Decentralization of content generation. 26 Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
  • 26.  Volume  Velocity  Variety 27 - The most interesting but usually neglected dimension.
  • 27.  The Semantic Web Vision: Software which is able to understand meaning (intelligent, flexible).  QA as a way to support this vision. 28
  • 28. From which university did the wife of Barack Obama graduate?  With Linked Data we are still in the DB world 29
  • 29.  With Linked Data we are still in the DB world  (but slightly worse) 30
  • 30.  Addresses practical problems of data accessibility in a data heterogeneity scenario.  A fundamental part of the original Semantic Web vision. 31
  • 32. 33
  • 33. Example: What is the currency of the Czech Republic? SELECT DISTINCT ?uri WHERE { res:Czech_Republic dbo:currency ?uri . } Main challenges:  Mapping natural language expressions to vocabulary elements (accounting for lexical and structural differences).  Handling meaning variations (e.g. ambiguous or vague expressions, anaphoric expressions). 34
  • 34.  URIs are language independent identifiers.  Their only actual connection to natural language is by the labels that are attached to them. dbo:spouse rdfs:label “spouse”@en , “echtgenoot”@nl .  Labels, however, do not capture lexical variation: wife of husband of married to ... 35
  • 35. Which Greek cities have more than 1 million inhabitants? SELECT DISTINCT ?uri WHERE { ?uri rdf:type dbo:City . ?uri dbo:country res:Greece . ?uri dbo:populationTotal ?p . FILTER (?p > 1000000) } 36
  • 36.  Often the conceptual granularity of language does not coincide with that of the data schema. When did Germany join the EU? SELECT DISTINCT ?date WHERE { res:Germany dbp:accessioneudate ?date . } Who are the grandchildren of Bruce Lee? SELECT DISTINCT ?uri WHERE { res:Bruce_Lee dbo:child ?c . ?c dbo:child ?uri . } 37
  • 37.  In addition, there are expressions with a fixed, dataset-independent meaning. Who produced the most films? SELECT DISTINCT ?uri WHERE { ?x rdf:type dbo:Film . ?x dbo:producer ?uri . } ORDER BY DESC(COUNT(?x)) OFFSET 0 LIMIT 1 38
  • 38.  Different datasets usually follow different schemas, thus provide different ways of answering an information need. Example: 39
  • 39.  The meaning of expressions like the verbs to be, to have, and prepositions of, with, etc. strongly depends on the linguistic context. Which museum has the most paintings? ?museum dbo:exhibits ?painting . Which country has the most caves? ?cave dbo:location ?country . 40
  • 40.  The number of non-English actors on the web is growing substantially. ◦ Accessing data. ◦ Creating and publishing data. Semantic Web:  In principle very well suited for multilinguality, as URIs are language-independent.  But adding multilingual labels is not common practice (less than a quarter of the RDF literals have language tags, and most of those tags are in English). 41
  • 41.  Requirement: Completeness and accuracy (Wrong answers are worse than no answers) In the context of the Semantic Web:  QA systems need to deal with heterogeneous and imperfect data. ◦ Datasets are often incomplete. ◦ Different datasets sometimes contain duplicate information, often using different vocabularies even when talking about the same things. ◦ Datasets can also contain conflicting information and inconsistencies. 42
  • 42.  Data is distributed among a large collection of interconnected datasets. Example: What are side effects of drugs used for the treatment of Tuberculosis? SELECT DISTINCT ?x WHERE { disease:1154 diseasome:possibleDrug ?d1. ?d1 a drugbank:drugs . ?d1 owl:sameAs ?d2. ?d2 sider:sideEffect ?x. } 43
  • 43.  Requirement: Real-time answers, i.e. low processing time. In the context of the Semantic Web:  Datasets are huge. ◦ There are a lot of distributed datasets that might be relevant for answering the question. ◦ Reported performance of current QA systems amounts to ~20-30 seconds per question (on one dataset). 44
  • 44.  Bridge the gap between natural languages and data.  Deal with incomplete, noisy and heterogeneous datasets.  Scale to a large number of huge datasets.  Use distributed and interlinked datasets.  Integrate structured and unstructured data.  Low maintainability costs (easily adaptable to new datasets and domains). 45
  • 45. The Anatomy of a QA System 46
  • 46.  Natural Language Interfaces (NLI) ◦ Input: Natural language queries ◦ Output: Either processed or unprocessed answers  Processed: Direct answers (QA).  Unprocessed: Database records, text snippets, documents. 47 NLI QA
  • 47.  Categorization of questions and answers.  Important for: ◦ Understanding the challenges before attacking the problem. ◦ Scoping the QA system.  Based on: ◦ Chin-Yew Lin: Question Answering. ◦ Farah Benamara: Question Answering Systems: State of the Art and Future Directions. 48
  • 48.  The part of the question that says what is being asked: ◦ Wh-words:  who, what, which, when, where, why, and how ◦ Wh-words + nouns, adjectives or adverbs:  “which party …”, “which actress …”, “how long …”, “how tall …”. 49
  • 49.  Useful for distinguishing different processing strategies ◦ FACTOID:  PREDICATIVE QUESTIONS:  “Who was the first man in space?”  “What is the highest mountain in Korea?”  “How far is Earth from Mars?”  “When did the Jurassic Period end?”  “Where is Taj Mahal?”  LIST:  “Give me all cities in Germany.”  SUPERLATIVE:  “What is the highest mountain?”  YES-NO:  “Was Margaret Thatcher a chemist?” 50
  • 50.  Useful for distinguishing different processing strategies ◦ OPINION:  “What do most Americans think of gun control?” ◦ CAUSE & EFFECT:  “What is the most frequent cause for lung cancer?” ◦ PROCESS:  “How do I make a cheese cake?” ◦ EXPLANATION & JUSTIFICATION:  “Why did the revenue of IBM drop?” ◦ ASSOCIATION QUESTION:  “What is the connection between Barack Obama and Indonesia?” ◦ EVALUATIVE OR COMPARATIVE QUESTIONS:  “What is the difference between impressionism and expressionism?” 51
  • 51. The class of object sought by the question:  Abbreviation  Entity: event, color, animal, plant, ...  Description, Explanation & Justification: definition, manner, reason, ... ("How, why …")  Human: group, individual, ... ("Who …")  Location: city, country, mountain, ... ("Where …")  Numeric: count, distance, size, ... ("How many, how far, how long …")  Temporal: date, time, ... (from "When …") 52
  • 52.  Question focus is the property or entity that is being sought by the question ◦ “In which city was Barack Obama born?” ◦ “What is the population of Galway?”  Question topic: What the question is generally about : ◦ “What is the height of Mount Everest?”  (geography, mountains) ◦ “Which organ is affected by the Meniere’s disease?”  (medicine) 53
  • 53.  Structure level: ◦ Structured data (databases). ◦ Semi-structured data (e.g. XML). ◦ Unstructured data.  Data source: ◦ Single dataset (centralized). ◦ Enumerated list of multiple, distributed datasets. ◦ Web-scale. 54
  • 54.  Domain Scope: ◦ Open Domain ◦ Domain specific  Data Type: ◦ Text ◦ Image ◦ Sound ◦ Video  Multi-modal QA 55
  • 55.  Long answers ◦ Definition/justification based.  Short answers ◦ Phrases.  Exact answers ◦ Named entities, numbers, aggregate, yes/no. 56
  • 56.  Relevance: The degree to which the answer addresses the user's information needs.  Correctness: The degree to which the answer is factually correct.  Conciseness: The answer should not contain irrelevant information.  Completeness: The answer should be complete.  Simplicity: The answer should be easy to interpret.  Justification: Sufficient context should be provided to support the data consumer in the determination of the query correctness. 57
  • 57.  Right: The answer is correct and complete.  Inexact: The answer is incomplete or incorrect.  Unsupported: The answer does not have an appropriate evidence/justification.  Wrong: The answer is not appropriate for the question. 58
  • 58.  Simple Extraction: Direct extraction of snippets from the original document(s) / data records.  Combination: Combines excerpts from multiple sentences, documents / multiple data records, databases.  Summarization: Synthesis from large texts / data collections.  Operational/functional: Depends on the application of functional operators.  Reasoning: Depends on the application of an inference process over the original data. 59
  • 59.  Semantic Tractability (Popescu et al., 2003): Vocabulary overlap/distance between the query and the answer.  Answer Locality (Webber et al., 2003): Whether answer fragments are distributed across different document fragments / documents or datasets/dataset records.  Derivability (Webber et al., 2003): Depends on whether the answer is explicit or implicit; level of reasoning dependency.  Semantic Complexity: Level of ambiguity and discourse/data heterogeneity. 60
  • 60. 61
  • 61.  Data pre-processing: Pre-processes the database content (includes indexing, data cleaning, feature extraction).  Question Analysis: Performs syntactic analysis and detects/extracts the core features of the question (NER, answer type, etc.).  Data Matching: Matches terms in the question to entities in the data.  Query Construction: Generates structured query candidates considering the question-data mappings and the syntactic constraints in the query and in the database.  Scoring: The data matching and query construction components output several candidates that need to be scored and ranked according to certain criteria.  Answer Retrieval & Extraction: Executes the query and extracts the natural language answer from the result set. 62
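As a rough illustration, the components above could be wired together along the following lines (a minimal Python sketch with hypothetical class and method names, not the interface of any particular system):

class QAPipeline:
    def __init__(self, index, matcher, query_builder, scorer, executor):
        self.index = index                  # built during data pre-processing
        self.matcher = matcher              # data matching component
        self.query_builder = query_builder  # query construction component
        self.scorer = scorer                # candidate scoring component
        self.executor = executor            # answer retrieval & extraction

    def answer(self, question):
        features = self.analyze(question)                           # question analysis
        mappings = self.matcher.match(features, self.index)         # data matching
        candidates = self.query_builder.build(features, mappings)   # query construction
        ranked = sorted(candidates, key=self.scorer.score, reverse=True)  # scoring
        return self.executor.run(ranked[0]) if ranked else None     # answer extraction

    def analyze(self, question):
        # Placeholder: NER, answer type detection, parsing, etc. would happen here.
        return {"tokens": question.split(), "answer_type": None}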
  • 62. 64 QA over Linked Data (Case Studies)
  • 63.  ORAKEL & Pythia (Cimiano et al, 2007; Unger & Cimiano, 2011) ◦ Ontology-specific question answering  TBSL (Unger et al., 2012) ◦ Template-based question answering  Aqualog & PowerAqua (Lopez et al. 2006) ◦ Querying on a Semantic Web scale  Treo (Freitas et al. 2011, 2014) ◦ Schema-agnostic querying using distributional semantics  IBM Watson (Ferrucci et al., 2010) ◦ Large-scale evidence-based model for QA 65
  • 64.  QuestIO & Freya (Damljanovic et al. 2010)  QAKIS (Cabrio et al. 2012)  Yahya et al., 2013 66
  • 65. 67 ORAKEL (Cimiano et al, 2007) & Pythia (Unger & Cimiano, 2011) Ontology-specific question answering
  • 66.  Key contributions: ◦ Using ontologies to interpret user questions ◦ Relies on a deep linguistic analysis that returns semantic representations aligned to the ontology vocabulary and structure  Evaluation: Geobase 68
  • 67. 69  Ontology-based QA: ◦ Ontologies play a central role in interpreting user questions ◦ Output is a meaning representation that is aligned to the ontology underlying the dataset that is queried ◦ ontological knowledge is used for drawing inferences, e.g. for resolving ambiguities  Grammar-based QA: ◦ Rely on linguistic grammars that assign a syntactic and semantic representation to lexical units ◦ Advantage: can deal with questions of arbitrary complexity ◦ Drawback: brittleness (fail if question cannot be parsed because expressions or constructs are not covered by the grammar)
  • 68. 70  Ontology-independent entries ◦ mostly function words  quantifiers (some, every, two)  wh-words (who, when, where, which, how many)  negation (not) ◦ manually specified and re-usable for all domains  Ontology-specific entries ◦ content words and phrases corresponding to concepts and properties in the ontology ◦ automatically generated from an ontology lexicon
  • 69.  Aim: capture rich and structured linguistic information about how ontology elements are lexicalized in a particular language lemon (Lexicon Model for Ontologies) http://lemon-model.net ◦ meta-model for describing ontology lexica with RDF ◦ declarative (abstracting from specific syntactic and semantic theories) ◦ separation of lexicon and ontology 71
  • 70.  Semantics by reference: ◦ The meaning of lexical entries is specified by pointing to elements in the ontology.  Example: 72
  • 71. 73
  • 72.  Which cities have more than three universities? SELECT DISTINCT ?x WHERE { ?x rdf:type dbo:City . ?y rdf:type dbo:University . ?y dbo:city ?x . } GROUP BY ?x HAVING (COUNT(?y) > 3) 74
  • 73. 75 TBSL (Unger et al., 2012) Template-based question answering
  • 74.  Key contributions: ◦ Constructs a query template that directly mirrors the linguistic structure of the question ◦ Instantiates the template by matching natural language expressions with ontology concepts  Evaluation: QALD 2012 76
  • 75.  In order to understand a user question, we need to understand: The words (dataset-specific) Abraham Lincoln → res:Abraham_Lincoln died in → dbo:deathPlace The semantic structure (dataset-independent) who → SELECT ?x WHERE { … } the most N → ORDER BY DESC(COUNT(?N)) LIMIT 1 more than i N → HAVING COUNT(?N) > i 77
  • 76.  Goal: An approach that combines both an analysis of the semantic structure and a mapping of words to URIs.  Two-step approach: ◦ 1. Template generation  Parse question to produce a SPARQL template that directly mirrors the structure of the question, including filters and aggregation operations. ◦ 2. Template instantiation  Instantiate SPARQL template by matching natural language expressions with ontology concepts using statistical entity identification and predicate detection. 78
  • 77. SPARQL template: SELECT DISTINCT ?x WHERE { ?y rdf:type ?c . ?y ?p ?x . } ORDER BY DESC(COUNT(?y)) OFFSET 0 LIMIT 1 ?c CLASS [films] ?p PROPERTY [produced] Instantiations: ?c = <http://dbpedia.org/ontology/Film> ?p = <http://dbpedia.org/ontology/producer> 79
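As an illustration, such a template with open slots and its candidate instantiations can be represented as plain data; the slot-marker syntax and function names below are made up for the sketch and do not reflect TBSL's internal representation:

# A SPARQL template as data: a query string with slot markers plus slot descriptions.
template = {
    "query": ("SELECT DISTINCT ?x WHERE { ?y rdf:type <<c>> . ?y <<p>> ?x . } "
              "ORDER BY DESC(COUNT(?y)) OFFSET 0 LIMIT 1"),
    "slots": {"c": ("CLASS", "films"), "p": ("PROPERTY", "produced")},
}

# Candidate URIs found for each slot by the matching step.
candidates = {
    "c": ["http://dbpedia.org/ontology/Film"],
    "p": ["http://dbpedia.org/ontology/producer"],
}

def instantiate(template, assignment):
    # Replace each slot marker <<slot>> with the chosen URI.
    query = template["query"]
    for slot, uri in assignment.items():
        query = query.replace("<<%s>>" % slot, "<%s>" % uri)
    return query

print(instantiate(template, {s: uris[0] for s, uris in candidates.items()}))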
  • 78. 80
  • 79. 81
  • 80.  1. Natural language question is tagged with part-of-speech information.  2. Based on POS tags, grammar entries are built on the fly. ◦ Grammar entries are pairs of:  tree structures (Lexicalized Tree Adjoining Grammar)  semantic representations (ext. Discourse Representation Structures)  3. These lexical entries, together with domain-independent lexical entries, are used for parsing the question (cf. Pythia).  4. The resulting semantic representation is translated into a SPARQL template. 82
  • 81.  Domain-independent: who, the most  Domain-dependent: produced/VBD, films/NNS SPARQL template 1: SELECT DISTINCT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?c CLASS [films] ?p PROPERTY [produced] SPARQL template 2: SELECT DISTINCT ?x WHERE { ?x ?p ?y . } ORDER BY DESC(COUNT(?y)) LIMIT 1 ?p PROPERTY [films] 83
  • 82. 84
  • 83.  1. For resources and classes, a generic approach to entity detection is applied: ◦ Identify synonyms of the label using WordNet. ◦ Retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein and substring similarity).  2. For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library.  3. The highest ranking entities are returned as candidates for filling the query slots. 85
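The string similarities mentioned in step 1 can be sketched as follows (thresholds and the exact way TBSL combines the measures are not shown; this only illustrates the three measures themselves):

def trigrams(s):
    s = "  " + s.lower() + "  "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_similarity(a, b):
    # Dice coefficient over character trigrams.
    ta, tb = trigrams(a), trigrams(b)
    return 2.0 * len(ta & tb) / (len(ta) + len(tb))

def levenshtein(a, b):
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def levenshtein_similarity(a, b):
    return 1.0 - levenshtein(a.lower(), b.lower()) / max(len(a), len(b))

def substring_similarity(a, b):
    shorter, longer = sorted((a.lower(), b.lower()), key=len)
    return len(shorter) / len(longer) if shorter in longer else 0.0

print(trigram_similarity("produced", "producer"))
print(levenshtein_similarity("films", "Film"))
print(substring_similarity("film", "film festival"))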
  • 84. ?c CLASS [films] <http://dbpedia.org/ontology/Film> <http://dbpedia.org/ontology/FilmFestival> ... ?p PROPERTY [produced] <http://dbpedia.org/ontology/producer> <http://dbpedia.org/property/producer> <http://dbpedia.org/ontology/wineProduced> 86
  • 85. 87
  • 86.  1. Every entity receives a score considering string similarity and prominence.  2. The score of a query is then computed as the average of the scores of the entities used to fill its slots.  3. In addition, type checks are performed: ◦ For all triples ?x rdf:type <class>, all query triples ?x p e and e p ?x are checked w.r.t. whether domain/range of p is consistent with <class>.  4. Of the remaining queries, the one with highest score that returns a result is chosen. 88
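A simplified sketch of this ranking step (the slot scores below are made-up values; in TBSL they combine string similarity and prominence, and the type checks further filter the candidates before the best query that returns results is chosen):

# Each slot filler is a (URI, score) pair; a query's score is the average of its fillers.
def query_score(slot_fillers):
    scores = [score for _uri, score in slot_fillers.values()]
    return sum(scores) / len(scores)

query_candidates = [
    {"c": ("dbo:Film", 0.82), "p": ("dbo:producer", 0.70)},
    {"c": ("dbo:FilmFestival", 0.50), "p": ("dbo:producer", 0.70)},
]

for q in sorted(query_candidates, key=query_score, reverse=True):
    print(round(query_score(q), 2), q)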
  • 87. SELECT DISTINCT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/Film> . } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.76 SELECT DISTINCT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/FilmFestival>. } ORDER BY DESC(COUNT(?y)) LIMIT 1 Score: 0.60 89
  • 88.  The created template structure does not always coincide with how the data is actually modelled.  Considering all possibilities of how the data could be modelled leads to a large number of templates (and even more queries) for one question. 90
  • 89. 91 Aqualog & PowerAqua (Lopez et al. 2006) Querying on a Semantic Web scale
  • 90.  Key contributions: ◦ Pioneering work on QA over Semantic Web data. ◦ Semantic similarity mapping.  Terminological Matching: ◦ WordNet-based ◦ Ontology-based ◦ String similarity ◦ Sense-based similarity matcher  Evaluation: QALD (2011).  Extends the AquaLog system. 92
  • 91. 93
  • 92.  Two words are strongly similar if any of the following holds: ◦ 1. They have a synset in common (e.g. "human" and "person"). ◦ 2. A word is a hypernym/hyponym in the taxonomy of the other word. ◦ 3. There exists an allowable "is-a" path connecting a synset associated with each word. ◦ 4. If any of the previous cases is true and the definition (gloss) of one of the synsets of the word (or its direct hypernyms/hyponyms) includes the other word as one of its synonyms, the words are said to be highly similar. 94 Lopez et al. 2006
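Conditions 1 and 2 can be checked directly against WordNet, for instance via NLTK (a sketch; it assumes the WordNet corpus has been downloaded and leaves out the path- and gloss-based conditions 3 and 4, which the PowerAqua matcher also covers):

from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def share_synset(w1, w2):
    # Condition 1: the two words have a synset in common.
    return bool(set(wn.synsets(w1)) & set(wn.synsets(w2)))

def hypernym_or_hyponym(w1, w2):
    # Condition 2: a synset of one word is a (transitive) hypernym or hyponym
    # of a synset of the other word.
    for s1 in wn.synsets(w1):
        related = set(s1.closure(lambda s: s.hypernyms())) | \
                  set(s1.closure(lambda s: s.hyponyms()))
        if related & set(wn.synsets(w2)):
            return True
    return False

print(share_synset("spouse", "partner"))
print(hypernym_or_hyponym("daughter", "child"))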
  • 93. 95
  • 94. 96 Treo (Freitas et al. 2011, 2014) Schema-agnostic querying using distributional semantics
  • 95.  Key contributions: ◦ Distributional semantic relatedness matching model. ◦ Compositional-distributional model for QA.  Terminological Matching: ◦ Explicit Semantic Analysis (ESA) ◦ String similarity + node cardinality  Evaluation: QALD (2011) 97
  • 97. 99
  • 98. 100
  • 99. 101
  • 100. 102
  • 101. • Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions. • If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements. (Sahlgren, 2013; Baroni et al. 2013) [Figure: Formal World vs. Real World] 103
  • 102. “Words occurring in similar (linguistic) contexts are semantically related.”  If we can equate meaning with context, we can simply record the contexts in which a word occurs in a collection of texts (a corpus).  This can then be used as a surrogate of its semantic representation. 104
  • 103. [Figure: words such as 'child', 'husband' and 'spouse' represented as vectors in a context space with dimensions c1, c2, ..., cn; each component is a function of the number of times the word occurs in that context. Commonsense knowledge is captured in this space.] 105
  • 104. [Figure: the angle θ between word vectors (e.g. 'spouse' vs. 'child') in the context space works as a semantic ranking function.] 106
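A toy sketch of this idea in Python: build raw co-occurrence vectors from a few sentences and use the cosine of the angle between them as the ranking function (actual systems use large corpora and weighted models such as Explicit Semantic Analysis rather than raw counts over a toy corpus):

import math
from collections import Counter

corpus = [
    "the daughter is the child of her mother and father",
    "a spouse is a husband or a wife",
    "the husband married his wife",
    "the child of the spouse",
]

def context_vector(word, sentences, window=2):
    # Count the words co-occurring with `word` within a +/- window.
    vec = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                vec.update(t for t in tokens[lo:hi] if t != word)
    return vec

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

target = context_vector("daughter", corpus)
for candidate in ["child", "husband", "mother"]:
    print(candidate, round(cosine(target, context_vector(candidate, corpus)), 3))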
  • 105. [Architecture diagram: Query → Query Analysis → Query Features → Query Planner → Query Plan, executed as core semantic approximation & composition operations over the Ƭ-Space, which is built from the database together with commonsense knowledge extracted from large-scale unstructured data via distributional semantics.] 107
  • 106. [The same architecture instantiated: RDF data in the Ƭ-Space, Wikipedia as the commonsense corpus, and Explicit Semantic Analysis as the distributional model.] 108
  • 108. The vector space is segmented by the instances 110
  • 111.  Instance search ◦ Proper nouns ◦ String similarity + node cardinality  Class (unary predicate) search ◦ Nouns, adjectives and adverbs ◦ String similarity + Distributional semantic relatedness  Property (binary predicate) search ◦ Nouns, adjectives, verbs and adverbs ◦ Distributional semantic relatedness  Navigation  Extensional expansion ◦ Expands the instances associated with a class.  Operator application ◦ Aggregations, conditionals, ordering, position  Disjunction & Conjunction  Disambiguation dialog (instance, predicate) 113
  • 112.  Minimize the impact of Ambiguity, Vagueness, Synonymy.  Address the simplest matchings first (heuristics).  Semantic Relatedness as a primitive operation.  Distributional semantics as commonsense knowledge. 114
  • 113.  Transform natural language queries into triple patterns. “Who is the daughter of Bill Clinton married to?” 115
  • 114.  Step 1: POS Tagging ◦ Who/WP ◦ is/VBZ ◦ the/DT ◦ daughter/NN ◦ of/IN ◦ Bill/NNP ◦ Clinton/NNP ◦ married/VBN ◦ to/TO ◦ ?/. 116
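This step can be reproduced with any off-the-shelf tagger, e.g. with NLTK (assuming the punkt and averaged_perceptron_tagger resources have been downloaded; tags may differ slightly between taggers):

import nltk

question = "Who is the daughter of Bill Clinton married to?"
print(nltk.pos_tag(nltk.word_tokenize(question)))
# e.g. [('Who', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('daughter', 'NN'), ('of', 'IN'),
#       ('Bill', 'NNP'), ('Clinton', 'NNP'), ('married', 'VBN'), ('to', 'TO'), ('?', '.')]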
  • 115.  Step 2: Core Entity Recognition ◦ Rules-based: POS Tag + TF/IDF Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE) 117
  • 116.  Step 3: Determine answer type ◦ Rules-based. Who is the daughter of Bill Clinton married to? (PERSON) 118
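A sketch of what such rules could look like (a tiny, hypothetical rule set; Treo's actual rules are more elaborate): the wh-word, possibly combined with the following head noun, is mapped to a coarse answer type.

def answer_type(tagged_question):
    # Very small rule set mapping question words to coarse answer types.
    words = [w.lower() for w, _tag in tagged_question]
    if words[0] == "who":
        return "PERSON"
    if words[0] == "when":
        return "DATE"
    if words[0] == "where":
        return "PLACE"
    if words[0] == "how" and len(words) > 1 and words[1] in ("many", "much", "far", "long", "tall"):
        return "NUMERIC"
    if words[0] in ("which", "what"):
        # Fall back to the head noun after the wh-word, e.g. "which city ..." -> CITY.
        return next((w.upper() for w, tag in tagged_question[1:] if tag.startswith("NN")), "THING")
    return "THING"

tagged = [("Who", "WP"), ("is", "VBZ"), ("the", "DT"), ("daughter", "NN"), ("of", "IN"),
          ("Bill", "NNP"), ("Clinton", "NNP"), ("married", "VBN"), ("to", "TO"), ("?", ".")]
print(answer_type(tagged))  # PERSON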
  • 117.  Step 4: Dependency parsing ◦ dep(married-8, Who-1) ◦ auxpass(married-8, is-2) ◦ det(daughter-4, the-3) ◦ nsubjpass(married-8, daughter-4) ◦ prep(daughter-4, of-5) ◦ nn(Clinton-7, Bill-6) ◦ pobj(of-5, Clinton-7) ◦ root(ROOT-0, married-8) ◦ xcomp(married-8, to-9) 119
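Dependency relations like these can be produced by any dependency parser; for instance with spaCy (assuming the en_core_web_sm model is installed; its label inventory differs slightly from the Stanford-style labels shown above):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Who is the daughter of Bill Clinton married to?")
for token in doc:
    # e.g. nsubjpass(married, daughter), prep(daughter, of), pobj(of, Clinton), ...
    print("%s(%s, %s)" % (token.dep_, token.head.text, token.text))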
  • 118.  Step 5: Determine Partial Ordered Dependency Structure (PODS) ◦ Rules based.  Remove stop words.  Merge words into entities.  Reorder structure from core entity position. Bill Clinton daughter married to (INSTANCE) Person ANSWER TYPE QUESTION FOCUS 120
  • 119.  Step 5: Determine Partial Ordered Dependency Structure (PODS) ◦ Rules based.  Remove stop words.  Merge words into entities.  Reorder structure from core entity position. Bill Clinton daughter married to (INSTANCE) Person (PREDICATE) (PREDICATE) Query Features 121
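A simplified sketch of how the PODS could be derived: starting from the (merged) core entity, walk the dependency edges towards the root and keep the content words. The head table below is hard-coded from the parse in Step 4; the real implementation applies its rules over the parser output directly.

STOP_WORDS = {"who", "is", "the", "of", "to", "?"}

# Dependency edges from Step 4: token -> head token (the root has head None).
heads = {"Who": "married", "is": "married", "the": "daughter",
         "daughter": "married", "of": "daughter", "Bill": "Clinton",
         "Clinton": "of", "to": "married", "married": None}

def build_pods(core_entity, attachment_point, heads):
    # Walk from the core entity towards the root, keeping non-stop-word tokens.
    pods, node = [core_entity], attachment_point
    while node is not None:
        if node.lower() not in STOP_WORDS:
            pods.append(node)
        node = heads[node]
    return pods

# "Bill Clinton" is merged into one entity; it attaches to the tree via "Clinton".
print(build_pods("Bill Clinton", heads["Clinton"], heads))
# ['Bill Clinton', 'daughter', 'married'] -- i.e. Bill Clinton -> daughter -> married (to)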
  • 120.  Map query features into a query plan.  A query plan contains a sequence of: ◦ Search operations. ◦ Navigation operations. (INSTANCE) (PREDICATE) (PREDICATE) Query Features  (1) INSTANCE SEARCH (Bill Clinton)  (2) DISAMBIGUATE ENTITY TYPE  (3) GENERATE ENTITY FACETS  (4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)  (5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)  (6) p2 <- SEARCH RELATED PREDICATE (e1, married to)  (7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)  (8) POST PROCESS (Bill Clinton, e1, p1, e2, p2) Query Plan 122
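A sketch of how such a plan could be executed over a toy in-memory graph, with a placeholder standing in for the distributional relatedness measure (all names and scores below are illustrative only):

# Toy triple store: subject -> list of (predicate, object).
graph = {
    ":Bill_Clinton": [(":child", ":Chelsea_Clinton"),
                      (":religion", ":Baptists"),
                      (":almaMater", ":Yale_Law_School")],
    ":Chelsea_Clinton": [(":spouse", ":Marc_Mezvinsky")],
}

def sem_rel(term, predicate):
    # Placeholder for the distributional semantic relatedness measure.
    toy_scores = {("daughter", ":child"): 0.9, ("married to", ":spouse"): 0.8}
    return toy_scores.get((term, predicate), 0.1)

def search_related_predicate(entity, term):
    # Steps (4)/(6): pick the pivot entity's predicate most related to the query term.
    return max((p for p, _o in graph[entity]), key=lambda p: sem_rel(term, p))

def get_associated_entities(entity, predicate):
    # Steps (5)/(7): navigate from the pivot entity over the chosen predicate.
    return [o for p, o in graph[entity] if p == predicate]

pivot = ":Bill_Clinton"                              # (1) instance search
p1 = search_related_predicate(pivot, "daughter")     # (4)
e1 = get_associated_entities(pivot, p1)[0]           # (5)
p2 = search_related_predicate(e1, "married to")      # (6)
print(get_associated_entities(e1, p2))               # (7) -> [':Marc_Mezvinsky']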
  • 121. Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: Entity Search 123
  • 122. Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... (PIVOT ENTITY) (ASSOCIATED TRIPLES) 124
  • 123. Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Baptists :religion :Yale_Law_School :almaMater ... sem_rel(daughter,child)=0.054 sem_rel(daughter,religion)=0.004 sem_rel(daughter,alma mater)=0.001 Which properties are semantically related to ‘daughter’? 125
  • 124. Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child 126
  • 125.  Computation of a measure of “semantic proximity” between two terms.  Allows a semantic approximate matching between query terms and dataset terms.  It supports a commonsense reasoning-like behavior based on the knowledge embedded in the corpus. 127
  • 126.  Use distributional semantics to semantically match query terms to predicates and classes.  Distributional principle: Words that co-occur tend to have related meanings. ◦ Allows the creation of a comprehensive semantic model from unstructured text. ◦ Based on statistical patterns over large amounts of text. ◦ No human annotations.  Distributional semantics can be used to compute a semantic relatedness measure between two words. 128
  • 127. Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child (PIVOT ENTITY) 129
  • 128. Bill Clinton daughter married to Person :Bill_Clinton Query: Linked Data: :Chelsea_Clinton :child :Marc_Mezvinsky :spouse 130
  • 129. 133
  • 130. 134
  • 131. 135
  • 132. 136
  • 133. What is the highest mountain? (CLASS) (OPERATOR) Query Features mountain - highest PODS 137
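One plausible way (an assumption for illustration, not necessarily the system's actual mechanism) to realize the superlative operator is to map it to an ordering modifier once the class and an ordering property have been matched; the DBpedia class and property URIs below are assumed.

    # Query features for "What is the highest mountain?": a class plus a superlative operator.
    query_features = {"class": "dbo:Mountain",
                      "operator": ("ORDER BY DESC", "dbo:elevation"),
                      "limit": 1}

    def to_sparql(features):
        order, prop = features["operator"]
        return (
            "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
            "SELECT ?uri WHERE {\n"
            f"  ?uri a {features['class']} ;\n"
            f"       {prop} ?value .\n"
            "}\n"
            f"{order}(?value) LIMIT {features['limit']}"
        )

    print(to_sparql(query_features))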
  • 139. 143
  • 140.  Semantic approximation in databases (as in any IR system): semantic best-effort.  Need some level of user disambiguation, refinement and feedback.  As we move in the direction of semantic systems we should expect the need for principled dialog mechanisms (like in human communication).  Pull the user interaction back into the system. 144
  • 141. 145
  • 142. 146
  • 143. 147 IBM Watson (Ferrucci et al., 2010) Large-scale evidence-based model for QA
  • 144.  Key contributions: ◦ Evidence-based QA system. ◦ Complex and high performance QA Pipeline. ◦ The system uses more than 50 scoring components that produce scores which range from probabilities and counts to categorical features. ◦ Major cultural impact:  Before: QA as AI vision, academic exercise.  After: QA as an attainable software architecture in the short term.  Evaluation: Jeopardy! Challenge 148
  • 145. Ferrucci et al. 2010 149
  • 146.  Question analysis: includes shallow and deep parsing, extraction of logical forms, semantic role labelling, coreference resolution, relations extraction, named entity recognition, among others.  Question decomposition: decomposition of the question into separate phrases, which will generate constraints that need to be satisfied by evidence from the data. 150
  • 147. Ferrucci et al. 2010 Question Question & Topic Analysis Question Decomposition 151
  • 148.  Hypothesis generation: ◦ Primary search  Document and passage retrieval  SPARQL queries are used over triple stores. ◦ Candidate answer generation (maximizing recall).  Information extraction techniques are applied to the search results to generate candidate answers.  Soft filtering: Application of lightweight (less resource intensive) scoring algorithms to a larger set of initial candidates to prune the list of candidates before the more intensive scoring components. 152
  • 149. Ferrucci et al. 2010 Question Primary Search Candidate Answer Generation Hypothesis Generation Answer Sources Question & Topic Analysis Question Decomposition Hypothesis Generation Hypothesis and Evidence Scoring 153
  • 150.  Hypothesis and evidence scoring: ◦ Supporting evidence retrieval  Seeks additional evidence for each candidate answer from the data sources. ◦ Deep evidence scoring  Determines the degree of certainty that the retrieved evidence supports the candidate answers.  The scores are then combined into an overall evidence profile, which groups individual features into aggregate evidence dimensions. 154
  • 151. Ferrucci et al. 2010 Answer Scoring Question Evidence Sources Primary Search Candidate Answer Generation Hypothesis Generation Hypothesis and Evidence Scoring Answer Sources Question & Topic Analysis Question Decomposition Evidence Retrieval Deep Evidence Scoring Hypothesis Generation Hypothesis and Evidence Scoring 155
  • 152.  Answer merging: merges answer candidates (hypotheses) that have different surface forms but related content, combining their scores.  Ranking and confidence estimation: ranks the hypotheses and estimates their confidence based on the scores, using machine learning approaches over a training set. Multiple trained models cover different question types. 156
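For illustration only: a toy version of "machine learning over a training set of feature scores" using logistic regression. The feature values, labels and candidate answers are invented, and this is not Watson's actual model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy evidence profiles: each row is a candidate's aggregated feature scores
    # (e.g. passage support, type match, popularity); labels mark correct candidates.
    X_train = np.array([[0.9, 0.8, 0.7], [0.2, 0.4, 0.1], [0.7, 0.9, 0.6], [0.1, 0.2, 0.3]])
    y_train = np.array([1, 0, 1, 0])

    model = LogisticRegression().fit(X_train, y_train)

    candidates = {"Marc Mezvinsky": [0.8, 0.9, 0.6], "Yale Law School": [0.3, 0.1, 0.4]}
    for answer, features in candidates.items():
        confidence = model.predict_proba([features])[0][1]
        print(f"{answer}: {confidence:.2f}")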
  • 153. [DeepQA architecture diagram: Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search, Candidate Answer Generation, Answer Sources) → Hypothesis and Evidence Scoring (Evidence Retrieval, Deep Evidence Scoring, Evidence Sources, Answer Scoring) → Synthesis → Final Confidence Merging & Ranking (learned models help combine and weigh the evidence) → Answer & Confidence] Ferrucci et al. 2010; Farrell, 2011 157 www.medbiq.org/sites/default/files/presentations/2011/Farrell.pp
  • 154. [Scale of the DeepQA pipeline: one question, multiple interpretations, 100s of sources, 100s of possible answers, 1000s of pieces of evidence, 100,000s of scores from many simultaneous text analysis algorithms; UIMA for interoperability, UIMA-AS for scale-out and speed] Question & Topic Analysis → Question Decomposition → Hypothesis Generation → Hypothesis and Evidence Scoring → Synthesis → Final Confidence Merging & Ranking → Answer & Confidence. Ferrucci et al. 2010; Farrell, 2011 158 www.medbiq.org/sites/default/files/presentations/2011/Farrell.pp
  • 155. Ferrucci et al. 2010 159
  • 156. 160 Evaluation of QA over Linked Data
  • 157.  Test Collection ◦ Questions ◦ Datasets ◦ Answers (Gold-standard)  Evaluation Measures 161
  • 158.  Measures how complete the answer set is.  The fraction of relevant instances that are retrieved.  Which are the Jovian planets in the Solar System? ◦ Returned Answers:  Mercury  Jupiter  Saturn  Gold-standard: – Jupiter – Saturn – Neptune – Uranus Recall = 2/4 = 0.5 162
  • 159.  Measures how accurate the answer set is.  The fraction of retrieved instances that are relevant.  Which are the Jovian planets in the Solar System? ◦ Returned Answers:  Mercury  Jupiter  Saturn  Gold-standard: – Jupiter – Saturn – Neptune – Uranus Precision = 2/3 ≈ 0.67 163
  • 160.  Measures the ranking quality.  The reciprocal rank of a query is 1/r, where r is the rank at which the system returns the first relevant result.  Which are the Jovian planets in the Solar System?  Returned Answers: – Mercury – Jupiter – Saturn  Gold-standard: – Jupiter – Saturn – Neptune – Uranus rr = 1/2 = 0.5 164
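The three measures for the Jovian-planets example, written out as straightforward functions:

    def precision(returned, relevant):
        return len(set(returned) & set(relevant)) / len(returned)

    def recall(returned, relevant):
        return len(set(returned) & set(relevant)) / len(relevant)

    def reciprocal_rank(returned, relevant):
        for rank, answer in enumerate(returned, start=1):
            if answer in relevant:
                return 1 / rank
        return 0.0

    returned = ["Mercury", "Jupiter", "Saturn"]
    gold = ["Jupiter", "Saturn", "Neptune", "Uranus"]
    print(precision(returned, gold))        # 2/3 ≈ 0.67
    print(recall(returned, gold))           # 2/4 = 0.5
    print(reciprocal_rank(returned, gold))  # first relevant answer at rank 2 -> 0.5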
  • 161.  Query execution time  Indexing time  Index size  Dataset adaptation effort (Indexing time)  Semantic enrichment/disambiguation ◦ # of operations/time
  • 162.  Question Answering over Linked Data (QALD-CLEF)  INEX Linked Data Track  BioASQ  SemSearch 168
  • 163. 169
  • 164.  QALD is a series of evaluation campaigns on question answering over linked data. ◦ QALD-1 (ESWC 2011) ◦ QALD-2 as part of the workshop Interacting with Linked Data (ESWC 2012) ◦ QALD-3 (CLEF 2013) ◦ QALD-4 (CLEF 2014)  It is aimed at all kinds of systems that mediate between a user, expressing his or her information need in natural language, and semantic data. 170
  • 165.  QALD-4 is part of the Question Answering track at CLEF 2014: http://nlp.uned.es/clef-qa/  Tasks: ◦ 1. Multilingual question answering over DBpedia ◦ 2. Biomedical question answering on interlinked data ◦ 3. Hybrid question answering 171
  • 166. Task:  Given a natural language question or keywords, either retrieve the correct answer(s) from a given RDF repository, or provide a SPARQL query that retrieves these answer(s). ◦ Dataset: DBpedia 3.9 (with multilingual labels) ◦ Questions: 200 training + 50 test ◦ Seven languages:  English, Spanish, German, Italian, French, Dutch, Romanian 172
  • 167. <question id = "36" answertype = "resource" aggregation = "false" onlydbo = "true" > Through which countries does the Yenisei river flow? Durch welche Länder fließt der Yenisei? ¿Por qué países fluye el río Yenisei? ... PREFIX res: <http://dbpedia.org/resource/> PREFIX dbo: <http://dbpedia.org/ontology/> SELECT DISTINCT ?uri WHERE { res:Yenisei_River dbo:country ?uri . } 173
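As a hedged sketch, the gold-standard query above can be run against the public DBpedia endpoint with SPARQLWrapper; the campaign itself used a fixed DBpedia 3.9 dump, so results from the live endpoint may differ.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX res: <http://dbpedia.org/resource/>
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT DISTINCT ?uri WHERE { res:Yenisei_River dbo:country ?uri . }
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for binding in results["results"]["bindings"]:
        print(binding["uri"]["value"])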
  • 168.  Datasets: SIDER, Diseasome, Drugbank  Questions: 25 training + 25 test  require integration of information from different datasets  Example: What are the side effects of drugs used for Tuberculosis? SELECT DISTINCT ?x WHERE { disease:1154 diseasome:possibleDrug ?v2 . ?v2 a drugbank:drugs . ?v3 owl:sameAs ?v2 . ?v3 sider:sideEffect ?x . } 174
  • 169.  Dataset: DBpedia 3.9 (with English abstracts)  Questions: 25 training + 10 test  require both structured data and free text from the abstract to be answered  Example: Give me the currencies of all G8 countries. SELECT DISTINCT ?uri WHERE { ?x text:"member of" text:"G8" . ?x dbo:currency ?uri . } 175
  • 170.  Focuses on the combination of textual and structured data.  Datasets: ◦ English Wikipedia (MediaWiki XML Format) ◦ DBpedia 3.8 & YAGO2 (RDF) ◦ Links among the Wikipedia, DBpedia 3.8, and YAGO2 URIs.  Tasks: ◦ Ad-hoc Task: return a ranked list of results in response to a search topic that is formulated as a keyword query (144 search topics). ◦ Jeopardy Task: Investigate retrieval techniques over a set of natural-language Jeopardy clues (105 search topics – 74 (2012) + 31 (2013)). https://inex.mmci.uni-saarland.de/tracks/lod/ 181
  • 171. 182
  • 172.  Focuses on entity search over Linked Datasets.  Datasets: ◦ Sample of Linked Data crawled from publicly available sources (based on the Billion Triple Challenge 2009).  Tasks: ◦ Entity Search: Queries that refer to one particular entity; a small sample of the Yahoo! Search query log. ◦ List Search: The goal of this track is to select objects that match particular criteria. These queries have been hand-written by the organizing committee. http://semsearch.yahoo.com/datasets.php# 183
  • 173.  List Search queries: ◦ republics of the former Yugoslavia ◦ ten ancient Greek city kingdoms of Cyprus ◦ the four of the companions of the prophet ◦ Japanese-born players who have played in MLB where the British monarch is also head of state ◦ nations where Portuguese is an official language ◦ bishops who sat in the House of Lords ◦ Apollo astronauts who walked on the Moon 184
  • 174.  Entity Search queries: ◦ 1978 cj5 jeep ◦ employment agencies w. 14th street ◦ nyc zip code ◦ waterville Maine ◦ LOS ANGELES CALIFORNIA ◦ ibm ◦ KARL BENZ ◦ MIT 185
  • 175. Balog & Neumayer, A Test Collection for Entity Search in DBpedia (2013). 186
  • 176.  Datasets: ◦ PubMed documents  Tasks: ◦ 1a: Large-Scale Online Biomedical Semantic Indexing  Automatic annotation of PubMed documents.  Training data is provided. ◦ 1b: Introductory Biomedical Semantic QA  300 questions and related material (concepts, triples and golden answers). 187
  • 177.  Metrics, Statistics, Tests - Tetsuya Sakai (IR) ◦ http://www.promise-noe.eu/documents/10156/26e7f254- 1feb-4169-9204-1c53cc1fd2d7  Building test Collections (IR Evaluation - Ian Soboroff) ◦ http://www.promise-noe.eu/documents/10156/951b6dfb- a404-46ce-b3bd-4bbe6b290bfd 188
  • 179.  DBpedia ◦ http://dbpedia.org/  YAGO ◦ http://www.mpi-inf.mpg.de/yago-naga/yago/  Freebase ◦ http://www.freebase.com/  Wikipedia dumps ◦ http://dumps.wikimedia.org/  ConceptNet ◦ http://conceptnet5.media.mit.edu/  Common Crawl ◦ http://commoncrawl.org/  Where to use: ◦ As a commonsense KB or as a data source 190
  • 180.  High domain coverage: ◦ ~95% of Jeopardy! Answers. ◦ ~98% of TREC answers.  Wikipedia is entity-centric.  Curated link structure.  Complementary tools: ◦ Wikipedia Miner  Where to use: ◦ Construction of distributional semantic models. ◦ As a commonsense KB 191
  • 181.  WordNet ◦ http://wordnet.princeton.edu/  Wiktionary ◦ http://www.wiktionary.org/ ◦ API: https://www.mediawiki.org/wiki/API:Main_page  FrameNet ◦ https://framenet.icsi.berkeley.edu/fndrupal/  VerbNet ◦ http://verbs.colorado.edu/~mpalmer/projects/verbnet.html  English lexicon for DBpedia 3.8 (in the lemon format) ◦ http://lemon-model.net/lexica/dbpedia_en/  PATTY (collection of semantically-typed relational patterns) ◦ http://www.mpi-inf.mpg.de/yago-naga/patty/  BabelNet ◦ http://babelnet.org/  Where to use: ◦ Query expansion ◦ Semantic similarity ◦ Semantic relatedness ◦ Word sense disambiguation 192
  • 182.  Lucene & Solr ◦ http://lucene.apache.org/  Terrier ◦ http://terrier.org/  Where to use: ◦ Answer Retrieval ◦ Scoring ◦ Query-Data matching 193
  • 183.  GATE (General Architecture for Text Engineering) ◦ http://gate.ac.uk/  NLTK (Natural Language Toolkit) ◦ http://nltk.org/  Stanford NLP ◦ http://www-nlp.stanford.edu/software/index.shtml  LingPipe ◦ http://alias-i.com/lingpipe/index.html  Where to use: ◦ Question Analysis 194
  • 184.  MALT ◦ http://www.maltparser.org/ ◦ Languages (pre-trained): English, French, Swedish  Stanford parser ◦ http://nlp.stanford.edu/software/lex-parser.shtml ◦ Languages: English, German, Chinese, and others  CHAOS ◦ http://art.uniroma2.it/external/chaosproject/ ◦ Languages: English, Italian  C&C Parser ◦ http://svn.ask.it.usyd.edu.au/trac/candc  Where to Use: ◦ Question Analysis 195
  • 185.  NERD (Named Entity Recognition and Disambiguation) ◦ http://nerd.eurecom.fr/  Stanford Named Entity Recognizer ◦ http://nlp.stanford.edu/software/CRF-NER.shtml  FOX (Federated Knowledge Extraction Framework) ◦ http://fox.aksw.org  DBpedia Spotlight ◦ http://spotlight.dbpedia.org  Where to use: ◦ Question Analysis ◦ Query-Data Matching 196
  • 186.  Wikipedia Miner ◦ http://wikipedia-miner.cms.waikato.ac.nz/  WS4J (Java API for several semantic relatedness algorithms) ◦ https://code.google.com/p/ws4j/  SecondString (string matching) ◦ http://secondstring.sourceforge.net  EasyESA (distributional semantic relatedness) ◦ http://treo.deri.ie/easyESA  S-Space (distributional semantics framework) ◦ https://github.com/fozziethebeat/S-Space  Where to use: ◦ Query-Data matching ◦ Semantic relatedness & similarity ◦ Word Sense Disambiguation 197
  • 187.  DIRT ◦ Paraphrase Collection: http://aclweb.org/aclwiki/index.php?title=DIRT_Paraphrase_Collection ◦ Demo: http://demo.patrickpantel.com/demos/lexsem/paraphrase.htm  PPDB (The Paraphrase Database) ◦ http://www.cis.upenn.edu/~ccb/ppdb/  Where to use: ◦ Query-Data matching 198
  • 188.  Apache UIMA ◦ http://uima.apache.org/  Open Advancement of Question Answering Systems (OAQA) ◦ http://oaqa.github.io/  OKBQA ◦ http://www.okbqa.org/documentation ◦ https://github.com/okbqa  Where to use: ◦ Components integration 199
  • 190.  Querying distributed linked data  Integration of structured and unstructured data  User interaction and context mechanisms  Integration of reasoning (deductive, inductive, counterfactual, abductive ...) on QA approaches and test collections  Measuring confidence and answer uncertainty  Multilinguality  Machine Learning  Reproducibility in QA research 201
  • 191.  Big/Linked Data demand new principled semantic approaches to cope with the scale and heterogeneity of data.  Part of the Semantic Web/AI vision can be addressed today with a multi-disciplinary perspective: ◦ Linked Data, IR and NLP  The multidisciplinarity of the QA problem can serve to show what semantic computing has achieved and what can be transported to other types of information systems.  Challenges are moving from the construction of basic QA systems to more sophisticated semantic functionalities.  Very active research area. 202
  • 192.  Unger, C., Freitas, A., Cimiano, P., Introduction to Question Answering for Linked Data, Reasoning Web Summer School, 2014. 203
  • 193. [1] Eifrem, A NOSQL Overview And The Benefits Of Graph Database, 2009. [2] Idehen, Understanding Linked Data via EAV Model, 2010. [3] Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users?, 2007. [4] Chin-Yew Lin, Question Answering. [5] Farah Benamara, Question Answering Systems: State of the Art and Future Directions. [6] Yahya et al., Robust Question Answering over the Web of Linked Data, CIKM, 2013. [7] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012. [8] Freitas et al., Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach, 2014. [9] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012. [10] Freitas & Curry, Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI, 2014. [11] Cimiano et al., Towards portable natural language interfaces to knowledge bases, 2008. [12] Lopez et al., PowerAqua: fishing the semantic web, 2006. [13] Damljanovic et al., Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction, 2010. [14] Unger et al., Template-based Question Answering over RDF Data, 2012. [15] Cabrio et al., QAKiS: an Open Domain QA System based on Relational Patterns, 2012. [16] Kaufmann & Bernstein, How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users?, 2007. [17] Popescu et al., Towards a theory of natural language interfaces to databases, 2003. [18] Farrell, IBM Watson: A Brief Overview and Thoughts for Healthcare Education and Performance Improvement 204
