A Modified Information Retrieval Approach to Produce Candidates for Question Answering
1. A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering
Johannes Leveling
Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
johannes.leveling@fernuni-hagen.de
LWA 2007 Workshop, Halle (Saale), Germany
2. Outline
1 IRSAW
2 QA phases
3 MIRA
  Embedding of MIRA
  Expected answer types
  TüBa-D/Z annotation
4 MAVE
5 Evaluation
6 Summary and Future Work
3. IRSAW question answering framework
[Figure: IRSAW framework. Boxes shown: natural language question, question processing, documents, document preprocessing, local database, answer candidate producers (InSicht, QAP, MIRA), answer validation and selection (MAVE), answer.]
IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web, funded by the DFG (Deutsche Forschungsgemeinschaft)
4. Question answering phases
1 Process document collection
2 Preprocess question (⇐ Natural language question)
3 Retrieve text segments
4 Match document and question representations
5 Return answer candidates
6 Merge and validate answer candidates (⇒ Answer)
5. Embedding of MIRA in IRSAW
• Employ different modules to produce data streams containing answer candidates:
  • InSicht (matching semantic network representations, Hartrumpf and Leveling (2007)),
  • QAP (Question Answering by Pattern matching, Leveling (2006)), and
  • MIRA (Modified Information Retrieval Approach)
• Use different methods to produce answer streams to increase recall and robustness
• Merge, rank, and logically validate answer candidates and select the best answer (MAVE, Glöckner et al. (2007))
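The merging of several answer streams can be illustrated with a minimal sketch. It is not MAVE: logical validation is replaced here by a simple count of how many producers support a candidate, and the function name merge_streams and the example candidates are hypothetical.

```python
from collections import defaultdict


def merge_streams(streams):
    """Merge answer candidates from several producers.

    streams maps a producer name to a list of answer strings.
    Returns (answer, supporting producers) pairs, most widely supported first.
    Assumption: producer agreement stands in for real validation.
    """
    support = defaultdict(set)   # normalised answer -> producers that returned it
    surface = {}                 # normalised answer -> first surface form seen
    for producer, answers in streams.items():
        for answer in answers:
            key = " ".join(answer.lower().split())  # normalise case and whitespace
            support[key].add(producer)
            surface.setdefault(key, answer)
    ranked = sorted(support, key=lambda k: (-len(support[k]), k))
    return [(surface[k], sorted(support[k])) for k in ranked]


# Hypothetical candidates for one question, one list per producer.
streams = {
    "InSicht": ["David Ben Gurion"],
    "QAP": ["David ben Gurion", "Chaim Weizmann"],
    "MIRA": ["david ben gurion", "Tel Aviv", "Chaim Weizmann"],
}
merged = merge_streams(streams)
for answer, producers in merged:
    print(answer, producers)
```

A recall-oriented stream such as MIRA contributes many candidates that no other producer returns, so any real selection step needs stronger evidence than agreement alone.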
6. MIRA
• Shallow question answering
• Expected answer type (EAT) of question determined by Bayesian classifier: PERSON, SUBSTANCE, ...
• Manually annotated corpus with EAT tags (e.g. PERSON) and subclasses (e.g. person-first, person-last)
• TüBa-D/Z newspaper corpus (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml), approximately 470,000 words
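How a Bayesian classifier might assign an EAT to a question can be sketched as follows. The slides do not say which features or which Bayesian model MIRA uses, so this assumes a multinomial naive Bayes over question words with add-one smoothing; the training questions and the function names train and classify are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy training questions with EAT labels; invented for illustration only.
TRAIN = [
    ("who became the first prime minister", "PERSON"),
    ("who wrote the novel", "PERSON"),
    ("in what year did the occupation end", "TIME"),
    ("when did the war end", "TIME"),
    ("where is the river", "LOCATION"),
    ("in which city is the museum", "LOCATION"),
]


def train(examples):
    """Count class and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def classify(question, model):
    """Return the EAT with the highest log posterior for the question."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in question.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(classify("Who became president", model))
print(classify("When did it end", model))
```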
7. Expected answer types (1/3)
• Question (German): Wer wurde 1948 erster Ministerpräsident Israels?
• Question (English): Who became the first Prime Minister of Israel in 1948?
• EAT: PERSON
• Answer string: David ben Gurion
• Tag sequence: person-first person-part person-last
8. Expected answer types (2/3)
• Question (German): In welchem Jahr endete offiziell die Besetzung Deutschlands?
• Question (English): In what year did the occupation of Germany officially end?
• EAT: TIME
• Answer string: im Jahr 1955 (English: in the year 1955)
• Tag sequence: prep year num-card
9. Expected answer types (3/3)
• Question (German): Wie wird der Ebolavirus übertragen?
• Question (English): How is the Ebola virus transmitted?
• EAT: OTHER
• Answer string: (Übertragen werden die Ebolaviren durch direkten Körperkontakt und bei Kontakt mit Körperausscheidungen infizierter Personen per Kontaktinfektion bzw. Schmierinfektion.)
  English: (The Ebola viruses are transmitted through direct body contact and through contact with bodily excretions of infected persons, by contact infection or smear infection.)
• Tag sequence: – (other entity type → answer not found!)
10. EAT frequency in annotated TüBa-D/Z
Name class      Corpus frequency
LOCATION                   8,274
PERSON                    14,527
ORGANIZATION               7,148
TIME                      14,524
MEASURE                      895
SUBSTANCE                    293
OTHER                      2,987
11. EAT subclass frequency in annotated TüBa-D/Z
LOCATION subclass    Frequency
city                     3,717
country                  1,955
region                     926
street                     613
state                      370
other                      206
building                   195
streetno                   124
river                       85
island                      55
sea                         17
mountain                    11
12. Tagging with subclasses
Example sentence (English): 25 years ago, Neil Armstrong was the first human to set foot on the moon, but today manned spaceflight is stagnating.
Token        EAT         Subclass
Vor          TIME        prep
25           TIME        num-card
Jahren       TIME        year
betrat       –
Neil         PERSON      person-first
Armstrong    PERSON      person-last
als          –
erster       –
Mensch       –
den          –
Mond         LOCATION    other
,            –
doch         –
heute        TIME        deictic
stagniert    –
die          –
bemannte     –
Raumfahrt    –
.            –
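The tagged sentence above suggests one way answer candidates could be read off the annotation: collect maximal runs of adjacent tokens that carry the question's expected answer type. The sketch below assumes exactly that; the slides do not spell out MIRA's extraction step, and the name candidate_spans is hypothetical.

```python
from itertools import groupby

# (token, EAT, subclass) triples copied from the tagging example;
# None marks tokens without a tag.
TAGGED = [
    ("Vor", "TIME", "prep"), ("25", "TIME", "num-card"), ("Jahren", "TIME", "year"),
    ("betrat", None, None), ("Neil", "PERSON", "person-first"),
    ("Armstrong", "PERSON", "person-last"), ("als", None, None),
    ("erster", None, None), ("Mensch", None, None), ("den", None, None),
    ("Mond", "LOCATION", "other"), (",", None, None), ("doch", None, None),
    ("heute", "TIME", "deictic"), ("stagniert", None, None), ("die", None, None),
    ("bemannte", None, None), ("Raumfahrt", None, None), (".", None, None),
]


def candidate_spans(tagged, expected_type):
    """Return (answer string, subclass sequence) for each maximal run of
    adjacent tokens tagged with expected_type."""
    spans = []
    for eat, run in groupby(tagged, key=lambda item: item[1]):
        if eat == expected_type:
            run = list(run)
            answer = " ".join(token for token, _, _ in run)
            subclasses = [subclass for _, _, subclass in run]
            spans.append((answer, subclasses))
    return spans


print(candidate_spans(TAGGED, "PERSON"))
print(candidate_spans(TAGGED, "TIME"))
```

For a question with EAT PERSON this yields the single candidate "Neil Armstrong"; for TIME it yields both "Vor 25 Jahren" and the deictic expression "heute".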
13. MAVE - MultiNet-based Answer Verification
• Validate answer candidates
• Test logical validity of answer candidate by using
  a) inferences, entailments
  b) heuristic quality indicators (fallback strategy)
• Select most trusted answer
14. Evaluation results (1/3)
Performance results for InSicht, QAP, and MIRA based on questions from QA@CLEF data from 2004 to 2006
System     # Candidates    Coverage    # Correct    Precision
InSicht           1,212     226/600          625        51.6%
QAP               2,562     114/600        1,190        46.6%
MIRA             14,946     520/600        1,738        11.6%
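The relation between the columns can be checked with a few lines of arithmetic. This sketch assumes that precision is the number of correct candidates divided by the number of candidates, and that coverage is a fraction of the 600 questions; the slide does not define either measure explicitly. Under that reading the InSicht and MIRA precision figures are reproduced, while QAP comes out at roughly 46.4% rather than the listed 46.6%, so the slide may rest on slightly different counts.

```python
# Rows copied from the table:
# (system, candidates, covered questions, correct candidates).
RESULTS = [
    ("InSicht", 1212, 226, 625),
    ("QAP", 2562, 114, 1190),
    ("MIRA", 14946, 520, 1738),
]
TOTAL_QUESTIONS = 600


def summarize(rows, total_questions):
    """Compute coverage and precision per system (assumed definitions)."""
    out = {}
    for system, candidates, covered, correct in rows:
        out[system] = {
            "coverage": covered / total_questions,
            "precision": correct / candidates,
        }
    return out


summary = summarize(RESULTS, TOTAL_QUESTIONS)
for system, measures in summary.items():
    print(f"{system:8s} coverage={measures['coverage']:.1%} "
          f"precision={measures['precision']:.1%}")
```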
15. Evaluation results (2/3)
Performance results including answer selection by MAVE based on questions from QA@CLEF data from 2004 to 2006
Run                        # Correct    # Inexact    # Wrong
InSicht+MIRA+QAP               247.4         15.8      307.8
InSicht+MIRA+QAP (opt.)        305.0         17.0      249.0
16. Evaluation results (3/3)
Results for MIRA answer candidates for QA@CLEF data from 2003 to 2006 (top-N candidates)
                            N=50      N=30     N=10      N=5
# Correct (2006)             798       615      215       95
# Inexact (2006)              56        53       20       12
# Wrong (2006)             4,436     3,421    1,360      722
# Correct (2003–2006)      1,864     1,503      609      263
# Inexact (2003–2006)        287       248      103       54
# Wrong (2003–2006)       17,326    14,102    5,694    3,013
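To compare the cutoffs, the counts can be turned into the share of correct candidates among all judged candidates at each N. This is a small sketch under the assumption that correct, inexact and wrong together make up all candidates returned at a given cutoff; the name correct_share is hypothetical.

```python
# Counts copied from the table, keyed by cutoff N: (correct, inexact, wrong).
TOP_N_2006 = {50: (798, 56, 4436), 30: (615, 53, 3421),
              10: (215, 20, 1360), 5: (95, 12, 722)}
TOP_N_ALL = {50: (1864, 287, 17326), 30: (1503, 248, 14102),
             10: (609, 103, 5694), 5: (263, 54, 3013)}


def correct_share(counts):
    """Fraction of judged candidates that are correct, per cutoff N."""
    return {n: c / (c + i + w) for n, (c, i, w) in counts.items()}


for label, table in (("2006", TOP_N_2006), ("2003-2006", TOP_N_ALL)):
    for n, share in sorted(correct_share(table).items(), reverse=True):
        print(f"{label:10s} N={n:<3d} correct share={share:.1%}")
```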
17. Summary and Future Work
MIRA:
• Produces a highly recall-oriented answer stream,
• Covers more questions than the other answer producers in IRSAW, and
• Returns the largest number of correct answer candidates.
Future work:
• Return additional answer support for temporal deictic expressions
• Support processing list questions
18. Selected References
Glöckner, Ingo; Sven Hartrumpf; and Johannes Leveling (2007). Logical validation, answer merging and witness selection – a case study in multi-stream question answering. In Proceedings of RIAO 2007, Large-Scale Semantic Access to Content (Text, Image, Video and Sound). Pittsburgh, USA: C.I.D.
Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Carol Peters et al.), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.
Leveling, Johannes (2006). On the role of information retrieval in the question answering system IRSAW. In Proceedings of the LWA 2006, Workshop Information Retrieval, pp. 119–125. Hildesheim, Germany: Universität Hildesheim.