This document discusses automated focus extraction for question answering over topic maps. It presents the architecture of a focus extractor that uses machine learning techniques to identify the asking point and expected answer type of questions. Based on a corpus of 2,100 annotated questions, the focus extractor achieved 82.7% accuracy. The approach allows for domain-independent question answering over topic maps.
Automated Focus Extraction for QA over Topic Maps
1. Automated Focus Extraction for Question Answering over Topic Maps
Rani Pinchuk, Alexander Mikhailian and Tiphaine Dalmas
Automated Focus Extraction for Question Answering over Topic Maps TMRA’09, Leipzig
2. Context: domain-portable Question Answering over Topic Maps
• Partly funded by the Flemish government as part of the ITEA2 project LINDO (ITEA2-06011).
• The research towards domain-portable question answering over Topic Maps is done within the Belgian part of the LINDO project.
3. Why Topic Maps?
• The space industry needs a solution to the knowledge retention problem.
• More structured than mind maps, less formal than RDF/OWL.
• Allow information to be organized in an ontological view.
• An ISO standard.
4. Why Topic Maps?
Who is the composer of La Bohème?
Puccini
5. LINDO-BE General Architecture
[Figure: architecture diagram with the components Question, Focus Extractor, Anchorer, Time Exp. Extractor, Graph Reducer, Answer Extractor, Answer and Topic Map Engine.]
6. LINDO-BE General Architecture
[Figure: architecture diagram with the components Question, Focus Extractor, Anchorer, Time Exp. Extractor, Graph Reducer, Answer Extractor, Answer and Topic Map Engine.]
7. Question Focus
The focus is the type of the answer, expressed in the terminology of the question.
Who is the composer of La Bohème?
Puccini
8. Focus
Asking Point (AP), explicit: “Who is the librettist of La Tilda?”
Expected Answer Type (EAT), implicit: HUMAN: “Who wrote the libretto for La Tilda?”
EAT Classes: TIME, NUMERIC, DEFINITION, LOCATION, HUMAN, OTHER
9. Is it difficult to find the focus?
• Where was Puccini born?
• What is Puccini's place of birth?
• What is Puccini's birthplace?
• What is the birth place of Puccini?
• What city was Puccini born in?
• What place was Puccini born in?
• Where is Puccini from?
[Figure: topic map fragment. Lucca is a City; Puccini (person) is connected to Lucca (place) by a "born in" association.]
10. Why should the AP take precedence over the EAT?
“Who is the librettist of La Tilda?”
EAT = HUMAN Person
AP = Librettist
11. Precision and Recall
$$P = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$
$$R = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$$
12. Why should the AP take precedence over the EAT?
“Who is the librettist of La Tilda?”
EAT = HUMAN Person
AP = Librettist
P_AP = 57/57 = 1
P_EAT = 57/1165 ≈ 0.049
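The two precision values above can be reproduced with a small set-based sketch of the precision and recall definitions. The reading of the numbers is an assumption: 57 is taken to be the number of librettist topics and 1165 the number of person topics in the topic map. The topic identifiers are hypothetical.

```python
def precision(relevant: set, retrieved: set) -> float:
    """|relevant ∩ retrieved| / |retrieved|"""
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set, retrieved: set) -> float:
    """|relevant ∩ retrieved| / |relevant|"""
    return len(relevant & retrieved) / len(relevant)


# Hypothetical identifiers (assumption): 1165 person topics,
# of which the first 57 are librettists.
persons = {f"person-{i}" for i in range(1165)}
librettists = {f"person-{i}" for i in range(57)}

# For "Who is the librettist of La Tilda?" the relevant topics are the librettists.
relevant = librettists

p_ap = precision(relevant, retrieved=librettists)   # focus = AP (Librettist)
p_eat = precision(relevant, retrieved=persons)      # focus = EAT (HUMAN)

print(f"P_AP  = {p_ap:.3f}")
print(f"P_EAT = {p_eat:.3f}")
```

Under this reading both focus types retrieve every relevant topic, so the difference shows up only in precision.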
13. Why should the AP take precedence over the EAT?
Results over 100 annotated questions:
Name   Precision   Recall
AP     0.311       0.30
EAT    0.089       0.21
14. Focus Branching
15. Focus Extractor Architecture
• Supervised machine learning based on the principle of maximum entropy (Maxent).
• 2,100 questions have been annotated:
  • 1,500 from the Li & Roth corpus
  • 500 from TREC-10
  • 100 asked over the Italian Opera topic map
• The corpus was split into 80% training and 20% test data. The evaluation was done 10 times, each time shuffling the training and test data.
Pipeline: Question → Tokenizer → POS Tagger → Syntactic Parser → Lexical Analysis → Focus Extractor → Focus
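The evaluation protocol described on this slide (shuffle, split 80/20, repeat 10 times) could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the `train_first_word_baseline` learner is a deliberately trivial stand-in for the Maxent classifier, and the toy corpus, function names and seed are all assumptions.

```python
import random
from collections import Counter, defaultdict


def split_80_20(items, rng):
    """Shuffle a copy of the corpus and cut it into 80% training / 20% test."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def train_first_word_baseline(train):
    """Stand-in learner (NOT Maxent): majority label per first question word."""
    by_word = defaultdict(Counter)
    overall = Counter()
    for question, label in train:
        by_word[question.split()[0].lower()][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]

    def predict(question):
        counts = by_word.get(question.split()[0].lower())
        return counts.most_common(1)[0][0] if counts else default

    return predict


def evaluate(corpus, train_fn, runs=10, seed=0):
    """Return one accuracy value per run, reshuffling the data each time."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(runs):
        train, test = split_80_20(corpus, rng)
        predict = train_fn(train)
        correct = sum(predict(q) == label for q, label in test)
        accuracies.append(correct / len(test))
    return accuracies


# Toy corpus built from example questions in this deck (repeated for size).
toy_corpus = [
    ("Who is Puccini?", "HUMAN"),
    ("What is Tosca?", "DEFINITION"),
    ("Where did Dante die?", "LOCATION"),
    ("When did Puccini die?", "TIME"),
    ("How many characters have been killed by poisoning?", "NUMERIC"),
] * 20

accuracies = evaluate(toy_corpus, train_first_word_baseline)
print(accuracies)
```

With the 2,100-question corpus, the same split would yield 1,680 training and 420 test questions per run.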
16. Questions Annotation
Asking Point (AP classifier):
O: What
AP: opera
O: did
O: Puccini
O: write
O: ?
Expected Answer Type (EAT classifier):
HUMAN: Who is Puccini?
DEFINITION: What is Tosca?
LOCATION: Where did Dante die?
TIME: When did Puccini die?
NUMERIC: How many characters have been killed by poisoning?
OTHER: What did Heinrich Heine write?
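One possible in-memory representation of these annotations is sketched below: the AP annotation is a label per token (AP or O), while the EAT annotation is a single label per question. The class and function names are hypothetical, and the EAT value given to the example question is an assumption, since the slide does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AnnotatedQuestion:
    tokens: List[str]
    ap_labels: List[str]  # "AP" or "O", one label per token
    eat: str              # one EAT class for the whole question


def asking_point(question: AnnotatedQuestion) -> Optional[str]:
    """Return the tokens labelled AP as one string, or None if there are none."""
    words = [tok for tok, lab in zip(question.tokens, question.ap_labels)
             if lab == "AP"]
    return " ".join(words) if words else None


example = AnnotatedQuestion(
    tokens=["What", "opera", "did", "Puccini", "write", "?"],
    ap_labels=["O", "AP", "O", "O", "O", "O"],
    eat="OTHER",  # assumption: the slide gives no EAT for this question
)

print(asking_point(example))
```

A question such as "Where did Dante die?" would carry only O labels, so `asking_point` returns `None` and only the EAT is available.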
17. AP Results
Class         Precision   Recall   F-Score
AskingPoint   0.854       0.734    0.789
Other         0.973       0.987    0.980
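The F-Score column is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). The slides do not state the formula, so this is an assumption; the short check below recomputes the two rows of the AP table.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the AP results table.
ap_results = {
    "AskingPoint": (0.854, 0.734),
    "Other": (0.973, 0.987),
}

for name, (p, r) in ap_results.items():
    print(f"{name}: F = {f_score(p, r):.3f}")
```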
18. EAT Results
Class        Precision   Recall   F-Score
DEFINITION   0.887       0.800    0.841
LOCATION     0.834       0.812    0.821
HUMAN        0.904       0.753    0.820
TIME         0.880       0.802    0.838
NUMERIC      0.943       0.782    0.854
OTHER        0.746       0.893    0.812
19. Overall Results
The overall results are provided as the accuracy of the classifier.
Accuracy = correct instances / overall instances
                 Value   Std dev   Std err
Focus (AP+EAT)   0.827   0.020     0.006
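The reported standard error is consistent with dividing the standard deviation by the square root of the number of evaluation runs (10). The sketch below shows that arithmetic and a small helper for summarizing per-run accuracies. Whether the authors used the sample or the population standard deviation is not stated; the sample version is assumed here, and the helper name is hypothetical.

```python
import math
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation, standard error) of the runs."""
    mean = statistics.mean(accuracies)
    std_dev = statistics.stdev(accuracies)
    std_err = std_dev / math.sqrt(len(accuracies))
    return mean, std_dev, std_err


# Check against the reported figures: std dev 0.020 over 10 runs.
reported_std_err = 0.020 / math.sqrt(10)
print(f"{reported_std_err:.3f}")
```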
20. Prediction of Accuracy
21. Conclusions
• We achieved 82.7% accuracy for focus extraction.
• The specificity of the focus degrades gracefully (we first try to extract the AP, and fall back to the EAT).
• The focus is identified dynamically instead of relying on a static taxonomy of question types.
• Machine learning techniques were used throughout the application stack.
• The results could be improved with more training data.
• The whole setting is domain independent.
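The graceful degradation mentioned in the conclusions (try the AP first, fall back to the EAT) can be illustrated with a few lines. This is a sketch only; the function name and the `(kind, value)` return shape are assumptions made for the example.

```python
from typing import Optional, Tuple


def choose_focus(asking_point: Optional[str], eat: str) -> Tuple[str, str]:
    """Prefer the more specific asking point; otherwise fall back to the EAT."""
    if asking_point:
        return ("AP", asking_point)
    return ("EAT", eat)


# "Who is the librettist of La Tilda?" has an explicit asking point.
print(choose_focus("librettist", "HUMAN"))
# "Who wrote the libretto for La Tilda?" has none, so the EAT is used.
print(choose_focus(None, "HUMAN"))
```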
22. Questions?
Thank you