Automated Focus Extraction for QA over Topic Maps

Automated Focus Extraction for
Question Answering over Topic Maps

Rani Pinchuk, Alexander Mikhailian and Tiphaine Dalmas

Automated Focus Extraction for Question Answering over Topic Maps TMRA’09, Leipzig


Context: domain-portable Question Answering over Topic Maps
• Partly funded by the Flemish government as part of the ITEA2 project LINDO (ITEA2-06011)
• Research on domain-portable question answering over Topic Maps is carried out within the Belgian part of the LINDO project.



Why Topic Maps?
• Space industry needs a solution to the knowledge
retention problem.
• More structured than mind maps, less formal than
RDF/OWL.
• Allows information to be organized in an ontological view.
• An ISO standard.



Why Topic Maps?

Who is the composer of La Bohème?

Puccini



LINDO-BE General Architecture

[Figure: LINDO-BE pipeline with the components Question, Focus Extractor, Time Exp. Extractor, Anchorer, Graph Reducer, Answer Extractor, Topic Map Engine and Answer]





Question Focus
The focus is the type of the answer, expressed in the terminology of the question.

Who is the composer of La Bohème?

Puccini



Focus

Asking Point (AP), explicit: "Who is the librettist of La Tilda?"
Expected Answer Type (EAT), implicit: HUMAN for "Who wrote the libretto for La Tilda?"

EAT classes: TIME, NUMERIC, DEFINITION, LOCATION, HUMAN, OTHER


Is it difficult to find the focus?
• Where was Puccini born?
• What is Puccini's place of birth?
• What is Puccini's birthplace?
• What is the birth place of Puccini?
• What city was Puccini born in?
• What place was Puccini born in?
• Where is Puccini from?

[Figure: Puccini (person) and Lucca (place) linked by a "born in" association; Lucca is a City]



Why should the AP take precedence over the EAT?
"Who is the librettist of La Tilda?"

EAT = HUMAN → Person
AP = Librettist
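The precedence rule (try the AP first, fall back to the EAT) can be sketched in Python. This is a minimal illustration under stated assumptions, not the LINDO-BE implementation: the `Focus` class, the `focus_type` function and the EAT-to-topic-type table are hypothetical. Only the HUMAN → Person mapping comes from the slide, and capitalizing the AP is a naive stand-in for anchoring it to a topic type.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical EAT -> topic type table. Only HUMAN -> Person is taken
# from the slide; LOCATION -> Place is purely illustrative.
EAT_TO_TOPIC_TYPE = {
    "HUMAN": "Person",
    "LOCATION": "Place",
}


@dataclass
class Focus:
    asking_point: Optional[str]  # e.g. "librettist"; None if the question has no explicit AP
    eat: str                     # TIME, NUMERIC, DEFINITION, LOCATION, HUMAN or OTHER


def focus_type(focus: Focus) -> Optional[str]:
    """Return the most specific answer type available: the AP first, the EAT as fallback."""
    if focus.asking_point:
        # Naive stand-in for anchoring the AP to a topic type in the topic map.
        return focus.asking_point.capitalize()
    # Fall back to the coarser EAT; None when no topic type is known for it.
    return EAT_TO_TOPIC_TYPE.get(focus.eat)


print(focus_type(Focus("librettist", "HUMAN")))
print(focus_type(Focus(None, "HUMAN")))
```

For the La Tilda question the AP yields `Librettist`, while "Who wrote the libretto for La Tilda?", which has no explicit AP, falls back to the broader `Person`.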



Precision and Recall

$$P = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$

$$R = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$$



“Who is the librettist of La Tilda?”

EAT = HUMAN → Person
AP = Librettist

$$P_{AP} = \frac{57}{57} = 1 \qquad P_{EAT} = \frac{57}{1165} \approx 0.049$$
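Applied to the La Tilda example, the precision and recall definitions can be checked with a small Python sketch. It assumes, as the figures suggest, that the 57 librettist topics are a subset of the 1165 Person topics: typing the answer as Librettist retrieves only relevant topics, while typing it as Person retrieves every person in the topic map. The integer topic IDs are placeholders.

```python
def precision(relevant: set, retrieved: set) -> float:
    """P = |relevant ∩ retrieved| / |retrieved| (0.0 when nothing is retrieved)."""
    return len(relevant & retrieved) / len(retrieved) if retrieved else 0.0


def recall(relevant: set, retrieved: set) -> float:
    """R = |relevant ∩ retrieved| / |relevant| (0.0 when nothing is relevant)."""
    return len(relevant & retrieved) / len(relevant) if relevant else 0.0


# Placeholder topic IDs: 0..56 are librettists, 0..1164 are all persons.
librettists = set(range(57))
persons = set(range(1165))

p_ap = precision(librettists, retrieved=librettists)  # 57/57
p_eat = precision(librettists, retrieved=persons)     # 57/1165
print(f"P_AP = {p_ap:.3f}, P_EAT = {p_eat:.3f}")
```

Under this assumption both focus types reach a recall of 1.0; the difference lies entirely in precision, which is why the AP is preferred whenever it can be found.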



Results over 100 annotated questions:

Name   Precision   Recall
AP     0.311       0.30
EAT    0.089       0.21



Focus Branching



Focus Extractor Architecture
• Supervised machine learning based on the principle of maximum entropy (Maxent).
• 2100 questions have been annotated:
  • 1500 from the Li & Roth corpus
  • 500 from TREC-10
  • 100 asked over the Italian Opera topic map
• The corpus was split into 80% training and 20% test data. The evaluation was repeated 10 times, reshuffling the training and test data each time.
Question → Tokenizer → POS Tagger → Syntactic Parser → Lexical Analysis → Focus Extractor → Focus
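The evaluation protocol (10 runs, each reshuffling the 2100 questions into an 80% training and 20% test split) can be sketched as follows. The function name and the fixed seed are illustrative assumptions; the slides do not say whether the splits were stratified by class, and classifier training is left out.

```python
import random


def shuffled_splits(n_items: int, n_runs: int = 10,
                    train_fraction: float = 0.8, seed: int = 0):
    """Return n_runs (train_indices, test_indices) pairs, reshuffling before each split."""
    rng = random.Random(seed)      # fixed seed so the runs are reproducible
    indices = list(range(n_items))
    cut = round(n_items * train_fraction)
    splits = []
    for _ in range(n_runs):
        rng.shuffle(indices)       # new random order for every run
        # Slicing copies, so later shuffles do not alter earlier splits.
        splits.append((indices[:cut], indices[cut:]))
    return splits


# 1500 (Li & Roth) + 500 (TREC-10) + 100 (Italian Opera) = 2100 questions
splits = shuffled_splits(2100)
train, test = splits[0]
print(len(splits), len(train), len(test))
```

Each run trains on 1680 questions and tests on the remaining 420; the per-run accuracies are what the overall results summarize.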



Question Annotation

Asking Point (each token labelled, used by the AP classifier):
O: What
AP: opera
O: did
O: Puccini
O: write
O: ?

Expected Answer Type (one class per question, used by the EAT classifier):
HUMAN: Who is Puccini
DEFINITION: What is Tosca?
LOCATION: Where did Dante die?
TIME: When did Puccini die?
NUMERIC: How many characters have been killed by poisoning?
OTHER: What did Heinrich Heine write?
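The two annotation layers correspond to two classification tasks: the AP classifier labels each token as AP or O, while the EAT classifier assigns one class to the whole question. The sketch below turns the token-level annotation into training instances for a token classifier. The function names and the feature set (lower-cased word plus its neighbours) are hypothetical examples, not the features used in LINDO-BE, whose pipeline also provides POS, syntactic and lexical information.

```python
from typing import Dict, List, Optional, Tuple


def ap_instances(tokens: List[str], tags: List[str]) -> List[Tuple[Dict[str, str], str]]:
    """Turn a token-level AP/O annotation into (features, label) pairs."""
    instances = []
    for i, (token, tag) in enumerate(zip(tokens, tags)):
        features = {
            "word": token.lower(),
            # "<s>" and "</s>" mark the question boundaries.
            "prev": tokens[i - 1].lower() if i > 0 else "<s>",
            "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
        }
        instances.append((features, tag))
    return instances


def asking_point(tokens: List[str], tags: List[str]) -> Optional[str]:
    """Join the tokens tagged AP; None when the question has no explicit AP."""
    ap_tokens = [t for t, tag in zip(tokens, tags) if tag == "AP"]
    return " ".join(ap_tokens) if ap_tokens else None


tokens = ["What", "opera", "did", "Puccini", "write", "?"]
tags = ["O", "AP", "O", "O", "O", "O"]
print(asking_point(tokens, tags))
print(ap_instances(tokens, tags)[1])

# The EAT classifier instead sees one (question, class) pair per question.
eat_instance = ("When did Puccini die?", "TIME")
```

Feature dictionaries of this shape can be vectorized (for instance with scikit-learn's `DictVectorizer`) and fed to a multinomial logistic regression, which is the usual formulation of a maximum entropy classifier.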



AP Results

Class Precision Recall F-Score
AskingPoint 0.854 0.734 0.789
Other 0.973 0.987 0.980
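The F-Score column is consistent with the balanced F-measure (F1), the harmonic mean of precision and recall. Assuming F1 is the measure used, a small Python check reproduces both rows of the AP table to three decimals.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


ap_results = {              # class: (precision, recall), copied from the AP table
    "AskingPoint": (0.854, 0.734),
    "Other": (0.973, 0.987),
}
for name, (p, r) in ap_results.items():
    print(f"{name}: F1 = {f1(p, r):.3f}")
```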



EAT Results
Class Precision Recall F-Score
DEFINITION 0.887 0.800 0.841
LOCATION 0.834 0.812 0.821
HUMAN 0.904 0.753 0.820
TIME 0.880 0.802 0.838
NUMERIC 0.943 0.782 0.854
OTHER 0.746 0.893 0.812



Overall Results
The overall results are provided as the accuracy
of the classifier.

Accuracy = correct instances / overall instances

                  Value   Std dev   Std err
Focus (AP+EAT)    0.827   0.020     0.006
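Accuracy and its spread over the evaluation runs can be computed as below. This sketch rests on two assumptions not stated on the slides: Std dev is the sample standard deviation over the 10 runs, and Std err is Std dev divided by the square root of the number of runs. Under the second assumption the reported figures agree, since 0.020 / √10 ≈ 0.006.

```python
import math
import statistics
from typing import Sequence, Tuple


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Correct instances / overall instances."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def summarize(run_accuracies: Sequence[float]) -> Tuple[float, float, float]:
    """Mean, sample standard deviation and standard error over evaluation runs."""
    mean = statistics.mean(run_accuracies)
    std_dev = statistics.stdev(run_accuracies)  # requires at least two runs
    std_err = std_dev / math.sqrt(len(run_accuracies))
    return mean, std_dev, std_err


# Consistency check on the reported figures: std err = 0.020 / sqrt(10)
print(round(0.020 / math.sqrt(10), 3))
```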



Prediction of Accuracy



Conclusions
• We achieved 82.7% accuracy for focus extraction.
• The specificity of the focus degrades gracefully (we first try
to extract the AP, and fall back to the EAT).
• The focus is identified dynamically instead of relying on a static taxonomy of question types.
• Machine learning techniques were used throughout the
application stack.
• The results could be improved with more training data.
• The whole setting is domain independent.



Questions?

Thank you

