Automated Focus Extraction for
    Question Answering over Topic Maps

     Rani Pinchuk, Alexander Mikhailian and Tiphaine Dalmas




Automated Focus Extraction for Question Answering over Topic Maps   TMRA’09, Leipzig




       Context: domain-portable Question
          Answering over Topic Maps
•Partly funded by the Flemish government as part of the ITEA2
 project LINDO (ITEA2-06011).
•The research towards domain-portable question answering over
 Topic Maps is done within the Belgian part of the LINDO project.








                            Why Topic Maps?
      • The space industry needs a solution to the knowledge
        retention problem.
      • More structured than mind maps, less formal than
        RDF/OWL.
      • Allows information to be organized in an ontological view.
      • An ISO standard.








                            Why Topic Maps?

                                                 Who is the composer of La Bohème?

                                                      Puccini








         LINDO-BE General Architecture


[Diagram, left to right: Question; Focus Extractor and Time Exp. Extractor; Anchorer; Graph Reducer; Answer Extractor; Answer. The Topic Map Engine is drawn below the pipeline.]








                            Question Focus
Focus is the type of the answer in the question terminology

                                                 Who is the composer of La Bohème?

                                                      Puccini








                                           Focus

Asking Point (AP), explicit: “Who is the librettist of La Tilda?”
Expected Answer Type (EAT), implicit: HUMAN: “Who wrote the libretto for La Tilda?”

EAT classes: TIME, NUMERIC, DEFINITION, LOCATION, HUMAN, OTHER




           Is it difficult to find the focus?
      •   Where was Puccini born?
      •   What is Puccini's place of birth?
      •   What is Puccini's birthplace?
      •   What is the birth place of Puccini?
      •   What city was Puccini born in?
      •   What place was Puccini born in?
      •   Where is Puccini from?

[Figure: topic map fragment. Puccini (role: person) is linked by a "born in" association to Lucca (role: place); Lucca is a City.]








Why should AP take precedence over EAT?
                                                    “Who is the librettist of La Tilda?”

                                                    EAT         =   HUMAN (Person)
                                                    AP          =   Librettist








                         Precision and Recall

$$P = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$

$$R = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$$
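
The two definitions above can be sketched with Python sets. This is a minimal illustration only: the function names and the toy data are assumptions, not part of the original slides.

```python
def precision(relevant: set, retrieved: set) -> float:
    """Fraction of the retrieved items that are relevant."""
    if not retrieved:
        return 0.0
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant: set, retrieved: set) -> float:
    """Fraction of the relevant items that were retrieved."""
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)


# Hypothetical toy data, chosen only to make the arithmetic visible.
relevant = {"Puccini", "Verdi", "Rossini"}
retrieved = {"Puccini", "Verdi", "Boito", "Illica"}

p = precision(relevant, retrieved)  # 2 of the 4 retrieved items are relevant
r = recall(relevant, retrieved)     # 2 of the 3 relevant items were retrieved
print(p, r)
```

Returning 0.0 for an empty set is a convention chosen here to avoid a division by zero; the slides do not say how that case is handled.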






Why should AP take precedence over EAT?
                                                    “Who is the librettist of La Tilda?”

                                                    EAT         =    HUMAN (Person)
                                                    AP          =    Librettist

                                                    P_AP        =    57/57      =  1
                                                    P_EAT       =    57/1165    ≈  0.049
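
The arithmetic on this slide can be reproduced with a small sketch. It assumes one reading of the fractions: that the topic map holds 1165 persons, 57 of whom are librettists. The identifiers are placeholders, not real topic names.

```python
# Assumed reading of the slide: 1165 persons in the topic map, 57 of them librettists.
N_PERSONS = 1165
N_LIBRETTISTS = 57

persons = {f"person-{i}" for i in range(N_PERSONS)}
librettists = {f"person-{i}" for i in range(N_LIBRETTISTS)}


def precision(relevant: set, retrieved: set) -> float:
    """Fraction of the retrieved topics that have the relevant type."""
    return len(relevant & retrieved) / len(retrieved)


# Using the AP "librettist" retrieves only librettists.
p_ap = precision(relevant=librettists, retrieved=librettists)
# Using the EAT HUMAN retrieves every person.
p_eat = precision(relevant=librettists, retrieved=persons)

print(p_ap, round(p_eat, 3))
```

Under these assumptions the more specific focus retrieves only topics of the right type, while the broader one retrieves mostly irrelevant topics.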







Why should AP take precedence over EAT?
         Results over 100 annotated questions:


                               Name            Precision            Recall

                             AP                         0.311          0.30

                             EAT                        0.089          0.21








                              Focus Branching








            Focus Extractor Architecture
• Supervised machine learning based on the
  principle of maximum entropy (Maxent).
• 2100 questions have been annotated:
   • 1500 from the Li & Roth corpus
   • 500 from TREC-10
   • 100 asked over the Italian Opera topic map
• The corpus was split into 80% training and
  20% testing. The evaluation was done 10 times,
  each time shuffling the training and test data.

Pipeline: Question -> Tokenizer -> POS Tagger -> Syntactic Parser -> Lexical Analysis -> Focus Extractor -> Focus
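
The evaluation protocol described above (an 80/20 split, repeated 10 times with reshuffling) could look roughly like the sketch below. The function name, the fixed seed and the placeholder question strings are assumptions; the slides do not say how the shuffling was implemented.

```python
import random


def shuffled_splits(items, runs=10, train_fraction=0.8, seed=0):
    """Yield (train, test) pairs, reshuffling the data before each split."""
    rng = random.Random(seed)  # fixed seed is an assumption, used for repeatability
    cut = round(len(items) * train_fraction)
    for _ in range(runs):
        shuffled = list(items)
        rng.shuffle(shuffled)
        yield shuffled[:cut], shuffled[cut:]


# Placeholder corpus with the size reported on the slide (1500 + 500 + 100).
questions = [f"question-{i}" for i in range(2100)]
splits = list(shuffled_splits(questions))

for train, test in splits:
    print(len(train), len(test))
```

With 2100 questions, each run trains on 1680 questions and tests on the remaining 420.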






                   Questions Annotation

Asking Point (AP classifier), one label per token of "What opera did Puccini write?":
          O: What
         AP: opera
          O: did
          O: Puccini
          O: write
          O: ?

Expected Answer Type (EAT classifier), one label per question:
          HUMAN: Who is Puccini?
          DEFINITION: What is Tosca?
          LOCATION: Where did Dante die?
          TIME: When did Puccini die?
          NUMERIC: How many characters have been killed by poisoning?
          OTHER: What did Heinrich Heine write?
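
The two annotation schemes can be represented as plain data, as in the sketch below. The labels and examples come from the slide; the variable names and the helper `asking_point` are hypothetical and only illustrate how a token-level labelling yields an asking point.

```python
from typing import List, Optional, Tuple

# Token-level labels for the AP classifier: (label, token).
ap_annotation: List[Tuple[str, str]] = [
    ("O", "What"), ("AP", "opera"), ("O", "did"),
    ("O", "Puccini"), ("O", "write"), ("O", "?"),
]

# Question-level labels for the EAT classifier: (label, question).
eat_annotation: List[Tuple[str, str]] = [
    ("HUMAN", "Who is Puccini?"),
    ("DEFINITION", "What is Tosca?"),
    ("LOCATION", "Where did Dante die?"),
    ("TIME", "When did Puccini die?"),
    ("NUMERIC", "How many characters have been killed by poisoning?"),
    ("OTHER", "What did Heinrich Heine write?"),
]


def asking_point(tagged: List[Tuple[str, str]]) -> Optional[str]:
    """Join the tokens labelled AP, or return None if there are none."""
    tokens = [token for label, token in tagged if label == "AP"]
    return " ".join(tokens) if tokens else None


print(asking_point(ap_annotation))
```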






                                        AP Results

           Class                  Precision                    Recall       F-Score
     AskingPoint                             0.854                  0.734        0.789
     Other                                   0.973                  0.987        0.980
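
The slides do not define the F-score. Assuming it is the usual balanced F1 (the harmonic mean of precision and recall), the AP table above can be checked as follows; `f_score` is a name chosen for this sketch.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values copied from the AP results table.
ap_f = f_score(0.854, 0.734)
other_f = f_score(0.973, 0.987)
print(round(ap_f, 3), round(other_f, 3))
```

Both values agree with the table to three decimals. If the reported scores were averaged over several runs, small deviations from this per-row computation would be expected.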








                                        EAT Results
            Class                  Precision                    Recall      F-Score
      DEFINITION                              0.887                 0.800        0.841
      LOCATION                                0.834                 0.812        0.821
      HUMAN                                   0.904                 0.753        0.820
      TIME                                    0.880                 0.802        0.838
      NUMERIC                                 0.943                 0.782        0.854
      OTHER                                   0.746                 0.893        0.812







                                   Overall Results
       The overall results are provided as the accuracy
       of the classifier.

         Accuracy = correct instances / overall instances

                                         Value                      Std dev      Std err

   Focus (AP+EAT)                               0.827                    0.020         0.006
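
A sketch of how such a summary could be computed over repeated runs. The per-run accuracies below are made-up illustration values, not the authors' data, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def accuracy(correct: int, total: int) -> float:
    """Accuracy = correct instances / overall instances."""
    return correct / total


def summarize(accuracies):
    """Return (mean, standard deviation, standard error) over the runs."""
    mean = statistics.mean(accuracies)
    std_dev = statistics.stdev(accuracies)  # sample standard deviation (assumed)
    std_err = std_dev / math.sqrt(len(accuracies))
    return mean, std_dev, std_err


# Made-up accuracies for 10 hypothetical runs.
runs = [0.80, 0.85, 0.82, 0.83, 0.81, 0.84, 0.86, 0.80, 0.83, 0.83]
mean, std_dev, std_err = summarize(runs)
print(round(mean, 3), round(std_dev, 3), round(std_err, 3))

# Consistency check on the reported figures: 0.020 / sqrt(10) rounds to 0.006.
reported_std_err = 0.020 / math.sqrt(10)
print(round(reported_std_err, 3))
```

The reported standard error is consistent with a standard deviation of 0.020 over 10 runs.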








                         Prediction of Accuracy








                                   Conclusions
       • We achieved 82.7% accuracy for focus extraction.
       • The specificity of the focus degrades gracefully (we first try
         to extract the AP, and fall back to the EAT).
       • The focus is identified dynamically instead of relying on
         a static taxonomy of question types.
       • Machine learning techniques were used throughout the
         application stack.
       • The results could be improved with more training data.
       • The whole setting is domain independent.
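
The graceful degradation mentioned above (try the AP first, fall back to the EAT) can be sketched as follows. The two stub classifiers are hypothetical rule-based stand-ins for the trained Maxent models, and the small vocabulary is invented for the example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical vocabulary of nouns the stub AP classifier recognises.
KNOWN_ASKING_POINTS = {"librettist", "composer", "opera", "city"}


def stub_ap_classifier(tokens: List[str]) -> Optional[str]:
    """Stand-in for the AP classifier: return the first known asking point."""
    for token in tokens:
        if token.lower() in KNOWN_ASKING_POINTS:
            return token.lower()
    return None


def stub_eat_classifier(tokens: List[str]) -> str:
    """Stand-in for the EAT classifier: decide from the question word."""
    first = tokens[0].lower() if tokens else ""
    return {"who": "HUMAN", "where": "LOCATION", "when": "TIME"}.get(first, "OTHER")


def extract_focus(
    tokens: List[str],
    ap_classifier: Callable[[List[str]], Optional[str]],
    eat_classifier: Callable[[List[str]], str],
) -> Tuple[str, str]:
    """Prefer the explicit asking point; fall back to the expected answer type."""
    ap = ap_classifier(tokens)
    if ap is not None:
        return ("AP", ap)
    return ("EAT", eat_classifier(tokens))


explicit = extract_focus("Who is the librettist of La Tilda ?".split(),
                         stub_ap_classifier, stub_eat_classifier)
implicit = extract_focus("Who wrote the libretto for La Tilda ?".split(),
                         stub_ap_classifier, stub_eat_classifier)
print(explicit, implicit)
```

The first question yields the specific focus "librettist"; the second has no explicit asking point, so the focus degrades to the broader class HUMAN.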







                                     Questions?


                                          Thank you



