A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering

Johannes Leveling

Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
johannes.leveling@fernuni-hagen.de

LWA 2007 Workshop, Halle (Saale), Germany

Outline

1 IRSAW
2 QA phases
3 MIRA
    Embedding of MIRA
    Expected answer types
    TüBa-D/Z annotation
4 MAVE
5 Evaluation
6 Summary and Future Work

IRSAW question answering framework

[Figure: IRSAW framework. Components shown: Documents, Document preprocessing, Local database; Natural language question, Question processing; Produce answer candidates (answer candidate producers InSicht, QAP, and MIRA); Answer validation and selection: MAVE; Answer.]

IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web,
funded by the DFG (Deutsche Forschungsgemeinschaft)

Question answering phases

1 Process document collection
2 Preprocess question (⇐ Natural language question)
3 Retrieve text segments
4 Match document and question representations
5 Return answer candidates
6 Merge and validate answer candidates (⇒ Answer)

Embedding of MIRA in IRSAW

• Employ different modules to produce data streams containing answer candidates:
    • InSicht (matching semantic network representations, Hartrumpf and Leveling (2007)),
    • QAP (Question Answering by Pattern matching, Leveling (2006)), and
    • MIRA (Modified Information Retrieval Approach)
• Use different methods to produce answer streams to increase recall and robustness
• Merge, rank, and logically validate answer candidates and select the best answer (MAVE, Glöckner et al. (2007))
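
The multi-stream idea can be sketched in a few lines. This is an illustration only, not the IRSAW implementation: the three producer functions, their scores, and the merge-by-summed-score rule are hypothetical stand-ins, and the logical validation that MAVE performs is not attempted here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A candidate is (answer string, score); the score field is an assumption.
Candidate = Tuple[str, float]
Producer = Callable[[str], List[Candidate]]


def merge_streams(question: str, producers: Dict[str, Producer]) -> List[Tuple[str, float, List[str]]]:
    """Collect candidates from every producer and merge identical answers."""
    scores: Dict[str, float] = defaultdict(float)
    sources: Dict[str, List[str]] = defaultdict(list)
    for name, produce in producers.items():
        for answer, score in produce(question):
            key = answer.strip().lower()  # naive normalisation so equal answers merge
            scores[key] += score
            sources[key].append(name)
    merged = [(answer, scores[answer], sources[answer]) for answer in scores]
    # Rank by summed score, highest first; ties are broken alphabetically.
    merged.sort(key=lambda item: (-item[1], item[0]))
    return merged


# Hypothetical stand-ins for the three answer candidate producers (made-up output).
def precise_stream(question: str) -> List[Candidate]:
    return [("David ben Gurion", 0.9)]


def pattern_stream(question: str) -> List[Candidate]:
    return [("David ben Gurion", 0.6), ("Chaim Weizmann", 0.3)]


def recall_stream(question: str) -> List[Candidate]:
    return [("chaim weizmann", 0.2), ("1948", 0.2), ("david ben gurion", 0.1)]


PRODUCERS = {"InSicht": precise_stream, "QAP": pattern_stream, "MIRA": recall_stream}
RANKED = merge_streams("Wer wurde 1948 erster Ministerpräsident Israels?", PRODUCERS)
for answer, score, found_by in RANKED:
    print(f"{answer:20s} {score:.1f} {found_by}")
```

An answer found by several streams accumulates score, which is one simple way a recall-oriented stream can reinforce a precise one.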

MIRA

• Shallow question answering
• Expected answer type (EAT) of question determined by Bayesian classifier: PERSON, SUBSTANCE, ...
• Manually annotated corpus with EAT tags (e.g. PERSON) and subclasses (e.g. person-first, person-last)
• TüBa-D/Z newspaper corpus (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml), approximately 470,000 words
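
A Bayesian classifier for the expected answer type could look roughly like the sketch below. The slides do not say which features or which Bayesian model MIRA uses, so this assumes a multinomial naive Bayes over lower-cased question words with add-one smoothing; the class name `NaiveBayesEAT` and the toy training set are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesEAT:
    """Multinomial naive Bayes over question tokens (assumed feature set)."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    @staticmethod
    def tokenize(question):
        return question.lower().rstrip("?").split()

    def train(self, examples):
        for question, eat in examples:
            tokens = self.tokenize(question)
            self.class_counts[eat] += 1
            self.word_counts[eat].update(tokens)
            self.vocab.update(tokens)

    def classify(self, question):
        tokens = self.tokenize(question)
        total = sum(self.class_counts.values())
        best, best_logp = None, -math.inf
        for eat in sorted(self.class_counts):
            n_words = sum(self.word_counts[eat].values())
            # log prior plus add-one smoothed log likelihood of each token
            logp = math.log(self.class_counts[eat] / total)
            for tok in tokens:
                logp += math.log((self.word_counts[eat][tok] + 1) / (n_words + len(self.vocab)))
            if logp > best_logp:
                best, best_logp = eat, logp
        return best


# Toy training data: three questions come from the slides, two are made up.
TOY_TRAINING = [
    ("Wer wurde 1948 erster Ministerpräsident Israels?", "PERSON"),
    ("Wer schrieb den Roman?", "PERSON"),
    ("In welchem Jahr endete offiziell die Besetzung Deutschlands?", "TIME"),
    ("Wann begann der Krieg?", "TIME"),
    ("Wie wird der Ebolavirus übertragen?", "OTHER"),
]

CLASSIFIER = NaiveBayesEAT()
CLASSIFIER.train(TOY_TRAINING)
print(CLASSIFIER.classify("Wer wurde Präsident?"))
print(CLASSIFIER.classify("In welchem Jahr fiel die Mauer?"))
```

With so little training data the result rests almost entirely on the question word, which is also why a real classifier would need the full annotated question set.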

Expected answer types (1/3)

• Question (German): Wer wurde 1948 erster Ministerpräsident Israels?
• Question (English): Who became the first prime minister of Israel in 1948?
• EAT: PERSON
• Answer string:  David         ben          Gurion
• Tag sequence:   person-first  person-part  person-last

Expected answer types (2/3)

• Question (German): In welchem Jahr endete offiziell die Besetzung Deutschlands?
• Question (English): In what year did the occupation of Germany officially end?
• EAT: TIME
• Answer string:  im    Jahr  1955
• Tag sequence:   prep  year  num-card

Expected answer types (3/3)

• Question (German): Wie wird der Ebolavirus übertragen?
• Question (English): How is the Ebola virus transmitted?
• EAT: OTHER
• Answer string: (Übertragen werden die Ebolaviren durch direkten Körperkontakt und bei Kontakt mit Körperausscheidungen infizierter Personen per Kontaktinfektion bzw. Schmierinfektion.)
  (English: Ebola viruses are transmitted through direct physical contact and, on contact with the bodily excretions of infected persons, by contact infection or smear infection.)
• Tag sequence: – (other entity type → answer not found!)

EAT frequency in annotated TüBa-D/Z

Name class        Corpus frequency
LOCATION                     8,274
PERSON                      14,527
ORGANIZATION                 7,148
TIME                        14,524
MEASURE                        895
SUBSTANCE                      293
OTHER                        2,987

EAT subclass frequency in annotated TüBa-D/Z

LOCATION subclass   Subclass frequency
city                             3,717
country                          1,955
region                             926
street                             613
state                              370
other                              206
building                           195
streetno                           124
river                               85
island                              55
sea                                 17
mountain                            11

Tagging with subclasses

Example sentence (German): Vor 25 Jahren betrat Neil Armstrong als erster Mensch den Mond, doch heute stagniert die bemannte Raumfahrt.
(English: 25 years ago, Neil Armstrong became the first human to set foot on the Moon, but today manned spaceflight is stagnating.)

Token        EAT         Subclass
Vor          TIME        prep
25           TIME        num-card
Jahren       TIME        year
betrat       –
Neil         PERSON      person-first
Armstrong    PERSON      person-last
als          –
erster       –
Mensch       –
den          –
Mond         LOCATION    other
,            –
doch         –
heute        TIME        deictic
stagniert    –
die          –
bemannte     –
Raumfahrt    –
.            –
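
Given tokens tagged as in the table, answer candidates for a particular EAT can be read off as runs of adjacent tokens carrying that tag. The slides do not describe MIRA's extraction step in this detail, so the grouping rule and the function name `candidates_for` below are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

TaggedToken = Tuple[str, Optional[str], Optional[str]]

# (token, EAT or None, subclass or None), copied from the tagging example.
TAGGED: List[TaggedToken] = [
    ("Vor", "TIME", "prep"), ("25", "TIME", "num-card"), ("Jahren", "TIME", "year"),
    ("betrat", None, None),
    ("Neil", "PERSON", "person-first"), ("Armstrong", "PERSON", "person-last"),
    ("als", None, None), ("erster", None, None), ("Mensch", None, None),
    ("den", None, None), ("Mond", "LOCATION", "other"), (",", None, None),
    ("doch", None, None), ("heute", "TIME", "deictic"), ("stagniert", None, None),
    ("die", None, None), ("bemannte", None, None), ("Raumfahrt", None, None),
    (".", None, None),
]


def candidates_for(tagged: List[TaggedToken], eat: str) -> List[str]:
    """Return maximal runs of adjacent tokens tagged with the given EAT."""
    spans: List[str] = []
    current: List[str] = []
    for token, tag, _subclass in tagged:
        if tag == eat:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:  # flush a run that reaches the end of the sentence
        spans.append(" ".join(current))
    return spans


for eat in ("PERSON", "TIME", "LOCATION", "ORGANIZATION"):
    print(eat, candidates_for(TAGGED, eat))
```

For a question with EAT PERSON this yields the single candidate "Neil Armstrong"; for TIME it yields both "Vor 25 Jahren" and the deictic "heute".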

MAVE - MultiNet-based Answer Verification

• Validate answer candidates
• Test logical validity of answer candidate by using
    a) inferences, entailments
    b) heuristic quality indicators (fallback strategy)
• Select most trusted answer
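
One possible reading of the fallback strategy is sketched below: candidates that pass a logical check are preferred, and heuristic indicators decide only when no candidate can be proven (or when several can). The `prove` and `heuristic_score` callables are hypothetical placeholders; MAVE's actual inference and scoring are not described on the slide.

```python
from typing import Callable, List, Optional


def select_answer(
    candidates: List[str],
    prove: Callable[[str], bool],
    heuristic_score: Callable[[str], float],
) -> Optional[str]:
    """Pick the most trusted candidate, preferring logically validated ones."""
    proven = [c for c in candidates if prove(c)]
    # Fall back to all candidates when the logical check validates none.
    pool = proven if proven else candidates
    if not pool:
        return None
    return max(pool, key=heuristic_score)


CANDIDATES = ["im Jahr 1955", "1949", "1955"]

# Placeholder checks, invented for the example.
def toy_prove(candidate: str) -> bool:
    return candidate.endswith("1955")


def toy_score(candidate: str) -> float:
    return -len(candidate)  # shorter answers score higher in this toy setup


print(select_answer(CANDIDATES, toy_prove, toy_score))
print(select_answer(CANDIDATES, lambda c: False, toy_score))
```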

Evaluation results (1/3)

Performance results for InSicht, QAP, and MIRA based on questions from QA@CLEF data from 2004 to 2006

System     # Candidates   Coverage   # Correct   Precision
InSicht           1,212    226/600         625       51.6%
QAP               2,562    114/600       1,190       46.6%
MIRA             14,946    520/600       1,738       11.6%
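
The table's ratios can be recomputed as a quick check. This sketch assumes that Coverage means questions with at least one candidate out of 600 and that Precision is # Correct divided by # Candidates; the slide does not spell out either definition.

```python
# Counts copied from the results table; the dictionary layout is illustrative.
RESULTS = {
    "InSicht": {"candidates": 1212, "covered": 226, "correct": 625},
    "QAP": {"candidates": 2562, "covered": 114, "correct": 1190},
    "MIRA": {"candidates": 14946, "covered": 520, "correct": 1738},
}
TOTAL_QUESTIONS = 600


def summarize(results, total_questions):
    """Return coverage and precision per system under the assumed definitions."""
    rows = {}
    for system, r in results.items():
        rows[system] = {
            "coverage": r["covered"] / total_questions,
            "precision": r["correct"] / r["candidates"],
        }
    return rows


SUMMARY = summarize(RESULTS, TOTAL_QUESTIONS)
for system, row in SUMMARY.items():
    print(f"{system:8s} coverage={row['coverage']:.1%} precision={row['precision']:.1%}")
```

Under these definitions the recomputed precision agrees with the slide for InSicht and MIRA. For QAP, 1,190/2,562 comes to about 46.4%, slightly below the 46.6% shown, so the slide may rest on a slightly different count or definition.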

Evaluation results (2/3)

Performance results including answer selection by MAVE based on questions from QA@CLEF data from 2004 to 2006

Run                        # Correct   # Inexact   # Wrong
InSicht+Mira+QAP               247.4        15.8     307.8
InSicht+Mira+QAP (opt.)        305.0        17.0     249.0

Evaluation results (3/3)

Results for MIRA answer candidates for QA@CLEF data from 2003 to 2006

                                       top-N
                           N=50     N=30     N=10      N=5
# Correct (2006)            798      615      215       95
# Inexact (2006)             56       53       20       12
# Wrong (2006)            4,436    3,421    1,360      722

# Correct (2003–2006)     1,864    1,503      609      263
# Inexact (2003–2006)       287      248      103       54
# Wrong (2003–2006)      17,326   14,102    5,694    3,013
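
To compare the top-N cut-offs, the share of correct candidates among all judged candidates can be derived from the counts. The share (correct divided by the sum of correct, inexact, and wrong) is a measure chosen here for illustration; it does not appear on the slide.

```python
# (correct, inexact, wrong) per top-N cut-off, copied from the table.
TOP_N = {
    "2006": {50: (798, 56, 4436), 30: (615, 53, 3421), 10: (215, 20, 1360), 5: (95, 12, 722)},
    "2003-2006": {50: (1864, 287, 17326), 30: (1503, 248, 14102), 10: (609, 103, 5694), 5: (263, 54, 3013)},
}


def correct_share(counts):
    """Fraction of judged candidates that were correct."""
    correct, inexact, wrong = counts
    return correct / (correct + inexact + wrong)


SHARES = {
    period: {n: correct_share(counts) for n, counts in by_n.items()}
    for period, by_n in TOP_N.items()
}
for period, by_n in SHARES.items():
    for n, share in by_n.items():
        print(f"{period:10s} N={n:<3d} correct share={share:.1%}")
```

The absolute number of correct candidates grows with N, which is what matters for a recall-oriented stream whose output is validated afterwards.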

Summary and Future Work

MIRA:
• Produces a highly recall-oriented answer stream,
• Covers more questions than the other answer producers in IRSAW, and
• Returns the largest number of correct answer candidates.

Future work:
• Return additional answer support for temporal deictic expressions
• Support processing list questions

Selected References

Glöckner, Ingo; Sven Hartrumpf; and Johannes Leveling (2007). Logical validation, answer merging and witness selection – a case study in multi-stream question answering. In Proceedings of RIAO 2007, Large-Scale Semantic Access to Content (Text, Image, Video and Sound). Pittsburgh, USA: C.I.D.

Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Carol Peters et al.), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.

Leveling, Johannes (2006). On the role of information retrieval in the question answering system IRSAW. In Proceedings of the LWA 2006, Workshop Information Retrieval, pp. 119–125. Hildesheim, Germany: Universität Hildesheim.

ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Destacado (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

A Modified Information Retrieval Approach to Produce Candidates for Question Answering

  • 1. A Modified Information Retrieval Approach to Produce Answer Candidates for Question Answering Johannes Leveling Intelligent Information and Communication Systems (IICS) University of Hagen (FernUniversität in Hagen) 58084 Hagen, Germany johannes.leveling@fernuni-hagen.de LWA 2007 Workshop, Halle (Saale), Germany
  • 2. Outline
       1 IRSAW
       2 QA phases
       3 MIRA: Embedding of MIRA; Expected answer types; TüBa-D/Z annotation
       4 MAVE
       5 Evaluation
       6 Summary and Future Work
  • 3. IRSAW question answering framework
       [Figure: IRSAW framework. Components shown: Documents, Document preprocessing, Local Database, Natural language question, Question processing, three answer candidate producers (InSicht, QAP, MIRA), Answer validation and selection (MAVE), Answer.]
       IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web, funded by the DFG (Deutsche Forschungsgemeinschaft)
  • 4. Question answering phases
       1 Process document collection
       2 Preprocess question (⇐ Natural language question)
       3 Retrieve text segments
       4 Match document and question representations
       5 Return answer candidates
       6 Merge and validate answer candidates (⇒ Answer)
  • 5. Embedding of MIRA in IRSAW
       • Employ different modules to produce data streams containing answer candidates:
           • InSicht (matching semantic network representations, Hartrumpf and Leveling (2007)),
           • QAP (Question Answering by Pattern matching, Leveling (2006)), and
           • MIRA (Modified Information Retrieval Approach)
       • Use different methods to produce answer streams to increase recall and robustness
       • Merge, rank, and logically validate answer candidates and select the best answer (MAVE, Glöckner et al. (2007))
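The multi-stream idea on this slide can be illustrated with a small sketch. It assumes that each producer returns (answer string, score) pairs and that candidates are merged by normalized answer string. The function name `merge_streams`, the example candidates, the scores, and the sum-of-scores ranking are all hypothetical; the slides do not describe how MAVE actually merges or ranks candidates.

```python
from collections import defaultdict

def merge_streams(streams):
    """Merge answer candidates from several producers (illustrative only).

    `streams` maps a producer name to a list of (answer, score) pairs.
    Candidates with the same normalized answer string are merged: their
    scores are summed and the supporting producers are recorded.
    """
    merged = defaultdict(lambda: {"score": 0.0, "producers": set()})
    for producer, candidates in streams.items():
        for answer, score in candidates:
            # Normalize case and whitespace so trivially different strings match.
            key = " ".join(answer.lower().split())
            merged[key]["score"] += score
            merged[key]["producers"].add(producer)
    # Rank by accumulated score (highest first); ties are broken alphabetically.
    return sorted(merged.items(), key=lambda item: (-item[1]["score"], item[0]))

# Hypothetical candidates and scores, not taken from the evaluation.
streams = {
    "InSicht": [("David ben Gurion", 0.9)],
    "QAP": [("David ben Gurion", 0.6), ("Golda Meir", 0.2)],
    "MIRA": [("david ben  gurion", 0.3), ("Israel", 0.1), ("Golda Meir", 0.1)],
}
ranked = merge_streams(streams)
best_answer, info = ranked[0]
print(best_answer, sorted(info["producers"]))
```

Summing scores rewards answers that several independent streams agree on, which is one simple way to benefit from a recall-oriented stream without trusting it on its own.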
  • 6. MIRA
       • Shallow question answering
       • Expected answer type (EAT) of the question determined by a Bayesian classifier: PERSON, SUBSTANCE, ...
       • Manually annotated corpus with EAT tags (e.g. PERSON) and subclasses (e.g. person-first, person-last)
       • TüBa-D/Z newspaper corpus (Tübingen Treebank of Written German; http://www.sfs.uni-tuebingen.de/en_tuebadz.shtml), approximately 470,000 words
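To make the EAT classification step concrete, here is a minimal sketch of a naive Bayes classifier over question words with add-one smoothing. The slides only say "Bayesian classifier"; the choice of naive Bayes, the word features, the function names `train` and `classify`, and the six toy training questions are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def train(examples):
    """Collect class and per-class token counts from (question, eat) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for question, eat in examples:
        tokens = tokenize(question)
        class_counts[eat] += 1
        token_counts[eat].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary

def classify(question, model):
    """Return the EAT with the highest log-posterior under naive Bayes."""
    class_counts, token_counts, vocabulary = model
    total = sum(class_counts.values())
    best_eat, best_score = None, -math.inf
    for eat in sorted(class_counts):
        score = math.log(class_counts[eat] / total)  # class prior
        denom = sum(token_counts[eat].values()) + len(vocabulary)
        for token in tokenize(question):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            score += math.log((token_counts[eat][token] + 1) / denom)
        if score > best_score:
            best_eat, best_score = eat, score
    return best_eat

# Toy training questions (hypothetical, not from the annotated corpus).
training = [
    ("Wer wurde erster Präsident", "PERSON"),
    ("Wer schrieb Faust", "PERSON"),
    ("In welchem Jahr endete der Krieg", "TIME"),
    ("Wann begann der Krieg", "TIME"),
    ("Wo liegt Hagen", "LOCATION"),
    ("In welcher Stadt liegt der Dom", "LOCATION"),
]
model = train(training)
print(classify("Wer wurde 1948 erster Ministerpräsident Israels?", model))
print(classify("In welchem Jahr endete offiziell die Besetzung Deutschlands?", model))
```

With this toy data the interrogative words ("wer", "in welchem Jahr") carry most of the signal; a realistic classifier would need far more training questions and possibly richer features.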
  • 7. Expected answer types (1/3)
       • Question (German): Wer wurde 1948 erster Ministerpräsident Israels?
       • Question (English): Who became the first Prime Minister of Israel in 1948?
       • EAT: PERSON
       • Answer string: David ben Gurion
       • Tag sequence: person-first person-part person-last
  • 8. Expected answer types (2/3)
       • Question (German): In welchem Jahr endete offiziell die Besetzung Deutschlands?
       • Question (English): In what year did the occupation of Germany officially end?
       • EAT: TIME
       • Answer string: im Jahr 1955
       • Tag sequence: prep year num-card
  • 9. Expected answer types (3/3)
       • Question (German): Wie wird der Ebolavirus übertragen?
       • Question (English): How is the Ebola virus transmitted?
       • EAT: OTHER
       • Answer string: (Übertragen werden die Ebolaviren durch direkten Körperkontakt und bei Kontakt mit Körperausscheidungen infizierter Personen per Kontaktinfektion bzw. Schmierinfektion.)
       • Tag sequence: – (other entity type → answer not found!)
  • 10. EAT frequency in annotated TüBa-D/Z

        Name class      Corpus frequency
        LOCATION                   8,274
        PERSON                    14,527
        ORGANIZATION               7,148
        TIME                      14,524
        MEASURE                      895
        SUBSTANCE                    293
        OTHER                      2,987
  • 11. EAT subclass frequency in annotated TüBa-D/Z

        LOCATION subclass   Frequency
        city                    3,717
        country                 1,955
        region                    926
        street                    613
        state                     370
        other                     206
        building                  195
        streetno                  124
        river                      85
        island                     55
        sea                        17
        mountain                   11
  • 12. Tagging with subclasses

        Token       EAT        Subclass
        Vor         TIME       prep
        25          TIME       num-card
        Jahren      TIME       year
        betrat      –
        Neil        PERSON     person-first
        Armstrong   PERSON     person-last
        als         –
        erster      –
        Mensch      –
        den         –
        Mond        LOCATION   other
        ,           –
        doch        –
        heute       TIME       deictic
        stagniert   –
        die         –
        bemannte    –
        Raumfahrt   –
        .           –
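The tagged sentence on this slide suggests how answer candidates of a given expected answer type could be read off a text segment. The sketch below groups adjacent tokens that carry the expected EAT into candidate strings. This grouping rule and the function name `extract_candidates` are assumptions; the slides do not spell out how MIRA turns tag sequences into candidates.

```python
def extract_candidates(tagged_tokens, expected_type):
    """Group adjacent tokens tagged with `expected_type` into candidate strings."""
    candidates, current = [], []
    for token, eat, _subclass in tagged_tokens:
        if eat == expected_type:
            current.append(token)
        elif current:
            # A token of another type (or untagged) closes the running candidate.
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates

# The sentence from the slide; None stands for the "–" (no tag) entries.
sentence = [
    ("Vor", "TIME", "prep"), ("25", "TIME", "num-card"), ("Jahren", "TIME", "year"),
    ("betrat", None, None), ("Neil", "PERSON", "person-first"),
    ("Armstrong", "PERSON", "person-last"), ("als", None, None),
    ("erster", None, None), ("Mensch", None, None), ("den", None, None),
    ("Mond", "LOCATION", "other"), (",", None, None), ("doch", None, None),
    ("heute", "TIME", "deictic"), ("stagniert", None, None), ("die", None, None),
    ("bemannte", None, None), ("Raumfahrt", None, None), (".", None, None),
]

print(extract_candidates(sentence, "PERSON"))
print(extract_candidates(sentence, "TIME"))
```

The subclass tags (for example person-first followed by person-last) are ignored here; they could be used to filter out incomplete or malformed candidates.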
  • 13. MAVE - MultiNet-based Answer Verification
       • Validate answer candidates
       • Test logical validity of answer candidate by using
           a) inferences, entailments
           b) heuristic quality indicators (fallback strategy)
       • Select most trusted answer
  • 14. Evaluation results (1/3)
        Performance results for InSicht, QAP, and MIRA based on questions from QA@CLEF data from 2004 to 2006

        System    # Candidates   Coverage   # Correct   Precision
        InSicht          1,212    226/600         625       51.6%
        QAP              2,562    114/600       1,190       46.6%
        MIRA            14,946    520/600       1,738       11.6%
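The coverage figures on this slide are given as fractions of 600 questions. As a reading aid, the following sketch converts them to percentages. The counts are copied from the slide; the helper name `coverage_percent` is hypothetical, and the slide does not define exactly when a question counts as covered.

```python
# Coverage counts copied from the slide: questions covered out of 600.
coverage = {"InSicht": (226, 600), "QAP": (114, 600), "MIRA": (520, 600)}

def coverage_percent(covered, total):
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100.0 * covered / total, 1)

percentages = {name: coverage_percent(c, t) for name, (c, t) in coverage.items()}
for name, value in percentages.items():
    print(f"{name}: {value}%")
```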
  • 15. Evaluation results (2/3)
        Performance results including answer selection by MAVE based on questions from QA@CLEF data from 2004 to 2006

        Run                        # Correct   # Inexact   # Wrong
        InSicht+Mira+QAP               247.4        15.8     307.8
        InSicht+Mira+QAP (opt.)        305.0        17.0     249.0
  • 16. Evaluation results (3/3)
        Results for MIRA answer candidates for QA@CLEF data from 2003 to 2006 (top-N candidates)

                                   N=50     N=30    N=10     N=5
        # Correct (2006)            798      615     215      95
        # Inexact (2006)             56       53      20      12
        # Wrong (2006)            4,436    3,421   1,360     722
        # Correct (2003–2006)     1,864    1,503     609     263
        # Inexact (2003–2006)       287      248     103      54
        # Wrong (2003–2006)      17,326   14,102   5,694   3,013
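The raw counts on this slide are easier to compare as shares. The sketch below computes, for each cutoff N, the fraction of candidates judged correct. It assumes that the correct, inexact, and wrong counts together account for all judged candidates at that cutoff, which the slide does not state explicitly; the function name `correct_share` is hypothetical.

```python
# Counts copied from the slide: (correct, inexact, wrong) per top-N cutoff.
results = {
    "2006": {
        50: (798, 56, 4436), 30: (615, 53, 3421),
        10: (215, 20, 1360), 5: (95, 12, 722),
    },
    "2003-2006": {
        50: (1864, 287, 17326), 30: (1503, 248, 14102),
        10: (609, 103, 5694), 5: (263, 54, 3013),
    },
}

def correct_share(correct, inexact, wrong):
    """Fraction of judged candidates that are correct."""
    return correct / (correct + inexact + wrong)

shares = {
    period: {n: correct_share(*counts) for n, counts in by_n.items()}
    for period, by_n in results.items()
}
for period, by_n in shares.items():
    for n, share in by_n.items():
        print(f"{period} N={n}: {share:.1%}")
```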
  • 17. Summary and Future Work
       MIRA:
       • Produces a highly recall-oriented answer stream,
       • Covers more questions than the other answer producers in IRSAW, and
       • Returns the largest number of correct answer candidates.
       Future work:
       • Return additional answer support for temporal deictic expressions
       • Support processing list questions
  • 18. Selected References
       Glöckner, Ingo; Sven Hartrumpf; and Johannes Leveling (2007). Logical validation, answer merging and witness selection – a case study in multi-stream question answering. In Proceedings of RIAO 2007, Large-Scale Semantic Access to Content (Text, Image, Video and Sound). Pittsburgh, USA: C.I.D.
       Hartrumpf, Sven and Johannes Leveling (2007). Interpretation and normalization of temporal expressions for question answering. In Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006 (edited by Carol Peters et al.), volume 4730 of LNCS, pp. 432–439. Berlin: Springer.
       Leveling, Johannes (2006). On the role of information retrieval in the question answering system IRSAW. In Proceedings of the LWA 2006, Workshop Information Retrieval, pp. 119–125. Hildesheim, Germany: Universität Hildesheim.