August 13 2020
Sujit Pal, Elsevier Labs
Question Answering as Search
The Anserini Pipeline and other stories
THE SEARCH RELEVANCE CONFERENCE
About Me
• Work at Elsevier Labs
• (Mostly self-taught) data scientist
• Ex-search guy, Lucene and Solr mainly
• Some NLP, traditional ML and Deep Learning, some Computer Vision
• Started looking at Question Answering in 2019
• Specifically the BERTserini project from Jimmy Lin’s lab.
2
Agenda
• Types of QA systems
• BERTserini Pipeline
• Experiments and Results
3
Types of QA Systems
We will just cover the subset where the objective, given a question, is to get answer spans
from passages in a text corpus.
4
Types of QA systems
• Traditional QA pipeline
• 2 stage Retriever Reader
systems
• Dense Retriever and Reader
• Language model based
5
• Jurafsky and Martin, IBM Watson, YodaQA
• Choose keywords from question
• Predict Question type (who, what, when, …)
• Rank passage by answer type, question words
• Extract answer based on pattern matching and question type
Types of QA systems: 2 stage Retriever Reader systems
6
• DrQA (2017), BERTserini (2019)
• Retriever is unsupervised
• Reader is supervised Reading Comprehension model
Reading Wikipedia to Answer Open-Domain Questions (Chen et al., 2017)
End-to-End Open-Domain Question Answering with BERTserini (Yang et al., 2019)
Types of QA systems: Dense Retriever and Reader
7
• ORQA (2019), REALM (2020)
• Train retriever and reader end-to-end using question answer pairs.
• Answer ranked by vector similarity between learned embeddings (question and answer).
Latent Retrieval for Weakly Supervised Open Domain Question Answering (Lee et al., 2019)
Retrieval Augmented Language Model Pre-training (Guu et al., 2020)
Types of QA systems: Language model based
8
• GPT-2, GPT-3, T5 (2019 - 2020)
• Fine tuned Language Model
• No corpus, LM stores world knowledge implicitly
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
The BERTserini Pipeline
And how and why we adapted it for our needs
9
BERTserini Pipeline
10
Anserini + BERT = BERTserini
BERTserini Pipeline
11
SOTA Results!
How would these results translate in real life?
Our BERTserini Pipeline
12
[Pipeline diagram: content from ScienceDirect (and later ClinicalKey); retriever is Solr + plugin; best results with SciBERT/SQuAD 1.1]
BERT Reader changes
• Replaced the BERT-base model fine-tuned on SQuAD 1.1 with a SciBERT model fine-tuned on SQuAD 1.1.
• Also tried…
− Fine-tuning other BERT models: BERT-large, BioBERT.
− Fine-tuning on the SQuAD 2.0 dataset.
− Additional pre-training of the model on ClinicalKey content.
13
Anserini Retriever changes
• Replaced the Lucene index with a Solr-based index.
• Moved batch-oriented Anserini functionality into a Solr plugin for interactive use.
− Open source, available on GitHub: https://github.com/elsevierlabs-os/anserini-solr-plugin
− Code could be cleaner; it was developed for use in proof-of-concept (POC) code.
14
anserini-solr-plugin
15
• Input: HTTP GET request specifying query, sim, qtype, and rtype.
• Similarity (sim): query likelihood (ql) or BM25 (bm25, default).
• Query Rewriting (qtype)
− Bag of Words (bow), Sequential Dependence Model (sdm)
− Added edismax and raw
• Result Reranking (rtype)
− RM3 (rm3)
− Axiomatic (ax)
− Identity (no reranking)
− Added external (delegate to external rerank service)
• Output: HTTP Response
[Screenshots: rewritten query; reranking query]
https://github.com/elsevierlabs-os/anserini-solr-plugin
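A request to the plugin could be assembled roughly as follows. This is only a sketch: the parameter names (query, sim, qtype, rtype) and their values come from the slide, but the host, core name, and handler path are hypothetical, and the qtype and rtype defaults simply mirror the BM25 + BoW + RM3 setup used later in the talk. Check the plugin repository for the actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, core name, and handler path are assumptions.
BASE_URL = "http://localhost:8983/solr/mycore/anserini"

# Parameter values listed on the slide.
SIM_VALUES = {"bm25", "ql"}
QTYPE_VALUES = {"bow", "sdm", "edismax", "raw"}


def build_request_url(query, sim="bm25", qtype="bow", rtype="rm3", base_url=BASE_URL):
    """Return a GET URL carrying the query, sim, qtype, and rtype parameters."""
    if sim not in SIM_VALUES:
        raise ValueError(f"unknown sim: {sim}")
    if qtype not in QTYPE_VALUES:
        raise ValueError(f"unknown qtype: {qtype}")
    # rtype is passed through unchecked: the slide names rm3, ax, and external,
    # but does not give the code for the identity (no reranking) option.
    params = {"query": query, "sim": sim, "qtype": qtype, "rtype": rtype}
    return base_url + "?" + urlencode(params)


url = build_request_url("What is the cause of stridor?")
print(url)
```

Building the query string with urlencode keeps question marks and spaces in the question from corrupting the URL.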
Experiments and Results
Creating the MedSQuAD dataset, and replacing the Anserini reranker component with
various candidates
16
Initial Setup
• Index paragraphs from ClinicalKey books
• Use BM25 + BOW + RM3
• Scoring:
• Use k=1, look at top answer only
• Scoring metric EM (exact match) and F1 (f1-score)
between label and predicted answers.
17
Notes:
− Indexing and retrieval settings: the paper says paragraphs and these settings work best.
− k=1: we hope to use the top answer for display without further post-processing.
− EM and F1 are the SQuAD metrics.
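For reference, EM and F1 in the SQuAD style can be computed as sketched below. The normalization (lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace) follows the SQuAD v1.1 evaluation script. The function names are my own, and the talk does not say which evaluation code was actually used.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, label):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(label))


def f1_score(prediction, label):
    """Token-level F1 between normalized prediction and label."""
    pred_tokens = normalize_answer(prediction).split()
    label_tokens = normalize_answer(label).split()
    overlap = sum((Counter(pred_tokens) & Counter(label_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


em = exact_match("The airway.", "airway")
f1 = f1_score("partial obstruction of the airway", "obstruction of the airway")
print(em, round(f1, 4))
```

EM rewards only a verbatim (normalized) match, while F1 gives partial credit when the predicted span overlaps the label.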
How well does BERTserini work on our data?
• 100 questions from nursing text, classified as “Remembering” in Bloom’s taxonomy.
• Run these questions against the pipeline and manually inspect each answer.
• ~60 “reasonable” answers.
− Answer span correct, but…
− Passage answers question
18
What causes a condition known as black hairy tongue?
Hairy tongue is a condition in which the patient has an increased accumulation of keratin on the filiform papillae that results in a white, “hairy” appearance. This may be the result of either an increase in keratin production or a decrease in normal desquamation. Unless otherwise pigmented, the elongated filiform papillae are white (Fig. 1.58). In the condition known as black hairy tongue, the papillae are a brown-to-black color because of chromogenic bacteria (Fig. 1.59). Tobacco and certain foods may also discolor the papillae. Although the cause is unknown, hydrogen peroxide, bismuth subsalicylates for upset stomach, alcohol, or chemical rinses have been suggested to stimulate the elongation of the filiform papillae that results in the appearance of hairy tongue.
Oral Pathology for the Dental Hygienist: Introduction to Preliminary Diagnosis of Oral Lesions (PII: B9780323400626000013, ISBN: 978-0-323-40062-6)
Some more good results
What is a cause of tooth mobility?
Periodontal probing is used to assess attachment levels to the tooth and is a prime indicator of health. Radiographic bone loss around a tooth does not indicate the presence of a disease state but is a reflection of past or present periodontal disease. Occlusal trauma may cause an increase in tooth mobility but does not cause marginal bone loss in the absence of periodontal disease.
Contemporary Implant Dentistry: An Implant Is Not a Tooth: A Comparison of Periodontal Indices (PII: B9780323043731500484, ISBN: 978-0-323-04373-1)
19
What is the cause of stridor?
Stridor is a term used to describe a high-pitched sound caused by partial obstruction of the airway. Stridor can have an inspiratory, expiratory, or biphasic pattern (both inspiratory and expiratory). An inspiratory pattern suggests an upper airway cause (e.g., epiglottitis). An expiratory pattern suggests a lower airway etiology (e.g., tracheomalacia). A biphasic pattern suggests a glottic or subglottic obstruction (e.g., subglottic hemangioma). Imaging evaluation of the child with stridor is commonly performed with neck and/or chest radiographs, depending on the pattern of stridor and associated clinical findings.
Emergency Radiology: The Requisites: Imaging Evaluation of Common Pediatric Emergencies (PII: B9780323376402000066, ISBN: 978-0-323-37640-2)
As well as some fails
What special considerations must be observed when a patient has epiglottitis?
What special considerations related to her transplant need to be in place for this patient during critical care resuscitation?
Advanced Critical Care Nursing: Bone Marrow Transplantation (PII: B9781416032199100397, ISBN: 978-1-4160-3219-9)
20
What conditions are treated by methotrexate?
The combination of PUVA and methotrexate successfully treated five patients with erythrodermic psoriasis and two with pustular psoriasis. According to the authors, annual methotrexate doses could be reduced by 50% by adding PUVA to the regimen.
Treatment of Skin Disease: Comprehensive Therapeutic Strategies: Psoriasis (PII: B978070206912300210X, ISBN: 978-0-7020-6912-3)
Meaningless answer
Surely a better answer exists?
Reader Experiments
• Results of evaluating various Reader configurations (no Retriever)
against SQuAD dataset.
• Encouraging results for reading comprehension task, i.e., when
appropriate passage is provided.
21
Parameters                       EM      F1
BERT-base uncased + SQuAD 1.1    75.86   82.41
BERT-base uncased + SQuAD 2.0    74.03   77.30
SciBERT + SQuAD 1.1              79.10   87.26
Human (SQuAD v2)                 86.83   89.45
MedSQuAD dataset
• SQuAD contains (question, passage, answer) triples.
− Task is Reading Comprehension, i.e., find the most appropriate span in the passage to return as an answer to the question.
• Nursing content = (question, answer) pairs.
• MedSQuAD dataset
− Good answers from nursing questions + top passages, select best passage manually
− Passages in ClinicalKey + automatic question generation, select triples manually
− Approximately 300 (question, passage, answer) triples.
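A (question, passage, answer) triple of the kind described above might be represented as in the sketch below. The class name and the validity check are hypothetical, not part of MedSQuAD. The question and passage sentence are taken from the stridor example earlier in the deck, but the answer span is my own pick for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QATriple:
    """One reading-comprehension example: the answer must be a span of the passage."""
    question: str
    passage: str
    answer: str

    def answer_start(self):
        """Character offset of the answer in the passage, or -1 if it is not a span."""
        return self.passage.find(self.answer)

    def is_valid(self):
        """True when the answer is non-empty and occurs verbatim in the passage."""
        return bool(self.answer) and self.answer_start() >= 0


triple = QATriple(
    question="What is the cause of stridor?",
    passage=("Stridor is a term used to describe a high-pitched sound "
             "caused by partial obstruction of the airway."),
    answer="partial obstruction of the airway",  # illustrative span, an assumption
)
print(triple.is_valid(), triple.answer_start())
```

A check like is_valid is useful during manual curation, since a free-text answer from (question, answer) pairs often does not appear verbatim in any candidate passage.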
22
Retriever Experiments
• Adding default retriever backend
− Parse the question into appropriate query (BM25 + BoW worked best)
− Rerank (RM3 worked best) and return top 50 result passages
− Reader generates answer using each of the top 50 passages
− Returns the top (k=1) answer by segment and span score
• Scores drop by 40+ points!
23
Reader not getting the “right” passages?
Parameters                                            EM      F1
Baseline (no retriever)                               65.11   76.03
Anserini retriever (BM25+BoW+RM3, 50 results, k=1)    23.02   30.50
Passage reranking? (Nogueira and Cho, 2020)
Passage Reranking with BERT (Nogueira and Cho, 2020)
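Picking the top answer "by segment and span score" can be sketched as a linear interpolation of the retriever (segment) score and the reader (span) score, which is how the BERTserini paper combines them. The value of mu and the candidate scores below are made up for illustration, and the talk does not state what weighting its pipeline used.

```python
def combined_score(segment_score, span_score, mu=0.5):
    """Interpolate retriever and reader scores: (1 - mu) * segment + mu * span."""
    return (1.0 - mu) * segment_score + mu * span_score


def top_answer(candidates, mu=0.5):
    """candidates: list of (answer_text, segment_score, span_score). Returns the k=1 answer."""
    best = max(candidates, key=lambda c: combined_score(c[1], c[2], mu))
    return best[0]


# Made-up scores for two candidate spans from different passages.
candidates = [
    ("partial obstruction of the airway", 12.0, 8.0),
    ("neck and/or chest radiographs", 14.0, 2.0),
]
print(top_answer(candidates, mu=0.5))
print(top_answer(candidates, mu=0.0))
```

With mu=0 the retriever alone decides, and with mu=1 the reader alone decides, so mu controls how much a strong span can compensate for a weakly ranked passage.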
BERT Based Reranker (Unsupervised)
• BERT-as-a-service (BaaS) wraps a BERT-base-uncased model and
returns embeddings from its last layer [-1].
• Query embeddings produced from query using BaaS.
• Passage embeddings produced for top 50 passages returned by query
using BaaS.
• Cosine similarity computed between query vector and passage vectors
and passages reranked by similarity descending.
24
Parameters                   EM     F1
BM25+BoW+RM3, 50 records     8.27   11.45
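The reranking step described above (cosine similarity between the query vector and each passage vector, sorted descending) can be sketched as follows. The vectors here are toy two-dimensional lists. In the actual setup they would be the BERT-as-a-service embeddings, which this sketch does not compute.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank_by_cosine(query_vec, passages):
    """passages: list of (passage_id, vector). Returns ids, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), pid) for pid, vec in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored]


# Toy embeddings standing in for real model output.
query_vec = [1.0, 0.0]
passages = [("p1", [0.0, 1.0]), ("p2", [1.0, 1.0]), ("p3", [2.0, 0.0])]
print(rerank_by_cosine(query_vec, passages))
```

Cosine similarity ignores vector length, so p3 ranks first here even though its vector is twice as long as the query's.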
Query Sentence Relevance Reranker
• Model predicts relevance (0/1) between query and single sentence.
• Trained on TREC Microblog dataset (120,000 query sentence pairs)
• Classifier fine-tuned from BERT-base-uncased for 2 epochs, Adam
optimizer with learning rate 2e-5, and batch size 32, F1-score: 0.86.
• Passage is scored as (number of relevant sentences) / (total number of sentences) and reranked by score descending.
25
Parameters                    EM      F1
BM25+BoW+RM3, 100 records     13.35   19.19
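The passage score from this slide, the fraction of sentences the classifier marks relevant, can be sketched as below. The 0/1 labels are hard-coded stand-ins for classifier output, and treating a passage with no sentences as scoring 0 is my assumption.

```python
def passage_score(sentence_labels):
    """Fraction of sentences labeled relevant (1) among all sentences in the passage."""
    if not sentence_labels:
        return 0.0  # assumption: a passage with no sentences scores zero
    return sum(sentence_labels) / len(sentence_labels)


def rerank_passages(labels_by_passage):
    """labels_by_passage: dict of passage_id -> list of 0/1 labels. Returns ids, best first."""
    return sorted(labels_by_passage,
                  key=lambda pid: passage_score(labels_by_passage[pid]),
                  reverse=True)


# Stand-in classifier output for three passages.
labels_by_passage = {"a": [1, 0, 0, 0], "b": [1, 1, 0], "c": []}
print(rerank_passages(labels_by_passage))
```

One consequence of this scoring is that a long passage with a single highly relevant sentence ranks below a short passage that is mostly relevant.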
Passage Relevance Reranker
• Model predicts relevance (number between 0 and 1) between passage and question.
• Trained using SQuAD 1.1 (passage, question) pairs with negative sampling.
• Regression model fine-tuned from BERT-base-uncased for 2 epochs, batch size 8, Adam optimizer with learning rate 2e-5, RMSE 0.3.
• Passage is ranked by relevance score descending.
26
Parameters                   EM     F1
BM25+BoW+RM3, 50 records     8.99   15.69
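One simple way to build training rows "with negative sampling" is sketched below: each true (passage, question) pair gets label 1.0, and the same question is paired with randomly chosen other passages at label 0.0. The slide does not describe the sampling scheme actually used, so the one-negative-per-positive ratio, the fixed seed, and the placeholder texts are all assumptions.

```python
import random


def build_training_pairs(positives, negatives_per_positive=1, seed=42):
    """positives: list of (passage, question). Returns (passage, question, label) rows."""
    rng = random.Random(seed)
    all_passages = sorted({passage for passage, _ in positives})
    rows = []
    for passage, question in positives:
        rows.append((passage, question, 1.0))
        # Candidate negatives: every passage except the question's own.
        others = [p for p in all_passages if p != passage]
        count = min(negatives_per_positive, len(others))
        for negative in rng.sample(others, count):
            rows.append((negative, question, 0.0))
    return rows


# Placeholder texts standing in for SQuAD 1.1 passages and questions.
positives = [
    ("passage about stridor", "What is the cause of stridor?"),
    ("passage about tooth mobility", "What is a cause of tooth mobility?"),
    ("passage about hairy tongue", "What causes black hairy tongue?"),
]
rows = build_training_pairs(positives)
print(len(rows))
```

Random negatives are easy for a model to reject; sampling negatives from the retriever's own top results would give harder examples, at the cost of occasionally mislabeling a passage that does answer the question.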
Siamese BERT Reranker
• Pretrained models from https://github.com/UKPLab/sentence-transformers
to produce embeddings from question and passage text.
• Passage score is mean or maximum similarity between question vector
and all sentences in passage.
• Passages ranked by score descending.
27
Parameters                                                                            EM      F1
BM25+BoW+RM3, 50 records, max similarity, model: bert_base_nli_mean_tokens            16.91   24.95
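The passage score described on this slide, the mean or maximum similarity between the question vector and each sentence vector in the passage, can be sketched as follows. The vectors are toy values. In the actual setup they would come from a pretrained sentence-transformers model, which this sketch does not load. Cosine is assumed as the similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def passage_similarity(question_vec, sentence_vecs, mode="max"):
    """Aggregate question-to-sentence similarities with 'max' or 'mean'."""
    if mode not in ("max", "mean"):
        raise ValueError(f"unknown mode: {mode}")
    sims = [cosine(question_vec, vec) for vec in sentence_vecs]
    if not sims:
        return 0.0
    return max(sims) if mode == "max" else sum(sims) / len(sims)


# Toy embeddings: one sentence matches the question, the other is orthogonal.
question_vec = [1.0, 0.0]
sentence_vecs = [[1.0, 0.0], [0.0, 1.0]]
print(passage_similarity(question_vec, sentence_vecs, "max"))
print(passage_similarity(question_vec, sentence_vecs, "mean"))
```

Max rewards a passage for its single best sentence, while mean penalizes passages that surround that sentence with unrelated text.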
Reranker Scores (all together)
28
Parameters                                                                EM      F1
From paper
BERTserini (Paragraph, k=100)                                             38.6    46.1
BERTserini (Paragraph, k=29)                                              36.6    44.0
BERTserini (Article, k=5)                                                 19.1    25.9
Our results
Anserini (BM25+BoW+RM3, SciBERT, MedSQuAD, Paragraph, k=1) (baseline)     23.02   30.5
RM3 replaced with BERT reranker                                           8.27    11.45
RM3 replaced with Query Sentence Relevance reranker                       13.35   19.19
RM3 replaced with Passage Relevance reranker                              8.99    15.69
RM3 replaced with Siamese BERT reranker                                   16.91   24.95
Conclusions
• We couldn’t beat Anserini with any of our Passage Rerankers.
• Our results are comparable (paragraph, k=1) to those reported in the
paper.
• However, response time is unacceptable because of slow Reader
component (techniques such as model distillation might help somewhat).
• Quality of answer snippet at k=1 not always acceptable, and too risky for
production use.
• We also want to use the selected passage and the source metadata as additional provenance for the answer.
29
Thank you
I am @sujitpal on relevancy.slack.com if you have questions.

Question Answering as Search - the Anserini Pipeline and Other Stories

  • 1. August 13 2020 Sujit Pal, Elsevier Labs Question Answering as Search The Anserini Pipeline and other stories THE SEARCH RELEVANCE CONFERENCE
  • 2. About Me • Work at Elsevier Labs • (Mostly self-taught) data scientist • Ex-search guy, Lucene and Solr mainly • Some NLP, traditional ML and Deep Learning, some Computer Vision • Started looking at Question Answering in 2019 • Specifically the BERTserini project from Jimmy Lin’s lab.
  • 3. Agenda • Types of QA systems • BERTserini Pipeline • Experiments and Results
  • 4. Types of QA Systems: We will just cover the subset where the objective, given a question, is to get answer spans from passages in a text corpus.
  • 5. Types of QA systems • Traditional QA pipeline • 2 stage Retriever Reader systems • Dense Retriever and Reader • Language model based. Traditional QA pipeline: • Jurafsky and Martin, IBM Watson, YodaQA • Choose keywords from question • Predict question type (who, what, when, …) • Rank passages by answer type and question words • Extract answer based on pattern matching and question type
  • 6. Types of QA systems: 2 stage Retriever Reader systems • DrQA (2017), BERTserini (2019) • Retriever is unsupervised • Reader is a supervised Reading Comprehension model • References: Reading Wikipedia to answer Open Domain Questions (Chen, et al, 2017); End-to-end Open Domain Question Answering with BERTserini (Yang, et al, 2019)
  • 7. Types of QA systems: Dense Retriever and Reader • ORQA (2019), REALM (2020) • Train retriever and reader end-to-end using question answer pairs. • Answer ranked by vector similarity between learned embeddings (question and answer). • References: Latent Retrieval for Weakly Supervised Open Domain Question Answering (Lee, et al, 2019); Retrieval Augmented Language Model Pre-training (Guu, et al, 2020)
  • 8. Types of QA systems: Language model based • GPT-2, GPT-3, T5 (2019 - 2020) • Fine-tuned Language Model • No corpus, LM stores world knowledge implicitly • Reference: Exploring the limits of Transfer Learning with a Unified Text-to-text Transformer (Raffel, et al, 2019)
  • 9. The BERTserini Pipeline: And how and why we adapted it for our needs
  • 11. BERTserini Pipeline • SOTA Results! • How would these results translate IRL?
  • 12. Our BERTserini Pipeline (diagram annotations) • ScienceDirect and later ClinicalKey • Solr + plugin • Best results with SciBERT/SQuAD 1.1
  • 13. BERT Reader changes • Replaced the BERT-base model fine-tuned on SQuAD 1.1 with a SciBERT model fine-tuned on SQuAD 1.1. • Also tried… − Fine-tuning other BERT models: BERT-large, BioBERT. − Fine-tuning using the SQuAD 2.0 dataset − Additional pre-training of the model using ClinicalKey content
  • 14. Anserini Retriever changes • Replaced the Lucene index with a Solr-based index. • Moved batch-oriented Anserini functionality to a Solr plugin for interactive use. − Open source, available on GitHub: https://github.com/elsevierlabs-os/anserini-solr-plugin − Code could be cleaner, but was developed for use in POC code.
  • 15. anserini-solr-plugin
      • Input: HTTP GET request specifying query, sim, qtype, and rtype.
      • Similarity (sim): query likelihood (ql) and BM25 (bm25, default).
      • Query Rewriting (qtype)
          − Bag of Words (bow), Sequential Dependency Model (sdm)
          − Added edismax and raw
      • Result Reranking (rtype)
          − RM3 (rm3)
          − Axiomatic (ax)
          − Identity (no reranking)
          − Added external (delegate to external rerank service)
      • Output: HTTP Response
      • Figure labels: Rewritten query, Reranking query
      • https://github.com/elsevierlabs-os/anserini-solr-plugin
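To make the plugin's input concrete, here is a minimal sketch of how a client might assemble the GET request from the four parameters named on the slide. The base URL and handler path are hypothetical placeholders, and the `identity` and `external` strings are assumed values (the slide gives parameter values only for `rm3` and `ax`); check the plugin repository for the actual names.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, core name, and handler path are placeholders,
# not taken from the plugin's documentation.
BASE_URL = "http://localhost:8983/solr/mycore/anserini"

# Parameter values listed on the slide.
SIMILARITIES = {"bm25", "ql"}
QUERY_TYPES = {"bow", "sdm", "edismax", "raw"}
# "identity" and "external" are assumed spellings; the slide names the
# options but not their parameter values.
RERANK_TYPES = {"rm3", "ax", "identity", "external"}


def build_request_url(query, sim="bm25", qtype="bow", rtype="rm3", base_url=BASE_URL):
    """Return a GET URL carrying the query, sim, qtype, and rtype parameters."""
    if sim not in SIMILARITIES:
        raise ValueError(f"unknown sim: {sim}")
    if qtype not in QUERY_TYPES:
        raise ValueError(f"unknown qtype: {qtype}")
    if rtype not in RERANK_TYPES:
        raise ValueError(f"unknown rtype: {rtype}")
    params = {"query": query, "sim": sim, "qtype": qtype, "rtype": rtype}
    # urlencode percent-encodes values and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


url = build_request_url("what causes stridor")
print(url)
```

Validating the values on the client side is a design choice for this sketch only; the plugin may well apply its own defaults or error handling.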
  • 16. Experiments and Results: Creating the MedSQuAD dataset, and replacing the Anserini reranker component with various candidates
  • 17. Initial Setup • Index paragraphs from ClinicalKey books • Use BM25 + BoW + RM3 (the paper says paragraphs and these settings work best) • Scoring: − Use k=1, look at top answer only (we hope to use the top answer for display without further post-processing) − Scoring metrics are EM (exact match) and F1 (F1 score) between label and predicted answers (the SQuAD metrics).
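The EM and F1 metrics mentioned above can be sketched as follows. This follows the general approach of the SQuAD evaluation (normalize both strings, then compare tokens); it is an illustrative reimplementation, not the code used in the talk, and the function names are chosen here for illustration.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, label):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(label))


def f1_score(prediction, label):
    """Token-level F1 between the normalized prediction and label."""
    pred_tokens = normalize_answer(prediction).split()
    label_tokens = normalize_answer(label).split()
    if not pred_tokens or not label_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == label_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(label_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(label_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The chromogenic bacteria.", "chromogenic bacteria"))
print(f1_score("chromogenic bacteria", "bacteria"))
```

Dataset-level EM and F1 are then the averages of these per-question values, usually reported as percentages.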
  • 18. How well does BERTserini work on our data? • 100 questions from nursing text, classified as “Remembering” in Bloom’s taxonomy. • Run these questions against the pipeline and manually inspect each answer. • ~60 “reasonable” answers. − Answer span correct, but… − Passage answers question
      Question: What causes a condition known as black hairy tongue?
      Passage: Hairy tongue is a condition in which the patient has an increased accumulation of keratin on the filiform papillae that results in a white, “hairy” appearance. This may be the result of either an increase in keratin production or a decrease in normal desquamation. Unless otherwise pigmented, the elongated filiform papillae are white (Fig. 1.58). In the condition known as black hairy tongue, the papillae are a brown-to-black color because of chromogenic bacteria (Fig. 1.59). Tobacco and certain foods may also discolor the papillae. Although the cause is unknown, hydrogen peroxide, bismuth subsalicylates for upset stomach, alcohol, or chemical rinses have been suggested to stimulate the elongation of the filiform papillae that results in the appearance of hairy tongue.
      Source: Oral Pathology for the Dental Hygienist: Introduction to Preliminary Diagnosis of Oral Lesions (PII: B9780323400626000013, ISBN: 978-0-323-40062-6)
  • 19. Some more good results
      Question: What is a cause of tooth mobility?
      Passage: Periodontal probing is used to assess attachment levels to the tooth and is a prime indicator of health. Radiographic bone loss around a tooth does not indicate the presence of a disease state but is a reflection of past or present periodontal disease. Occlusal trauma may cause an increase in tooth mobility but does not cause marginal bone loss in the absence of periodontal disease.
      Source: Contemporary Implant Dentistry: An Implant Is Not a Tooth: A Comparison of Periodontal Indices (PII: B9780323043731500484, ISBN: 978-0-323-04373-1)
      Question: What is the cause of stridor?
      Passage: Stridor is a term used to describe a high-pitched sound caused by partial obstruction of the airway. Stridor can have an inspiratory, expiratory, or biphasic pattern (both inspiratory and expiratory). An inspiratory pattern suggests an upper airway cause (e.g., epiglottitis). An expiratory pattern suggests a lower airway etiology (e.g., tracheomalacia). A biphasic pattern suggests a glottic or subglottic obstruction (e.g., subglottic hemangioma). Imaging evaluation of the child with stridor is commonly performed with neck and/or chest radiographs, depending on the pattern of stridor and associated clinical findings.
      Source: Emergency Radiology: The Requisites: Imaging Evaluation of Common Pediatric Emergencies (PII: B9780323376402000066, ISBN: 978-0-323-37640-2)
  • 20. As well as some fails
      Question: What special considerations must be observed when a patient has epiglottitis?
      Returned passage: What special considerations related to her transplant need to be in place for this patient during critical care resuscitation?
      Source: Advanced Critical Care Nursing: Bone Marrow Transplantation (PII: B9781416032199100397, ISBN: 978-1-4160-3219-9)
      Question: What conditions are treated by methotrexate?
      Returned passage: The combination of PUVA and methotrexate successfully treated five patients with erythrodermic psoriasis and two with pustular psoriasis. According to the authors, annual methotrexate doses could be reduced by 50% by adding PUVA to the regimen.
      Source: Treatment of Skin Disease: Comprehensive Therapeutic Strategies: Psoriasis (PII: B978070206912300210X, ISBN: 978-0-7020-6912-3)
      Slide callouts: “Meaningless answer”; “Surely a better answer exists?”
  • 21. Reader Experiments • Results of evaluating various Reader configurations (no Retriever) against the SQuAD dataset. • Encouraging results for the reading comprehension task, i.e., when an appropriate passage is provided.
      Parameters                       EM      F1
      BERT-base uncased + SQuAD 1.1    75.86   82.41
      BERT-base uncased + SQuAD 2.0    74.03   77.30
      SciBERT + SQuAD 1.1              79.10   87.26
      Human (SQuAD v2)                 86.83   89.45
  • 22. MedSQuAD dataset • SQuAD contains (question, passage, answer) triples. − Task is Reading Comprehension, i.e., find the most appropriate span in the passage to return as an answer to the question. • Nursing content = (question, answer) pairs. • MedSQuAD dataset − Good answers from nursing questions + top passages, select best passage manually − Passages in ClinicalKey + automatic question generation, select triples manually − Approximately 300 (question, passage, answer) triples.
  • 23. Retriever Experiments • Adding default retriever backend − Parse the question into an appropriate query (BM25 + BoW worked best) − Rerank (RM3 worked best) and return top 50 result passages − Reader generates an answer using each of the top 50 passages − Returns the top (k=1) answer by segment and span score • Scores drop by 40+ points! • Reader not getting the “right” passages? • Passage reranking? See Passage Reranking with BERT (Nogueira and Cho, 2020)
      Parameters                                            EM      F1
      Baseline (no retriever)                               65.11   76.03
      Anserini retriever (BM25+BoW+RM3, 50 results, k=1)    23.02   30.50
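The last step, returning the top (k=1) answer by segment and span score, can be sketched as below. The slide does not say how the two scores are combined; this sketch assumes the linear interpolation described in the BERTserini paper, S = (1 − μ)·S_retriever + μ·S_reader. The `Candidate` class, the field names, and the default `mu=0.5` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str           # answer span extracted by the reader
    passage_id: str       # passage the span came from
    segment_score: float  # retriever score for the passage
    span_score: float     # reader score for the span


def combined_score(candidate, mu=0.5):
    """Linear interpolation of retriever and reader scores (mu is a placeholder)."""
    return (1.0 - mu) * candidate.segment_score + mu * candidate.span_score


def top_answer(candidates, mu=0.5):
    """Return the single best candidate (k=1), or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: combined_score(c, mu))


candidates = [
    Candidate("chromogenic bacteria", "p1", segment_score=10.0, span_score=2.0),
    Candidate("tobacco", "p2", segment_score=8.0, span_score=6.0),
]
best = top_answer(candidates)
print(best.answer, best.passage_id)
```

With `mu=0` the retriever alone decides and with `mu=1` the reader alone decides, so in practice the weight would be tuned on held-out questions.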
  • 24. BERT Based Reranker (Unsupervised) • BERT-as-a-service (BaaS) wraps a BERT-base-uncased model and returns embeddings from its last layer [-1]. • Query embeddings produced from the query using BaaS. • Passage embeddings produced for the top 50 passages returned by the query using BaaS. • Cosine similarity computed between the query vector and passage vectors, and passages reranked by similarity descending.
      Parameters                  EM     F1
      BM25+BoW+RM3, 50 records    8.27   11.45
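The reranking step itself is independent of how the embeddings are produced. A minimal sketch, assuming the query and passage vectors have already been obtained (here they are toy 2-dimensional lists rather than BERT embeddings, and the function names are illustrative):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank_by_cosine(query_vec, passages):
    """Sort (passage_id, vector) pairs by similarity to the query, descending."""
    scored = [(pid, cosine_similarity(query_vec, vec)) for pid, vec in passages]
    return sorted(scored, key=lambda item: item[1], reverse=True)


query_vec = [1.0, 0.0]
passages = [("p1", [0.0, 1.0]), ("p2", [1.0, 1.0]), ("p3", [2.0, 0.0])]
for pid, sim in rerank_by_cosine(query_vec, passages):
    print(pid, round(sim, 3))
```

Cosine similarity ignores vector length, which is why `p3` ranks first here despite having a different magnitude from the query.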
  • 25. Query Sentence Relevance Reranker • Model predicts relevance (0/1) between query and a single sentence. • Trained on TREC Microblog dataset (120,000 query sentence pairs) • Classifier fine-tuned from BERT-base-uncased for 2 epochs, Adam optimizer with learning rate 2e-5, and batch size 32, F1-score: 0.86. • Passage is scored as (number of relevant sentences) / (number of sentences) and reranked by score descending.
      Parameters                   EM      F1
      BM25+BoW+RM3, 100 records    13.35   19.19
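The passage scoring rule above is simple enough to sketch directly. In this illustration the fine-tuned BERT classifier is replaced by a toy keyword-overlap predicate (`keyword_overlap`, a stand-in invented for this example), and the sentence splitter is deliberately naive; only the ratio of relevant sentences to total sentences reflects the slide.

```python
import re


def split_sentences(passage):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def passage_score(query, passage, is_relevant):
    """Fraction of sentences in the passage judged relevant to the query."""
    sentences = split_sentences(passage)
    if not sentences:
        return 0.0
    relevant = sum(1 for s in sentences if is_relevant(query, s))
    return relevant / len(sentences)


def keyword_overlap(query, sentence):
    """Toy stand-in for the 0/1 relevance classifier: any shared word counts."""
    query_words = set(query.lower().split())
    sentence_words = set(re.findall(r"\w+", sentence.lower()))
    return len(query_words & sentence_words) > 0


passage = (
    "Stridor is a high-pitched sound. "
    "It is caused by airway obstruction. "
    "Imaging is often performed."
)
print(passage_score("stridor airway", passage, keyword_overlap))
```

Passing the classifier in as a function keeps the scoring rule separate from the model, so a real query-sentence classifier could be swapped in without changing `passage_score`.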
  • 26. Passage Relevance Reranker • Model predicts relevance (number between 0 and 1) between passage and question. • Trained using SQuAD 1.1 (passage, question) pairs with negative sampling. • Regression model fine-tuned from BERT-base-uncased for 2 epochs, batch size 8, Adam optimizer with learning rate 2e-5, RMSE 0.3. • Passages are ranked by relevance score descending.
      Parameters                  EM     F1
      BM25+BoW+RM3, 50 records    8.99   15.69
  • 27. Siamese BERT Reranker • Pretrained models from https://github.com/UKPLab/sentence-transformers to produce embeddings from question and passage text. • Passage score is mean or maximum similarity between the question vector and all sentences in the passage. • Passages ranked by score descending.
      Parameters                                                                    EM      F1
      BM25+BoW+RM3, 50 records, max similarity, model: bert_base_nli_mean_tokens    16.91   24.95
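The difference between the mean and maximum aggregation can be seen in a small sketch. It assumes the question-to-sentence similarities have already been computed from the sentence embeddings (the numbers below are made up for illustration), and the function names are hypothetical.

```python
from statistics import mean


def aggregate_passage_score(sentence_similarities, mode="max"):
    """Collapse per-sentence similarities into one passage score."""
    if not sentence_similarities:
        return 0.0
    if mode == "max":
        return max(sentence_similarities)
    if mode == "mean":
        return mean(sentence_similarities)
    raise ValueError(f"unknown mode: {mode}")


def rank_passages(similarities_by_passage, mode="max"):
    """Return (passage_id, score) pairs sorted by score, descending."""
    scored = {
        pid: aggregate_passage_score(sims, mode)
        for pid, sims in similarities_by_passage.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up similarities: p1 has one very relevant sentence, p2 is uniformly so-so.
similarities = {"p1": [0.9, 0.1], "p2": [0.6, 0.6]}
print(rank_passages(similarities, mode="max"))
print(rank_passages(similarities, mode="mean"))
```

Maximum favors a passage with a single highly similar sentence, while mean favors passages that are uniformly on topic; the row reported on the slide uses max similarity.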
  • 28. Reranker Scores (all together)
      Parameters                                                                EM      F1      Source
      BERTserini (Paragraph, k=100)                                             38.6    46.1    From paper
      BERTserini (Paragraph, k=29)                                              36.6    44.0    From paper
      BERTserini (Article, k=5)                                                 19.1    25.9    From paper
      Anserini (BM25+BoW+RM3, SciBERT, MedSQuAD, Paragraph, k=1) (baseline)     23.02   30.50   Our results
      RM3 replaced with BERT reranker                                           8.27    11.45   Our results
      RM3 replaced with Query Sentence Relevance reranker                       13.35   19.19   Our results
      RM3 replaced with Passage Relevance reranker                              8.99    15.69   Our results
      RM3 replaced with Siamese BERT reranker                                   16.91   24.95   Our results
  • 29. Conclusions • We couldn’t beat Anserini with any of our Passage Rerankers. • Our results are comparable (paragraph, k=1) to those reported in the paper. • However, response time is unacceptable because of the slow Reader component (techniques such as model distillation might help somewhat). • Quality of the answer snippet at k=1 is not always acceptable, and too risky for production use. • We also want to use the selected passage and the source metadata as additional provenance for the answer.
  • 30. Thank you I am @sujitpal on relevancy.slack.com if you have questions