Automated Identification of Similar Health Questions


Geoffrey Rutledge

Presentation to the AMIA 2013 Summit on Clinical Research Informatics, March 20, 2013


Automated Identification of Similar Health Questions
Geoffrey W. Rutledge MD, PhD
Chief Medical Information Officer
HealthTap.com

Introduction
People with health questions are increasingly looking for physician answers to their health questions online. Given the repetitive nature of common questions, there is a high value in identifying previously answered questions that are semantically similar (or identical) to each new question, so that an answer can be given without delay and without waiting for a new answer from a physician.

Background
Previous methods to evaluate question pairs were based on sentence similarity [1,2] and are not suitable for consumer health questions, which contain many consumer-health variations and frequent misspellings of medical concepts. We developed a method to identify questions with "high semantic similarity" from a corpus of consumer health questions and answers, in which the questions and answers are limited to 150 and 400 characters, respectively.

Method
We compare the text of new questions to the closest matching question from the Q&A corpus. For a set of 1,000 questions and their closest match, we evaluated the sensitivity and specificity of alternative similarity criteria for the assertion of "high semantic similarity." We first identified the most similar question within the Q&A corpus using a search engine augmented with a semantic-weight driven ontology of consumer health concepts, which includes a rich set of synonyms of consumer health terms, and frequent misspellings of consumer health terms.

The three similarity criteria tested are
1. Lexical identity after removal of all non-alphanumeric characters
2. Sum of semantic weights of all matching health concepts
3. Sum of weights of only the moderate or high weight matching health concepts

Examples of medical concepts:
moderate weights: antibiotics, heart disease, sharp pain
high weights: penicillin, congestive heart failure, squeezing chest pain

Results
We compared the three similarity criteria against an expert assessment of question pair similarity. The sensitivities and specificities for the three criteria are (1) 0.47 and 1, (2) 0.61 and 0.99, and (3) 0.63 and 0.97, as plotted on the chart of False positive versus True positive rates (ROC). The criterion with the best performance was Sum of semantic weights of all matching concepts.

[Figure: ROC chart of True positive rate versus False positive rate, with one point for each of criteria (1), (2), and (3)]

Discussion
The problem of identifying semantically similar health questions is complicated by the variability of consumer health language and the difficulties that consumers have in spelling medical terms. A comprehensive ontology and synonym set of consumer health terms enabled the accurate detection of a large fraction of semantically similar consumer health questions that were entered in an online health site.

The automated identification of similar consumer health questions is challenging because of the common occurrence of complex, colloquial, and often misspelled medical terms in consumer health questions. We collected online health questions and their paired "nearest search result" matching questions to evaluate 3 question similarity metrics. The best performing metric was based on the sum of semantic weights for all matching health concepts from a comprehensive ontology of consumer health terms and common misspellings, with a measured sensitivity of 0.61 and specificity of 0.99.

References
[1] The Evaluation of Sentence Similarity Measures. I.-Y. Song, J. Eder, and T.M. Nguyen (Eds.): DaWaK 2008, LNCS 5182, pp. 305–316, 2008.
[2] Finding Similar Questions in Large Question and Answer Archives. Jiwoon Jeon, W. Bruce Croft and Joon Ho Lee. CIKM'05, October 31–November 5, 2005.

We are hiring
geoff@healthtap.com
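
The three similarity criteria and the sensitivity/specificity evaluation described in this poster can be illustrated with a minimal Python sketch. It is not the HealthTap implementation. The numeric weights, the extra "cough" concept, the synonym and misspelling table, the lowercasing step, and all function names are assumptions made for illustration; only the example concept names and the reported sensitivity/specificity pairs come from the poster.

```python
import re

# Hypothetical numeric weights: the poster names example concepts but gives no values.
LOW, MODERATE, HIGH = 0.5, 1.0, 2.0
CONCEPT_WEIGHTS = {
    "cough": LOW,  # hypothetical low-weight concept, not from the poster
    "antibiotics": MODERATE,
    "heart disease": MODERATE,
    "sharp pain": MODERATE,
    "penicillin": HIGH,
    "congestive heart failure": HIGH,
    "squeezing chest pain": HIGH,
}
# Hypothetical synonym/misspelling table mapping variants to canonical concepts.
VARIANTS = {
    "penicilin": "penicillin",
    "antibiotic": "antibiotics",
    "chf": "congestive heart failure",
}


def normalize(text):
    """Drop every non-alphanumeric character (lowercasing is an assumption)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def lexically_identical(q1, q2):
    """Criterion 1: identical text once non-alphanumerics are removed."""
    return normalize(q1) == normalize(q2)


def concepts(text):
    """Return the canonical concepts mentioned in a question."""
    words = text.lower()
    for variant, canonical in VARIANTS.items():
        words = re.sub(rf"\b{re.escape(variant)}\b", canonical, words)
    return {c for c in CONCEPT_WEIGHTS
            if re.search(rf"\b{re.escape(c)}\b", words)}


def weight_sum(q1, q2, min_weight=0.0):
    """Criterion 2 (min_weight=0) or criterion 3 (min_weight=MODERATE):
    sum the weights of the concepts that both questions share."""
    shared = concepts(q1) & concepts(q2)
    return sum(CONCEPT_WEIGHTS[c] for c in shared
               if CONCEPT_WEIGHTS[c] >= min_weight)


def sensitivity_specificity(predicted, expert):
    """Compare boolean predictions with expert labels.
    Assumes at least one positive and one negative expert label."""
    pairs = list(zip(predicted, expert))
    tp = sum(p and e for p, e in pairs)
    tn = sum(not p and not e for p, e in pairs)
    fp = sum(p and not e for p, e in pairs)
    fn = sum(not p and e for p, e in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def roc_point(sensitivity, specificity):
    """Convert a (sensitivity, specificity) pair to (FPR, TPR) for an ROC chart."""
    return round(1 - specificity, 2), sensitivity
```

A weight-sum criterion would assert "high semantic similarity" when the sum exceeds some threshold; the poster does not state the threshold used, so none is chosen here. Under these assumptions, roc_point applied to the reported pair for criterion (2), sensitivity 0.61 and specificity 0.99, gives a false positive rate of 0.01 and a true positive rate of 0.61.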


