First Steps Towards a Risk of Bias Corpus
of Randomized Controlled Trials
Presenter – Anjani Dhrangadhariya
MIE2023 - Göteborg, Sweden, 23.05.23
Authors: Anjani Dhrangadhariya, Roger Hilfiker, Martin Sattelmayer, Katia
Giacomino, Rahel Caliesch, Simone Elsig, Nona Naderi, Henning Müller
Randomized Controlled Trial
• In theory, an RCT accurately measures intervention effects on patient
outcomes, but in practice, biases enter
• Design/Planning
• Execution
• Analysis
• Outcomes reporting
• Systematic Reviews
• Utility
• Medical professionals
• Health policies
• Surgeons
Risk of Bias (RoB)
• The risk of bias specifically pertains to systematic errors in the design,
conduct, or reporting of a study that can potentially lead to a
deviation from the true effect being measured.
• RoB assessment guidelines
Example RoB assessment guidelines | Year
Physiotherapy Evidence Database (PEDro) | 1999
Risk of Bias Assessment Tool for Nonrandomized Studies (RoBANS) | 2004
Cochrane Risk of Bias assessment guidelines | 2008
Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I) | 2016
Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) | 2017
Newcastle-Ottawa Scale (NOS) | 2018
Revised Cochrane Risk of Bias for RCTs 2.0 tool (RoB 2) | 2019
RoB information extraction
• Thorough assessment
• Manual assessment
• Time-consuming
• Cognitively demanding
• Two experts for manual assessment
• A third expert for conflict resolution
• Automation imperative
Related Work
• RoB labelled corpus
• Wang et al. 2022
• Preclinical animal studies
• Human RCTs
• RobotReviewer
• PDF highlights
• Freely available
• Closed access data
• Cochrane RoB v1
• RoB 2.0?
• RoB automation
• Marshall et al. 2015
• Millard et al. 2016
• Cochrane Database (CDSR)
• Closed access
Motivation
1. No RoB text annotation guidelines exist
2. No RoB-annotated RCTs exist
Revised Cochrane RoB 2.0 tool
• Can the guidelines be used to annotate a text corpus?
• Extensive guidelines
• Step-by-step instructions
• Divides RoB into 5 domains
• Each domain is assessed using several
signalling questions
1. Randomization process
2. Deviations from intended interventions
3. Missing outcomes data
4. Outcomes measurement
5. Selection of reported result
Sterne, J.A., Savović, J., Page, M.J., Elbers, R.G., Blencowe, N.S., Boutron, I., Cates, C.J., Cheng, H.Y., Corbett, M.S., Eldridge, S.M. and Emberson, J.R., 2019. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366.
Revised Cochrane RoB 2.0 tool
• Reviewers manually go through the RCT to identify text describing the
answer to a signalling question.
• Based on the answer to the signalling question, select one of the five
response judgements:
Yes | Probably Yes | Probably No | No | No Information
Revised Cochrane RoB 2.0 tool
• 2.1 - Were the participants aware of their assigned intervention
during the trial?
Example label: 2.1 No Good
Risk domains: 5 | Signalling questions: 22
Annotation schema
• Follow the revised Cochrane RoB 2.0
• 110 span labels (22 signalling questions × 5 response options)
• 1.1 Yes Good
• 1.1 Probably Yes Good
• 1.1 Probably No Bad
• 1.1 No Bad
• 1.1 No Information
• 1.2 Yes Good
• 1.2 Probably Yes Good
• 1.2 Probably No Bad
• …
Label structure (example: 1.1 Yes Good)
• Risk domain: 1
• Signalling question: 1.1
• SQ response: Yes
• Direction: Good
• Good = low risk
• Bad = high risk
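The label scheme above can be sketched in Python. This is an illustrative sketch, not the authors' code: `parse_label` and `signalling_questions` are hypothetical helpers, and the per-domain question counts (3, 7, 4, 5, 3) are an assumption taken from the RoB 2 tool, chosen because they sum to the 22 signalling questions stated on the slide.

```python
from itertools import product
from typing import NamedTuple, Optional

RESPONSES = ["Yes", "Probably Yes", "Probably No", "No", "No Information"]

# Assumption: signalling questions per risk domain (sums to 22).
QUESTIONS_PER_DOMAIN = {1: 3, 2: 7, 3: 4, 4: 5, 5: 3}


class Label(NamedTuple):
    domain: int               # risk domain, e.g. 1
    question: str             # signalling question, e.g. "1.1"
    response: str             # one of RESPONSES
    direction: Optional[str]  # "Good" (low risk), "Bad" (high risk), or None


def signalling_questions():
    """Enumerate question identifiers such as '1.1', '1.2', ..., '5.3'."""
    return [f"{d}.{q}"
            for d, n in QUESTIONS_PER_DOMAIN.items()
            for q in range(1, n + 1)]


def parse_label(text: str) -> Label:
    """Split a label such as '1.1 Probably Yes Good' into its parts."""
    question, rest = text.split(" ", 1)
    direction = None
    for candidate in ("Good", "Bad"):
        if rest.endswith(" " + candidate):
            rest = rest[: -len(candidate) - 1]
            direction = candidate
    if rest not in RESPONSES:
        raise ValueError(f"unknown response judgment: {rest!r}")
    return Label(int(question.split(".")[0]), question, rest, direction)


# One label slot per (signalling question, response option) pair.
label_slots = list(product(signalling_questions(), RESPONSES))

if __name__ == "__main__":
    print(len(signalling_questions()), len(label_slots))
    print(parse_label("2.1 No Good"))
```

Note that the direction cannot be derived from the response alone: for question 2.1 a "No" is Good, whereas for 1.1 a "No" is Bad, so the direction has to be stored with each label.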
Pilot Annotation
• Ten RCT full-text PDFs
• 2000-2019
• Four annotators
• 2 scientists
• 1 doctoral student
• 1 scientific collaborator
• Two NLP experts
• 1 professor
• 1 doctoral student
• tagtog PDF annotation tool
https://www.tagtog.com/
Evaluation
• F1-measure as inter-annotator agreement (IAA)
• Disregards out-of-span (unannotated) tokens
1. IAA_SQ
Do the annotator pairs annotate the same text span to answer a
signalling question (SQ)?
2. IAA_response
If the annotator pairs annotate the same text to answer a
signalling question, do they also select the same response
judgment?
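The two agreement measures can be sketched at token level as follows. This is a minimal sketch under stated assumptions, not the authors' evaluation script: each annotation is assumed to be a mapping from token index to a (signalling question, response) pair with at most one label per token, and the function names are hypothetical. Tokens that neither annotator labelled never enter the computation, which mirrors the "disregards unannotated tokens" point above.

```python
from typing import Dict, Tuple

# token index -> (signalling question, response judgment)
Annotation = Dict[int, Tuple[str, str]]


def f1(n_match: int, n_a: int, n_b: int) -> float:
    """F1 when annotator A is the reference and B the prediction."""
    if n_match == 0:
        return 0.0
    precision = n_match / n_b
    recall = n_match / n_a
    return 2 * precision * recall / (precision + recall)


def iaa_sq(a: Annotation, b: Annotation) -> float:
    """Do both annotators mark the same tokens for the same SQ?"""
    match = sum(1 for tok, (sq, _) in a.items()
                if tok in b and b[tok][0] == sq)
    return f1(match, len(a), len(b))


def iaa_response(a: Annotation, b: Annotation) -> float:
    """On tokens marked for the same SQ, do the responses agree?

    On this shared set precision equals recall, so F1 reduces to the
    fraction of shared tokens with identical responses.
    """
    shared = [tok for tok in a if tok in b and a[tok][0] == b[tok][0]]
    if not shared:
        return 0.0
    agree = sum(1 for tok in shared if a[tok][1] == b[tok][1])
    return agree / len(shared)


if __name__ == "__main__":
    ann_a = {10: ("4.1", "No"), 11: ("4.1", "No"),
             12: ("4.1", "No"), 13: ("4.1", "No")}
    ann_b = {11: ("4.1", "No"), 12: ("4.1", "Probably No")}
    print(round(iaa_sq(ann_a, ann_b), 3))   # 2 of A's 4 tokens matched
    print(iaa_response(ann_a, ann_b))       # 1 of 2 shared tokens agree
```

Because swapping the two annotators only exchanges precision and recall, the pairwise F1 is symmetric and no annotator needs to be treated as the gold standard.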
Results - IAA_SQ
• Zero agreement or no annotation
• Domain 2 - 52%
• Domain 3 - 54%
• Domain 4 - 50%
• Domain 5 - 61% (protocol)
• Less subjective questions
• Better IAA
The table details the interpretation of pairwise F1-measure.
Results - IAA_response
• IAA - SQ response judgment
• Averaged over all annotator pairs
• Zero agreement - 52.63%
• No annotation - 22%
• Together: ~75%
The table details the interpretation of pairwise F1-measure.
Error Inspection – 1. Text span disagreement
• Annotators were not restricted to annotating either
phrases or full sentences
4.1 Was the method of measuring the outcome
inappropriate?
Sentence: …The primary outcome measure was a 0–10 NRS pain score, which reflected the average pain experienced by the patient for ten days prior to follow-up…
Phrase: …a 0–10 NRS pain score…
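A small worked example shows how much a phrase-versus-sentence choice alone can cost in token-level F1. The sketch below assumes whitespace tokenization and that one annotator marks the full sentence while the other marks only the phrase; `span_f1` is a hypothetical helper, and the numbers illustrate the effect rather than reproduce the reported results.

```python
sentence = (
    "The primary outcome measure was a 0–10 NRS pain score, which reflected "
    "the average pain experienced by the patient for ten days prior to follow-up"
)
tokens = sentence.split()

# Annotator A marks the full sentence, annotator B only the phrase
# "a 0–10 NRS pain score" (token positions 5 to 9).
sentence_span = set(range(len(tokens)))
phrase_span = set(range(5, 10))


def span_f1(a: set, b: set) -> float:
    """Token-level F1 between two spans: 2|A ∩ B| / (|A| + |B|)."""
    overlap = len(a & b)
    return 2 * overlap / (len(a) + len(b)) if overlap else 0.0


if __name__ == "__main__":
    print(len(tokens), len(phrase_span))
    print(round(span_f1(sentence_span, phrase_span), 3))
```

Here the phrase lies entirely inside the sentence, yet the F1 is only 2·5 / (25 + 5) ≈ 0.33, even though both annotators point at the same evidence.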
Error Inspection – 2. Different sections
• Annotators use different regions
(Methods section, Results section,
Table, …) of full text to come to
identical labels.
• Same judgment, different parts of
text evidence
2.6 Was an appropriate analysis used to estimate
the effect of assignment to intervention?
…This study was guided by the HAPA, which
has been widely used to address the gap
between intention to change and a person’s
actual change in behaviour [25-27]…
…intention-to-treat analysis was done with missing data substituted by the
last-observation-carried-forward procedure…
2.6 Yes Good
Error Inspection – 3. Polarity disagreement
… 71 allocated routine services, 67 allocated
intervention service, 69 assessed at 8 weeks,
64 assessed at 8 week...
3.1 Were data for the outcome of interest
available for all, or nearly all, participants
randomized?
• Selecting response judgment
options with different polarities
• Yes vs. No
• Three of the four annotators
responded to 3.1 with Yes, but
one chose Probably no.
• All or nearly all (cut-off?)
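The cut-off question can be made concrete with the numbers in the excerpt. The sketch below assumes that the two "assessed at 8 weeks" counts belong to the routine and intervention arms in that order; the cut-off values are hypothetical and are not taken from the RoB 2 guidance.

```python
# Counts from the excerpt; the arm assignment of the assessed
# counts is an assumption.
allocated = {"routine": 71, "intervention": 67}
assessed = {"routine": 69, "intervention": 64}


def proportion_with_data(allocated: dict, assessed: dict) -> float:
    """Share of randomized participants with outcome data."""
    return sum(assessed.values()) / sum(allocated.values())


def meets_cutoff(proportion: float, cutoff: float) -> bool:
    """True if 'nearly all' is satisfied under the given cut-off."""
    return proportion >= cutoff


if __name__ == "__main__":
    p = proportion_with_data(allocated, assessed)
    print(f"{p:.1%} of randomized participants assessed")
    for cutoff in (0.95, 0.98):  # hypothetical thresholds
        print(cutoff, meets_cutoff(p, cutoff))
```

With 133 of 138 participants assessed (about 96.4%), an annotator applying a 95% threshold would lean towards Yes and one applying 98% would not, which is one plausible source of the polarity disagreement on 3.1.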
Error Inspection – 4. Degree disagreement
• Lenient annotators choose definitive responses
• Yes
• No
• Stringent annotators choose hedged responses
• Probably yes
• Probably no
1.1 Was a random sequence generation
method used to assign participants to
intervention groups?
…Patients were randomly allocated to either
intervention by a computer-generated
schedule stratified by sex and attendance at
a day hospital…
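Polarity and degree disagreements can be told apart mechanically once each response is mapped to a polarity. The sketch below is an illustration with hypothetical names (`POLARITY`, `disagreement_type`) and is not part of the authors' pipeline. "No Information" is treated as its own category, which is an assumption.

```python
from itertools import combinations

POLARITY = {
    "Yes": "positive",
    "Probably Yes": "positive",
    "Probably No": "negative",
    "No": "negative",
}


def disagreement_type(r1: str, r2: str) -> str:
    """Classify a pair of response judgments for one signalling question."""
    if r1 == r2:
        return "agreement"
    if "No Information" in (r1, r2):
        return "information"  # one annotator found no evidence
    if POLARITY[r1] != POLARITY[r2]:
        return "polarity"     # e.g. Yes vs. Probably No
    return "degree"           # e.g. Yes vs. Probably Yes


if __name__ == "__main__":
    # Responses to 3.1 as described on the polarity slide.
    responses = ["Yes", "Yes", "Yes", "Probably No"]
    for r1, r2 in combinations(responses, 2):
        print(r1, "|", r2, "->", disagreement_type(r1, r2))
```

Counting the two types separately would show whether refined guidelines need sharper decision criteria (polarity) or only clearer advice on when to hedge (degree).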
Conclusions
1. RoB 2.0 assessment guidelines cannot be directly used as RoB
corpus annotation guidelines.
2. RoB assessment and RoB text annotation tasks are both highly
subjective, but the annotation guidelines can be refined with an
iterative process to improve both.
Future Directions
1. Instructional placards as
annotation guidelines
2. Larger annotated corpus
of RCTs
Annotation team
Dr. Roger Hilfiker
Dr. Martin Sattelmayer
Rahel Caliesch
Katia Giacomino
Dr. Nona Naderi
References
1. Wang, Q., Liao, J., Lapata, M., & Macleod, M. (2022). Risk of bias assessment in preclinical literature using natural language processing. Research Synthesis Methods, 13(3), 368-380.
2. Macleod, M. R., O’Collins, T., Howells, D. W., & Donnan, G. A. (2004). Pooling of animal experimental data reveals influence of study design and publication bias. Stroke, 35(5), 1203-1208.
3. Deleger, L., Li, Q., Lingren, T., Kaiser, M., Molnar, K., Stoutenborough, L., Kouril, M., Marsolo, K., & Solti, I. (2012). Building gold standard corpora for medical natural language processing tasks. In AMIA Annual Symposium Proceedings (Vol. 2012, p. 144). American Medical Informatics Association.
4. Sterne, J.A., Savović, J., Page, M.J., Elbers, R.G., Blencowe, N.S., Boutron, I., Cates, C.J., Cheng, H.Y., Corbett, M.S., Eldridge, S.M. and Emberson, J.R. (2019). RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366.
Thank You
Questions?
Dataset: https://zenodo.org/record/7698941#.ZEGhXexBzzU
Email: anjani.k.dhrangadhariya@gmail.com
LinkedIn: https://www.linkedin.com/in/anjani-dhrangadhariya/

  • 2. Randomized Controlled Trial • In theory, an RCT accurately measures intervention effects on patient outcomes, but in practice, biases enter • Design/Planning • Execution • Analysis • Outcomes reporting • Systematic Reviews • Utility • Medical professionals • Health policies • Surgeons
  • 3. Risk of Bias (RoB) • The risk of bias specifically pertains to systematic errors in the design, conduct, or reporting of a study that can potentially lead to a deviation from the true effect being measured. • RoB assessment guidelines • Example RoB assessment guidelines (year): Physiotherapy Evidence Database (PEDro), 1999; Risk of Bias Assessment Tool for Nonrandomized Studies (RoBANS), 2004; Cochrane Risk of Bias assessment guidelines, 2008; Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I), 2016; Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2), 2017; Newcastle-Ottawa Scale (NOS), 2018; Revised Cochrane Risk of Bias for RCTs 2.0 tool (RoB 2), 2019
  • 4. RoB information extraction • Thorough assessment • Manual assessment • Time-consuming • Cognitively demanding • Two experts for manual assessment • Third, for conflict resolution • Automation imperative
  • 5. Related Work • RoB labelled corpus • Wang et al. 2022 • Preclinical animal studies • Human RCTs • RobotReviewer • PDF highlights • Freely-available • Closed access data • Cochrane RoB v1 • RoB 2.0? • RoB automation • Marshall et al. 2015 • Millard et al. 2016 • Cochrane Database (CDSR) • Closed access
  • 6. Motivation 1 No RoB text annotation guidelines exist 2 No RoB annotated RCTs exist
  • 7. Revised Cochrane RoB 2.0 tool • Can you use the guidelines to annotate a text corpus? • Extensive guidelines • Step-by-step instructions • Divides RoB into 5 domains • Each domain is assessed using several signalling questions • Domains: Randomization process; Deviations from intended interventions; Missing outcomes data; Outcomes measurement; Selection of reported result • Reference: Sterne, J.A., Savović, J., Page, M.J., Elbers, R.G., Blencowe, N.S., Boutron, I., Cates, C.J., Cheng, H.Y., Corbett, M.S., Eldridge, S.M. and Emberson, J.R., 2019. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366.
  • 8. Revised Cochrane RoB 2.0 tool • Reviewers manually go through the RCT to identify text describing the answer to a signalling question. • Based on the answer to the signalling question, select one of the five response judgements: Yes; Probably Yes; Probably No; No; No Information
  • 9. Revised Cochrane RoB 2.0 tool • 2.1 - Were the participants aware of their assigned intervention during the trial? • Example label: 2.1 No Good • Risk domains: 5 • Signalling questions: 22
  • 10. Annotation schema • Follow the revised Cochrane RoB 2.0 • 110 span labels • 1.1 Yes Good • 1.1 Probably Yes Good • 1.1 Probably No Bad • 1.1 No Bad • 1.1 No Information • 1.2 Yes Good • 1.2 Probably Yes Good • 1.2 Probably No Bad • … • Label anatomy, e.g. "1.1 Yes Good": risk domain (1), signalling question (1.1), SQ response (Yes), direction (Good) • Good = low risk • Bad = high risk
  • 11. Pilot Annotation • Ten RCT full-text PDFs • 2000-2019 • Four annotators • 2 scientists • 1 doctoral student • 1 scientific collaborator • Two NLP experts • 1 professor • 1 doctoral student • tagtog PDF annotation tool https://www.tagtog.com/
  • 12. Evaluation • F1-measure as inter-annotator agreement (IAA) • Disregards out-of-the-span tokens (unannotated tokens) • 1. IAA_SQ: Do the annotator pairs annotate the same text span to answer a signalling question (SQ)? • 2. IAA_response: If the annotator pairs annotate the same text to answer a signalling question, do they also select the same response judgment?
  • 13. Results - IAA_SQ • Zero or no annotation • Domain 2 - 52% • Domain 3 - 54% • Domain 4 - 50% • Domain 5 - 61% (protocol) • Less subjective questions • Better IAA • The table details the interpretation of pairwise F1-measure.
  • 14. Results - IAA_response • IAA - SQ response judgment • Averaged over all annotator pairs • Zero agreement - 52.63% • No annotation - 22% • ~75% combined • The table details the interpretation of pairwise F1-measure.
  • 15. Error Inspection – 1. Text span disagreement • Annotators were not limited to annotating either phrases or full sentences • 4.1 Was the method of measuring the outcome inappropriate? • Sentence: "…The primary outcome measure was a 0–10 NRS pain score, which reflected the average pain experienced by the patient for ten days prior to follow-up…" • Phrase: "…a 0–10 NRS pain score…"
  • 16. Error Inspection – 2. Different sections • Annotators use different regions (Methods section, Results section, Table, …) of full text to come to identical labels. • Same judgment, different parts of text evidence 2.6 Was an appropriate analysis used to estimate the effect of assignment to intervention? …This study was guided by the HAPA, which has been widely used to address the gap between intention to change and a person’s actual change in behaviour [25-27]… …intention-to-treat analysis was done with missing data substituted by the last-observation-carried-forward procedure… 2.1 Yes Good
  • 17. Error Inspection – 3. Polarity disagreement … 71 allocated routine services, 67 allocated intervention service, 69 assessed at 8 weeks, 64 assessed at 8 week... 3.1 Were data for the outcome of interest available for all, or nearly all, participants randomized? • Selecting response judgment options with different polarities • Yes vs. No • Three of the four annotators responded to 3.1 with Yes, but one chose Probably no. • All or nearly all (cut-off?)
  • 18. Error Inspection – 4. Degree disagreement • Lenient - definitive • Yes • No • Stringent • Probably yes • Probably no 1.1 Was a random sequence generation method used to assign participants to intervention groups? …Patients were randomly allocated to either intervention by a computer-generated schedule stratified by sex and attendance at a day hospital…
  • 19. Conclusions 1. RoB 2.0 assessment guidelines cannot be directly used as RoB corpus annotation guidelines. 2. RoB assessment and RoB text annotation tasks are both highly subjective, but the annotation guidelines can be refined with an iterative process to improve both.
  • 20. Future Directions 1. Instructional placards as annotation guidelines 2. Larger annotated corpus of RCTs
  • 21. Dr. Roger Hilfiker Dr. Martin Sattelmayer Rahel Caliesch Katia Giacomino Dr. Nona Naderi Annotation team
  • 22. References 1. Wang, Q., Liao, J., Lapata, M., & Macleod, M. (2022). Risk of bias assessment in preclinical literature using natural language processing. Research Synthesis Methods, 13(3), 368-380. 2. Macleod, M. R., O’Collins, T., Howells, D. W., & Donnan, G. A. (2004). Pooling of animal experimental data reveals influence of study design and publication bias. Stroke, 35(5), 1203-1208. 3. Deleger L, Li Q, Lingren T, Kaiser M, Molnar K, Stoutenborough L, Kouril M, Marsolo K, Solti I. Building gold standard corpora for medical natural language processing tasks. In AMIA Annual Symposium Proceedings 2012 (Vol. 2012, p. 144). American Medical Informatics Association. 4. Sterne, J.A., Savović, J., Page, M.J., Elbers, R.G., Blencowe, N.S., Boutron, I., Cates, C.J., Cheng, H.Y., Corbett, M.S., Eldridge, S.M. and Emberson, J.R., 2019. RoB 2: a revised tool for assessing risk of bias in randomised trials. BMJ, 366.
  • 23. Thank You Questions? Dataset: https://zenodo.org/record/7698941#.ZEGhXexBzzU Email: anjani.k.dhrangadhariya@gmail.com LinkedIn: https://www.linkedin.com/in/anjani-dhrangadhariya/
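
The two agreement measures from slide 12 (agreement on the signalling question, and agreement on the response judgment) can be sketched as a pairwise F1 over annotated tokens. This is a minimal illustration only: the token-level comparison, the function names, the token offsets and the choice to score two empty annotations as zero are all assumptions, not the authors' implementation. Unannotated tokens never enter the sets, so they are disregarded as the slide describes.

```python
from typing import Hashable, Set


def pairwise_f1(a: Set[Hashable], b: Set[Hashable]) -> float:
    """F1 between two annotators' sets of annotated items.

    Annotator `a` is treated as the reference and `b` as the prediction;
    F1 is symmetric, so swapping the roles gives the same value.
    """
    overlap = len(a & b)
    if overlap == 0:
        # Assumption: no shared items (including two empty annotations)
        # is scored as zero agreement.
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def expand(start: int, end: int, *label: str) -> set:
    """Expand a token span [start, end) into per-token items carrying the label."""
    return {(i, *label) for i in range(start, end)}


# Hypothetical example: annotator A marks a full sentence (tokens 10-29),
# annotator B marks only a phrase inside it (tokens 14-18), both for SQ 4.1.
iaa_sq = pairwise_f1(expand(10, 30, "4.1"), expand(14, 19, "4.1"))

# Same spans, but the response judgments differ in degree, so no item matches.
iaa_response = pairwise_f1(
    expand(10, 30, "4.1", "No"),
    expand(14, 19, "4.1", "Probably No"),
)

print(f"IAA_SQ = {iaa_sq:.2f}, IAA_response = {iaa_response:.2f}")
# prints "IAA_SQ = 0.40, IAA_response = 0.00"
```

The example mirrors the phrase-versus-sentence disagreement from slide 15: the phrase lies entirely inside the sentence, so precision is perfect but recall is low, which pulls the F1 down even though both annotators point at the same evidence.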

Editor's Notes

  1. Randomized controlled trials, or RCTs, aim to accurately measure treatment effects on patient outcomes. In theory they minimize bias, but in practice biases tend to creep in at any of the trial stages. When RCTs with such biases are used to write systematic reviews, they reduce the validity and utility of the review.
  2. Bias itself cannot be measured directly from an RCT report, but the risk of bias can be estimated by identifying systematic flaws in study design, planning, execution or even outcomes reporting. Several risk-of-bias assessment guidelines help thoroughly assess the various bias risks in RCT literature. The latest published guidelines are the revised Cochrane RoB 2.0 guidelines.
  3. These guidelines help you thoroughly assess risk of bias from RCT full texts, but manual RoB assessment is extremely time-consuming, resource-intensive and cognitively demanding. It is also challenged by the rapidly rising number of published RCTs, so automatic RoB information extraction is imperative.
  4. There has been some work on automating RoB information extraction by Marshall et al. and Millard et al., but the dataset used to train their machine learning models is closed access. They later developed a tool called RobotReviewer, which is freely available but builds on closed-access data that isn’t available to the community, and it automates the older risk of bias guidelines. Recently, a RoB labelled corpus was released by Wang et al., but the corpus is based on preclinical animal studies and not human RCTs.
  5. So currently, we do not have any open-access corpus annotated with risk of bias judgments, and neither do we have guidelines to build one. These gaps prompted us to conduct this pilot project.
  6. RoB 2 is a set of really extensive, instructional guidelines that helps you assess, step by step, the overall risk of bias of any RCT study. So before building our own annotation guidelines, we thought maybe we could use the RoB 2 tool to annotate a text corpus as well. To understand whether RoB 2 can be used this way, we need to examine how it structures the bias assessment procedure. It divides the biases into 5 domains, each domain loosely corresponding to one of the trial stages. Each domain is assessed using several signalling questions.
  7. The reviewers manually go through each signalling question as it appears in the guidelines and try to identify text in the RCT they are assessing that answers this question. Once an answer text is found, they use it to judge the small portion of risk corresponding to this signalling question. Based on that judgment they choose one of the five response options. Mostly, “Yes” means the answer suggests there is a risk of bias and “No” means there is no risk of bias for this question. For some questions, however, “Yes” means everything is all right and there is no risk of bias for this question.
  8. Take, for example, signalling question 2.1. It asks whether the participants were aware of their assigned intervention during the trial. The reviewers identify the answer to this question in the text, and let’s say they find that the participants were properly blinded and were unaware of the assigned intervention, meaning the risk of bias is low and all is good for this signalling question. The reviewers need to do this for all 22 signalling questions in the RoB 2 tool, so the manual procedure shown here could be translated directly into an annotation process.
  9. We need an annotation schema before starting to annotate the corpus. We keep our annotation scheme very similar to how the assessment is structured in the RoB 2 guidelines. Each of our span labels contains information about the domain the text is labelled for, the signalling question and the response judgment. As the overall task of RoB assessment and annotation is very complex, we wanted to ensure that the label design makes annotation easier for the annotators.
  10. Four experts with varied RoB assessment expertise then annotated 10 full-text RCTs.
  11. This signalling question (3.1) asks whether the outcomes data were available for all, or nearly all, participants randomized, but it does not clarify the exact cut-off for how many participant dropouts increase the risk. Therefore, the annotators make subjective response judgments depending on what percentage of participant dropout they consider acceptable in their experience.
  12. The references, and...
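
The span-label scheme described in note 9 and on slide 10 can be sketched as a small parser. This is an illustrative sketch under stated assumptions: the `SpanLabel` type, `parse_label`, and `all_question_response_pairs` are hypothetical names, and the per-domain split of the 22 signalling questions (3, 7, 4, 5, 3) is taken from the RoB 2 tool rather than from the slides. Crossing 22 questions with 5 response options gives the 110 span labels mentioned on slide 10.

```python
from typing import List, NamedTuple, Optional, Tuple

# Longer options first so "No Information" is matched before plain "No".
RESPONSES = ("Probably Yes", "Probably No", "No Information", "Yes", "No")

# Assumption: number of signalling questions per RoB 2 domain (sums to 22).
QUESTIONS_PER_DOMAIN = {1: 3, 2: 7, 3: 4, 4: 5, 5: 3}


class SpanLabel(NamedTuple):
    domain: int
    question: str
    response: str
    direction: Optional[str]  # "Good" (low risk), "Bad" (high risk), or None


def parse_label(label: str) -> SpanLabel:
    """Split a label such as '1.1 Probably No Bad' into its components."""
    question, rest = label.strip().split(" ", 1)
    for response in RESPONSES:
        if rest.lower().startswith(response.lower()):
            tail = rest[len(response):].strip().capitalize()
            if tail not in ("", "Good", "Bad"):
                continue  # not a clean match, try the next response option
            domain = int(question.split(".")[0])
            return SpanLabel(domain, question, response, tail or None)
    raise ValueError(f"Unrecognized span label: {label!r}")


def all_question_response_pairs() -> List[Tuple[str, str]]:
    """Enumerate every (signalling question, response) combination."""
    return [
        (f"{domain}.{q}", response)
        for domain, n_questions in QUESTIONS_PER_DOMAIN.items()
        for q in range(1, n_questions + 1)
        for response in RESPONSES
    ]


n_labels = len(all_question_response_pairs())
print(n_labels)  # prints 110
print(parse_label("2.1 No Good"))
```

The direction is deliberately not derived from the response, because the notes point out that "Yes" signals low risk for some questions and high risk for others; in this sketch it is simply read from the label text.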