Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies
Anjani K. Dhrangadhariya et al.
MedGIFT group
University of Applied Sciences Western Switzerland (HES-SO)
Project supported by the European Union's Horizon 2020 programme, grant agreement 825292
SPIE Medical Imaging 2020, 16.02.2020
Motivation
> Rare cancers: fewer than 15 cases per 100,000 people per year
> Account for 25% of cancer-related deaths
> Lower prevalence = fewer patients
> Fewer tumor samples for research
> Lack of robust clinical models
Puca, Loredana, et al. "Patient derived organoids to model rare prostate cancer phenotypes." Nature communications 9.1 (2018): 1-10.
Data resource
• Challenges
1) Private datasets
2) Limited size
3) Single center / scanner
4) Small variability
5) Some contain only images / only text
6) No manual annotations, or only small annotated subsets
7) Difficult to compare results
Nested collections: PubMed / Medline, PubMed Central (PMC), PubMed Central Open-Access (PMC-OA)
https://www.nlm.nih.gov/bsd/difference.html
[Figure: the three nested collections, annotated with 30 million articles; ~80 million images; 5.9 million full texts; 2.09 million full texts; 6.73 million images]
Rare cancer image harvesting through automated knowledge aggregation and data mining approaches?
[Figure: an individual article record (2019) with its usable components: (1) Medical Subject Headings (MeSH), (2) Title + Abstract, (3) Images]
Medical Subject Headings (MeSH)
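• Hierarchically organized controlled vocabulary
• Used for cataloguing biomedical information
• 16 thematic categories, e.g. A = Anatomy, B = Organisms, …
• Each term has a unique MeSH identifier; its MeSH code (tree number) gives its position in the hierarchy
[Figure: example MeSH term and its MeSH code]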
Lipscomb, Carolyn E. "Medical subject headings (MeSH)." Bulletin of the Medical Library Association 88.3 (2000): 265.
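Because MeSH codes are dot-separated paths, a term's ancestors and branch membership can be read directly off its code. The sketch below uses hypothetical helper names (they do not come from the talk) and the “Humans” code that the slides quote for the “human” step:

```python
def ancestors(tree_number: str) -> list:
    """Ancestor MeSH codes of a dot-separated tree number, root first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def in_branch(tree_number: str, branch: str) -> bool:
    """True if tree_number is branch itself or lies below it.

    Comparing against branch + "." avoids false matches such as
    "B01.05" wrongly matching "B01.050.150"."""
    return tree_number == branch or tree_number.startswith(branch + ".")


HUMANS = "B01.050.150.900.649.313.988.400.112.400.400"
print(ancestors(HUMANS)[:3])        # ['B01', 'B01.050', 'B01.050.150']
print(in_branch(HUMANS, "B01"))     # True
print(in_branch(HUMANS, "B01.05"))  # False
```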
MeSH as annotation
• Articles are manually indexed with MeSH terms by National Library of Medicine (NLM) staff
• For example, all studies about benign tumors are indexed under the MeSH term “Neoplasms”
• MeSH terms can therefore serve as ground-truth annotations
• However, not all PMC / PMC-OA articles have MeSH annotations
Visual classification
• ImageCLEF medical image annotation challenge (since 2013)
• A small annotated subset of PMC-OA is used to train CNNs
• Images are classified into 31 modalities: PET, light microscopy, CT, etc.
• State of the art: only superficial (modality-level) classification
“Deep Multimodal Classification of Image Types in Biomedical Journal Figures”, Andrearczyk and Müller, CLEF 2018: 2,000 annotated PMC-OA images, 90% accuracy
Pipeline
1. PMC-OA: all images
2. Getting DLMI images (DLMI = Diagnostic Light Microscopy Images)
3. Getting “human” images
4. Getting “neoplastic” images
5. Getting “rare cancer” images
For the “human” and “neoplastic” steps, a visual approach (classifying the images) is compared with a textual approach (classifying the article's Title + Abstract), with MeSH annotations providing the labels.
Visual approach: CNNs
• Images from articles with MeSH annotations are labeled MeSH_1 or MeSH_0 according to their MeSH terms
• Model training and evaluation: VGG19, ImageNet weights, with and without image augmentation
• The trained CNN then labels images from articles with no MeSH annotations as MeSH_1 or MeSH_0
Textual approach
• The Title + Abstract of articles with MeSH annotations, labeled MeSH_1 or MeSH_0, are used for model training and evaluation
• The best-performing model then labels the Title + Abstract of articles with no MeSH annotations as MeSH_1 or MeSH_0
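Both approaches follow the same weak-supervision pattern: MeSH-derived labels train a model, and the model then labels the articles that lack MeSH. A minimal sketch of that data flow; the names are hypothetical, the records are toy data, and a trivial majority-class stand-in takes the place of the real CNN or text classifier:

```python
from collections import Counter


def split_by_mesh(records, mesh_rule):
    """Return (labeled, unlabeled).

    records: dicts with "text" (Title + Abstract) and "mesh" (a set of MeSH
    codes, or None when the article has no MeSH annotation).
    mesh_rule: maps a MeSH set to a label, or None to leave the record out."""
    labeled, unlabeled = [], []
    for r in records:
        if r["mesh"] is None:
            unlabeled.append(r["text"])
        else:
            label = mesh_rule(r["mesh"])
            if label is not None:
                labeled.append((r["text"], label))
    return labeled, unlabeled


def label_unannotated(records, mesh_rule, fit):
    """Train on MeSH-derived labels, then label the articles without MeSH.

    fit: takes [(text, label)] and returns a text -> label predictor."""
    labeled, unlabeled = split_by_mesh(records, mesh_rule)
    predict = fit(labeled)
    return [(text, predict(text)) for text in unlabeled]


def majority_fit(pairs):
    """Trivial stand-in classifier: always predicts the most common label."""
    top = Counter(label for _, label in pairs).most_common(1)[0][0]
    return lambda text: top


# Toy data; the rule maps "any code in the C04 branch" to MeSH_1.
records = [
    {"text": "glioma histology", "mesh": {"C04.557"}},
    {"text": "sarcoma biopsy", "mesh": {"C04.588"}},
    {"text": "normal liver", "mesh": {"A03.620"}},
    {"text": "unindexed case report", "mesh": None},
]
rule = lambda mesh: "MeSH_1" if any(c == "C04" or c.startswith("C04.") for c in mesh) else "MeSH_0"
print(label_unannotated(records, rule, majority_fit))
# [('unindexed case report', 'MeSH_1')]
```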
Getting “human” images
• Input: Title + Abstract of DLMI articles with MeSH annotations, where {MeSH} is an article's set of MeSH codes
• Labels (B01.050.150.900.649.313.988.400.112.400.400 is the MeSH code for “Humans”):
human ⇔ B01.050.150.900.649.313.988.400.112.400.400 ∈ {MeSH} & other B01 codes ∉ {MeSH}
not human ⇔ B01.050.150.900.649.313.988.400.112.400.400 ∉ {MeSH}
• Data split: 80% training set, 20% test set
• Model training and evaluation
Classifiers: 1. Logistic regression, 2. Support Vector Machine, 3. K-nearest neighbor
Features: 1. Tf-idf, 2. Word vectors, 3. Paragraph vectors
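The labeling rule on this slide can be written as a small function over an article's set of MeSH codes. A minimal sketch with a hypothetical function name; returning None for articles that carry “Humans” together with another B01 code is an assumption, since the slide's two rules do not cover that case:

```python
HUMANS = "B01.050.150.900.649.313.988.400.112.400.400"  # MeSH code for "Humans"


def human_label(mesh):
    """Label an article from its set of MeSH codes, following the slide's rule.

    human     <=> HUMANS in {MeSH} and no other B01 code in {MeSH}
    not human <=> HUMANS not in {MeSH}
    Returns None when HUMANS co-occurs with another B01 (organism) code, since
    neither rule applies; we assume such articles are left out of training."""
    if HUMANS not in mesh:
        return "not human"
    other_b01 = any(
        code != HUMANS and (code == "B01" or code.startswith("B01."))
        for code in mesh
    )
    return None if other_b01 else "human"


print(human_label({HUMANS, "C04.557"}))  # human
print(human_label({HUMANS, "B01.050"}))  # None (Humans plus another organism code)
print(human_label({"B01.050"}))          # not human
```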
• Best-performing model, hyper-parameters and vectors: SVM, tf-idf bigrams
• This model labels the Title + Abstract of DLMI articles with no MeSH as human or not human
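The best textual model works on tf-idf n-gram features. The slides do not give the exact tokenization or weighting, so this stdlib-only sketch assumes scikit-learn-style defaults (raw term counts, smoothed idf, L2 normalization) and word n-grams from 1 to 2. It also includes the 80/20 split; all function names are hypothetical:

```python
import math
import random
import re
from collections import Counter


def ngrams(text, n_min=1, n_max=2):
    """Lower-cased word n-grams; the default (1, 2) gives unigrams plus bigrams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(train_docs, n_max=2):
    """Smoothed idf, ln((1 + N) / (1 + df)) + 1, fitted on training documents only."""
    df = Counter()
    for doc in train_docs:
        df.update(set(ngrams(doc, 1, n_max)))
    n_docs = len(train_docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}


def tfidf(doc, idf, n_max=2):
    """Sparse, L2-normalized tf-idf vector; n-grams unseen during fitting are dropped."""
    tf = Counter(g for g in ngrams(doc, 1, n_max) if g in idf)
    vec = {g: c * idf[g] for g, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def split_80_20(items, seed=0):
    """Shuffle a copy with a fixed seed, then return (training set, test set)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(0.8 * len(items))
    return items[:cut], items[cut:]


train, test = split_80_20(range(10))
idf = fit_idf(["tumour cells", "tumour margin"])
print(tfidf("tumour cells", idf))
```

Fitting the idf table on the training split only keeps test-set vocabulary out of the features, which matches how the 80/20 evaluation is meant to work.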
Getting “neoplastic” images
• Input: Title + Abstract of DLMI, human articles with MeSH annotations ({MeSH})
• Labels (C04 is the MeSH code for “Neoplasms”):
neoplastic ⇔ C04 ∈ {MeSH}
not neoplastic ⇔ C04 ∉ {MeSH}
• Data split: 80% training set, 20% test set
• Model training and evaluation
Classifiers: 1. Logistic regression, 2. Support Vector Machine, 3. K-nearest neighbor
Features: 1. Tf-idf, 2. Word vectors, 3. Paragraph vectors
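The “neoplastic” rule is a branch test on the C04 subtree. A minimal sketch with a hypothetical function name; reading “C04 ∈ {MeSH}” as “some code lies in the C04 branch” is an assumption, not something the slides spell out:

```python
NEOPLASMS = "C04"  # MeSH code for "Neoplasms"


def neoplastic_label(mesh):
    """neoplastic <=> C04 in {MeSH}; not neoplastic otherwise.

    Assumption: "C04 in {MeSH}" is read as "some code lies in the C04 branch",
    so an article indexed only with a more specific neoplasm term still counts."""
    in_c04 = any(code == NEOPLASMS or code.startswith(NEOPLASMS + ".") for code in mesh)
    return "neoplastic" if in_c04 else "not neoplastic"


print(neoplastic_label({"C04"}))             # neoplastic
print(neoplastic_label({"C04.557", "A01"}))  # neoplastic
print(neoplastic_label({"C01"}))             # not neoplastic
```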
• Best-performing model, hyper-parameters and vectors: SVM, tf-idf bigrams
• This model labels the Title + Abstract of DLMI, human articles with no MeSH as neoplastic or not neoplastic
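The slides name an SVM as the best text classifier but give neither its implementation nor its hyper-parameters. As a stdlib-only stand-in, this sketch trains a linear SVM with Pegasos (stochastic sub-gradient descent on the hinge loss) over sparse feature dicts such as tf-idf vectors. The function names, λ and the epoch count are assumptions, and the bias term is omitted for brevity:

```python
import random


def train_linear_svm(xs, ys, lam=1e-4, epochs=20, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal hinge-loss SVM.

    xs: sparse feature vectors (dict feature -> value); ys: labels in {+1, -1}.
    Returns a weight dict w (no bias term)."""
    rng = random.Random(seed)
    w = {}
    t = 0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = xs[i], ys[i]
            margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
            # Regularization step: shrink all weights by (1 - eta * lam).
            scale = 1.0 - eta * lam
            for f in w:
                w[f] *= scale
            # Hinge-loss step: only when the example violates the margin.
            if margin < 1:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, x):
    """Sign of the linear score; ties go to +1."""
    return 1 if sum(w.get(f, 0.0) * v for f, v in x.items()) >= 0 else -1


xs = [{"tumour": 1.0}, {"tumour": 0.9, "mass": 0.4}, {"mouse": 1.0}, {"mouse": 0.8, "rat": 0.6}]
ys = [1, 1, -1, -1]
w = train_linear_svm(xs, ys, lam=0.01, epochs=50)
print([predict(w, x) for x in xs])
```

Library implementations such as scikit-learn's `LinearSVC` (liblinear) solve a closely related objective far more efficiently; shrinking every weight on each step, as here, costs O(features) per update and only suits small examples.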
Getting “rare cancer” images
• No MeSH term exists for a “rare” cancer class
• Instead, a set of {rare cancer} terms is taken from the National Center for Advancing Translational Sciences (NCATS):
https://rarediseases.info.nih.gov/diseases/diseases-by-category/1
• Input: Title + Abstract of DLMI, human, neoplastic articles, whether these labels come from MeSH or from the textual models (No MeSH)
• Labels:
rare cancer ⇔ (Title + Abstract) ∩ {rare cancer} ≠ Ø
non-rare cancer ⇔ (Title + Abstract) ∩ {rare cancer} = Ø
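The rare cancer rule is a non-empty intersection between the text of Title + Abstract and the NCATS term set. One way to implement it is whole-word, case-insensitive phrase matching. In this sketch the two terms are illustrative placeholders rather than the NCATS list, the function names are hypothetical, and plurals or spelling variants are not handled:

```python
import re


def build_matcher(rare_terms):
    """Compile a case-insensitive, whole-word regex for a set of rare cancer terms."""
    if not rare_terms:
        raise ValueError("need at least one rare cancer term")
    # Longer terms first, so that when two terms start at the same position
    # the longer one wins (e.g. a multi-word name over its first word).
    escaped = sorted((re.escape(t) for t in rare_terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def rare_cancer_label(title_abstract, matcher):
    """rare cancer <=> (Title + Abstract) ∩ {rare cancer} ≠ Ø; returns (label, matched terms)."""
    hits = {m.group(0).lower() for m in matcher.finditer(title_abstract)}
    return ("rare cancer" if hits else "non-rare cancer"), hits


# Illustrative terms only; the real list comes from NCATS.
matcher = build_matcher(["Ewing sarcoma", "chordoma"])
print(rare_cancer_label("A case of Ewing sarcoma of the rib.", matcher))
# ('rare cancer', {'ewing sarcoma'})
print(rare_cancer_label("Invasive ductal carcinoma of the breast.", matcher))
# ('non-rare cancer', set())
```

Returning the matched terms alongside the label makes it possible to audit which NCATS entries drive the “rare cancer” class.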
Visual: “rare cancer”
• Images labeled rare cancer or non-rare cancer by the textual rule are used for model training and evaluation: VGG19, ImageNet weights, with and without image augmentation
• The trained CNN then labels images with no label as rare cancer or non-rare cancer
Results
“human” vs. “non-human” classification
Data type | Classifier | Feature | Precision | Recall | F1-score
Visual | VGG19 | With data augmentation | 0.69 | 0.71 | 0.68
Textual | SVM | Tf-idf trigrams | 0.89 | 0.90 | 0.90

“neoplastic” vs. “non-neoplastic” classification
Data type | Classifier | Feature | Precision | Recall | F1-score
Visual | VGG19 | With data augmentation | 0.68 | 0.65 | 0.64
Textual | SVM | Tf-idf bigrams | 0.99 | 0.99 | 0.99

“rare cancer” vs. “non-rare cancer” classification
Data type | Classifier | Feature | Precision | Recall | F1-score
Visual | VGG19 | With data augmentation | 0.62 | 0.77 | 0.69
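The slides do not state how precision, recall and F1 were averaged over the two classes. This sketch computes per-class and macro-averaged scores; macro averaging is an assumption made for illustration, and the labels are toy data. It shows why the F1 column need not equal the harmonic mean of the averaged precision and recall. For the visual “neoplastic” row, 2 · 0.68 · 0.65 / (0.68 + 0.65) ≈ 0.66, whereas 0.64 is reported, which is consistent with (though does not prove) class-averaged scores:

```python
def prf(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_prf(y_true, y_pred):
    """Unweighted mean of per-class precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [prf(y_true, y_pred, c) for c in labels]
    return tuple(sum(s[k] for s in scores) / len(scores) for k in range(3))


y_true = ["human", "human", "human", "not human"]
y_pred = ["human", "human", "not human", "not human"]
p, r, f1 = macro_prf(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.75 0.833 0.733
print(round(2 * p * r / (p + r), 3))           # 0.789
```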
Discussion: Textual vs. Visual
Textual approach
• Outperformed the visual approach on both tasks where the two were compared (“human” and “neoplastic”)
• Tf-idf n-grams with an SVM performed best on both tasks
Visual approach
• Classified “human” test instances with a recall of 0.71
• Performed worse on “neoplastic” identification (F1-score 0.64)
• Reached a recall of 0.77 on “rare cancer” classification
Conclusion
• First study targeting automatic extraction of rare cancer images
• The approach combines visual deep learning and textual NLP
• Output: 15,028 diagnostic light microscopy (DLMI), human, rare cancer images with their corresponding journal articles
[Figure: the pipeline funnel, from all PMC-OA data to DLMI, “human”, “neoplastic” and “rare cancer” images]
Thank you for your attention
More information:
http://medgift.hevs.ch
Contact:
anjani.dhrangadhariya@hevs.ch
Follow us:
https://twitter.com/MedGIFT_group

一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 

Exploiting biomedical literature to mine out a large multimodal dataset of rare cancers

  • 1. Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies. Anjani K. Dhrangadhariya et al., MedGIFT group, University of Applied Sciences Western Switzerland (HES-SO). Project supported by the European Union Horizon 2020 grant agreement 825292. SPIE Medical Imaging 2020, 16.02.2020
  • 2. Motivation > Rare cancers: fewer than 15 cases per 100,000 people per year > Account for 25% of cancer-related deaths > Lower prevalence = fewer patients > Fewer tumor samples for research > Lack of robust clinical models. Puca, Loredana, et al. "Patient derived organoids to model rare prostate cancer phenotypes." Nature Communications 9.1 (2018): 1-10.
  • 3. Data resource • Challenges: 1) Private datasets 2) Limited size 3) Single center / scanner 4) Small variability 5) Some contain only images or only text 6) No manual annotations, or annotations for only small subsets 7) Difficult to compare results
  • 4. Medline/PubMed (https://www.nlm.nih.gov/bsd/difference.html)
    PubMed / Medline: 30 million articles, ~80 million images
    PubMed Central (PMC): 5.9 million full texts
    PubMed Central Open-Access (PMC-OA): 2.09 million full texts, 6.73 million images
    Rare cancer image harvesting through automated knowledge aggregation and data mining approaches? (2019)
  • 5. Individual record: Title + Abstract, Images, Medical Subject Headings (MeSH)
  • 6. Medical Subject Headings (MeSH) • Hierarchically organized controlled vocabulary • Used for cataloguing biomedical information • 16 thematic categories, e.g., A = Anatomy, B = Organisms, … • Each term (MeSH term) has a unique MeSH identifier (MeSH code). Lipscomb, Carolyn E. "Medical subject headings (MeSH)." Bulletin of the Medical Library Association 88.3 (2000): 265.
  • 7. MeSH as annotation • Manually annotated by National Library of Medicine (NLM) staff • E.g., all studies about benign tumors are indexed under the MeSH annotation “Neoplasms” • Ground-truth annotation • Not all PMC / PMC-OA articles have MeSH annotations
  • 8. Visual classification • ImageCLEF medical image annotation challenge (since 2013) • Small annotated subset of PMC-OA (2,000 images) used to train CNNs • Classification into 31 modalities, e.g., PET, light microscopy, CT • 90% accuracy • State of the art: only superficial modality classification. Andrearczyk and Müller, “Deep Multimodal Classification of Image Types in Biomedical Journal Figures”, CLEF 2018
  • 9. Pipeline: 1) PMC-OA all images → 2) Getting DLMI images → 3) Getting “human” images → 4) Getting “neoplastic” images → 5) Getting “rare cancer” images. DLMI = Diagnostic Light Microscopy Images
  • 10. Pipeline: 1) PMC-OA all images → 2) Getting DLMI images → 3) Getting “human” images → 4) Getting “neoplastic” images → 5) Getting “rare cancer” images. Each record: Title + Abstract and MeSH; at each step, visual vs. textual approach. DLMI = Diagnostic Light Microscopy Images
  • 11. Visual approach: CNNs. Images labeled MeSH_1 / MeSH_0. Model training and evaluation • VGG19 • ImageNet weights • With and without image augmentation
  • 12. Visual approach: CNNs. The model trained on MeSH_1 / MeSH_0 images classifies images with no MeSH into MeSH_1 / MeSH_0. Model training and evaluation • VGG19 • ImageNet weights • With and without image augmentation
  • 13. Textual approach: Title + Abstract of records labeled MeSH_0 / MeSH_1 → model training & evaluation → the best-performing model classifies the Title + Abstract of records with no MeSH into MeSH_0 / MeSH_1
  • 14. Pipeline: 1) PMC-OA all images → 2) Getting DLMI images → 3) Getting “human” images → 4) Getting “neoplastic” images → 5) Getting “rare cancer” images. Each record: Title + Abstract and MeSH; MeSH-labeled vs. unlabeled records
  • 15. Getting “human” images
    Labels for DLMI records with MeSH annotations (Title + Abstract + {MeSH}):
      human ⇔ B01.050.150.900.649.313.988.400.112.400.400 (“Humans”) ∈ {MeSH} and no other B01 code ∈ {MeSH}
      not human ⇔ B01.050.150.900.649.313.988.400.112.400.400 ∉ {MeSH}
    Split: 80% training set, 20% test set
    Model training and evaluation: classifiers 1. Logistic regression, 2. Support Vector Machine, 3. K-nearest neighbor; features 1. Tf-idf, 2. Word vectors, 3. Paragraph vectors
  • 16. Getting “human” images
    Best-performing model, hyper-parameters and vectors: SVM with tf-idf bigrams (selected with the training setup of slide 15)
    Applied to the Title + Abstract of DLMI records with no MeSH → human / not human
  • 17. Pipeline: 1) PMC-OA all images → 2) Getting DLMI images → 3) Getting “human” images → 4) Getting “neoplastic” images → 5) Getting “rare cancer” images. Each record: Title + Abstract and MeSH; MeSH-labeled vs. unlabeled records
  • 18. Getting “neoplastic” images
    Labels for human DLMI records with MeSH annotations (Title + Abstract + {MeSH}):
      neoplastic ⇔ C04 (“Neoplasms”) ∈ {MeSH}
      not neoplastic ⇔ C04 ∉ {MeSH}
    Split: 80% training set, 20% test set
    Model training and evaluation: classifiers 1. Logistic regression, 2. Support Vector Machine, 3. K-nearest neighbor; features 1. Tf-idf, 2. Word vectors, 3. Paragraph vectors
  • 19. Getting “neoplastic” images
    Best-performing model, hyper-parameters and vectors: SVM with tf-idf bigrams (selected with the training setup of slide 18)
    Applied to the Title + Abstract of human DLMI records with no MeSH → neoplastic / not neoplastic
  • 20. Pipeline: 1) PMC-OA all images → 2) Getting DLMI images → 3) Getting “human” images → 4) Getting “neoplastic” images → 5) Getting “rare cancer” images. Each record: Title + Abstract and MeSH; MeSH-labeled vs. unlabeled records
  • 21. Getting “rare cancer” images
    • No MeSH terms exist for a “rare cancer” class
    • Instead: a set of {rare cancer} terms from the National Center for Advancing Translational Sciences (NCATS), https://rarediseases.info.nih.gov/diseases/diseases-by-category/1
    Input: human, neoplastic DLMI records (MeSH-labeled and classified records without MeSH)
      rare cancer ⇔ (Title + Abstract) ∩ {rare cancer} ≠ ∅
      non-rare cancer otherwise
  • 22. Visual: “rare cancer”. Images labeled rare cancer / non-rare cancer. Model training and evaluation • VGG19 • ImageNet weights • With and without image augmentation
  • 23. Visual: “rare cancer”. The trained model classifies images with no label into rare cancer / non-rare cancer. Model training and evaluation • VGG19 • ImageNet weights • With and without image augmentation
  • 24. Results
    “human” vs. “non-human” classification
    Data type | Classifier | Feature | Precision | Recall | F1-score
    Visual | VGG19 | With data augmentation | 0.69 | 0.71 | 0.68
    Textual | SVM | Tf-idf trigrams | 0.89 | 0.90 | 0.90
  • 25. Results
    “human” vs. “non-human” classification
    Data type | Classifier | Feature | Precision | Recall | F1-score
    Visual | VGG19 | With data augmentation | 0.69 | 0.71 | 0.68
    Textual | SVM | Tf-idf trigrams | 0.89 | 0.90 | 0.90
    “neoplastic” vs. “non-neoplastic” classification
    Data type | Classifier | Feature | Precision | Recall | F1-score
    Visual | VGG19 | With data augmentation | 0.68 | 0.65 | 0.64
    Textual | SVM | Tf-idf bigrams | 0.99 | 0.99 | 0.99
  • 26. Results
    “human” vs. “non-human” classification
    Data type | Classifier | Feature | Precision | Recall | F1-score
    Visual | VGG19 | With data augmentation | 0.69 | 0.71 | 0.68
    Textual | SVM | Tf-idf trigrams | 0.89 | 0.90 | 0.90
    “neoplastic” vs. “non-neoplastic” classification
    Data type | Classifier | Feature | Precision | Recall | F1-score
    Visual | VGG19 | With data augmentation | 0.68 | 0.65 | 0.64
    Textual | SVM | Tf-idf bigrams | 0.99 | 0.99 | 0.99
    “rare cancer” vs. “non-rare cancer” classification
    Data type | Classifier | Feature | Precision | Recall | F1-score
    Visual | VGG19 | With data augmentation | 0.62 | 0.77 | 0.69
  • 27. Discussion: textual vs. visual. Textual approach: outperformed the visual approach on both tasks where the two were compared (“human” and “neoplastic”); tf-idf n-grams with an SVM performed excellently on both. Visual approach: correctly classified some “human” test instances (recall 0.71); worse performance for “neoplastic” identification; “rare cancer” classification reached a recall of 0.77.
  • 28. Conclusion • First study targeting automatic extraction of rare cancer images • The approach combines visual deep learning and textual NLP • Result: 15,028 human, rare cancer light microscopy (DLMI) images plus their corresponding journal articles. Pipeline: 1) PMC-OA all data → 2) Getting DLMI images → 3) Getting “human” images → 4) Getting “neoplastic” images → 5) Getting “rare cancer” images
  • 29. Thank you for your attention. More information: http://medgift.hevs.ch Contact: anjani.dhrangadhariya@hevs.ch Follow us: https://twitter.com/MedGIFT_group
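The labeling rules of slides 15, 18 and 21 can be sketched as follows. This is a minimal illustration under assumptions not stated in the slides: each record's MeSH annotations are available as a set of MeSH tree numbers, “C04 ∈ {MeSH}” is read as “any tree number in the C04 (Neoplasms) branch”, records that match neither “human” rule are left unlabeled (`None`), and rare-cancer terms are matched case-insensitively as whole words. All function names and the example terms are hypothetical; they are not the authors' code or the NCATS list.

```python
import re
from typing import AbstractSet, Iterable, Optional

# Tree number of the MeSH descriptor "Humans", as used on slide 15.
HUMANS = "B01.050.150.900.649.313.988.400.112.400.400"
# Root of the MeSH "Neoplasms" branch, as used on slide 18.
NEOPLASMS = "C04"


def is_under(tree_number: str, ancestor: str) -> bool:
    """True if tree_number is the ancestor itself or lies below it in the MeSH tree."""
    return tree_number == ancestor or tree_number.startswith(ancestor + ".")


def human_label(mesh: AbstractSet[str]) -> Optional[bool]:
    """Slide 15 rule: True = human, False = not human, None = not labeled by MeSH."""
    if not mesh:
        return None  # no MeSH: left for the trained text classifier
    if HUMANS not in mesh:
        return False
    other_b01 = any(is_under(t, "B01") and t != HUMANS for t in mesh)
    return None if other_b01 else True  # Humans plus other organisms: matches neither rule


def neoplastic_label(mesh: AbstractSet[str]) -> Optional[bool]:
    """Slide 18 rule (assumption: any tree number in the C04 branch counts)."""
    if not mesh:
        return None
    return any(is_under(t, NEOPLASMS) for t in mesh)


def is_rare_cancer(title: str, abstract: str, rare_terms: Iterable[str]) -> bool:
    """Slide 21 rule: (Title + Abstract) ∩ {rare cancer} ≠ ∅, via whole-word matching."""
    text = f"{title} {abstract}".lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", text) for term in rare_terms
    )
```

Whole-word matching avoids hits inside longer words but also misses variants such as plurals (“chordomas” does not match “chordoma”); the slides do not say how the original matching handled such variants.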

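Slides 15 and 16 report that an SVM on tf-idf n-grams, trained on 80% of the MeSH-labeled records and evaluated on the remaining 20%, was the best text classifier. The slides do not name the SVM implementation, tokenizer or hyper-parameters, so the stdlib-only sketch below is an assumption throughout: it builds L2-normalized tf-idf vectors over word bigrams and trains a linear SVM with the Pegasos stochastic sub-gradient algorithm, swapped in here for illustration (in practice an established library implementation would be the usual choice). All names, the regularization value and the toy data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Dict[str, float]  # sparse vector: bigram -> weight


def bigrams(text: str) -> List[str]:
    """Lower-cased word bigrams (hypothetical tokenizer: plain whitespace split)."""
    tokens = text.lower().split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def fit_idf(docs: Sequence[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency of every bigram in the training docs."""
    n = len(docs)
    df = Counter(g for d in docs for g in set(bigrams(d)))
    return {g: math.log((1 + n) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf(text: str, idf: Dict[str, float]) -> Vector:
    """L2-normalized tf-idf vector; bigrams unseen during training are dropped."""
    counts = Counter(g for g in bigrams(text) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def dot(w: Vector, x: Vector) -> float:
    return sum(w.get(g, 0.0) * v for g, v in x.items())


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     epochs: int = 20, seed: int = 0) -> Vector:
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be +1 / -1; like basic Pegasos, no separate bias term is learned.
    """
    rng = random.Random(seed)
    w: Vector = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # hinge loss is active for this example
            for g in w:  # regularization step: shrink w by (1 - eta * lam)
                w[g] *= 1.0 - eta * lam
            if violated:  # sub-gradient step towards the misclassified/low-margin example
                for g, v in X[i].items():
                    w[g] = w.get(g, 0.0) + eta * y[i] * v
    return w


def predict(w: Vector, x: Vector) -> int:
    return 1 if dot(w, x) > 0 else -1


def split_80_20(items: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Shuffle, then keep 80% for training and 20% for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

Here +1 would stand for “human” and -1 for “not human”; the same functions could serve the “neoplastic” step with the C04-based labels. Dropping bigrams that never occur in the training documents mirrors how a fitted vectorizer ignores unseen vocabulary.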
Speaker notes

  4. How are the biomedical publications stored in Medline represented in PubMed? A PubMed record consists of the title and abstract, followed by the publication's images, shown as thumbnails, and a list of Medical Subject Headings (MeSH), which act as keywords or annotations describing the publication. The text, images and MeSH terms are linked together by the unique PubMed Identifier (PMID). A PMCID, the unique PubMed Central identifier, links to the full text of the publication. The images, text and MeSH terms are thus all associated with one another through a single record.
  6. PubMed records are manually annotated with MeSH terms by NLM staff. What is the significance of attaching MeSH terms to a PubMed record? MeSH annotation enforces uniform and consistent terminology: all articles about benign tumors are indexed under the MeSH term “Neoplasms”, and all studies involving patients are annotated with the MeSH term “Humans”. MeSH terms can therefore be considered gold-standard, or ground-truth, annotations for a publication. However, not all publications in PubMed have these manually attached MeSH terms.
  7. Have these PMC-OA images been used elsewhere for image analysis? Yes: an annotated subset of PMC-OA has been used in the ImageCLEF medical image annotation challenge, a public challenge held since 2013. This small annotated subset of 2,000 images was used to train CNNs to classify images into 31 modality classes, including PET, CT and light microscopy images. This approach achieved an overall accuracy of 90% for modality classification. However, it stops at superficial modality classification. What about going beyond this generic modality classification toward more specialized image sets?
  8. To navigate toward rare cancer sets, we proceeded as follows. Take all the PMC-OA images and classify them into the 31 modality types using the ImageCLEF setup. Retain all images classified as DLMI (diagnostic light microscopy images); we focus only on DLMI images because they are fundamental to rare cancer diagnostics. Each retained DLMI image is linked to its title, abstract and, if available, MeSH annotations. With this multimodal annotated dataset in hand, we propose an approach for the sequential curation of article abstracts and images using MeSH terms, to eventually mine out a large multimodal set of rare cancer images and full texts.
  9. This involves three successive binary classification tasks: we first separate the “human” from the “non-human” set, then the “neoplastic” from the “non-neoplastic” set, and finally the “rare cancer” from the “non-rare cancer” set. Note that at each binary classification step we compare the visual and textual approaches separately and use MeSH terms as the ground-truth labels for the datasets.
  10. For the visual classification tasks, images from the two MeSH classes were used to fine-tune and evaluate a VGG19 model initialized with pretrained ImageNet weights, with and without image augmentation (image mirroring and cropping). Why do we use VGG?
  11. These fine-tuned models were then used to classify unlabeled images into their respective classes.
  13. Let's get back to the pipeline for further curating the previously retrieved DLMI dataset. «human» records were first separated from «non-human» records as follows.
  15. The best-performing model setup was used to classify the unannotated DLMI records into “human” and “non-human”.
  16. Then «neoplastic», or tumor-related, records were separated from «non-neoplastic» records in a similar manner.
  18. The best-performing model setup was used to classify the unannotated records into “neoplasm” and “non-neoplasm”. This covered the annotated text dataset; similarly, the annotated image dataset was classified using the VGG19 setup.
  19. Finally, we separate the rare cancer dataset from the non-rare cancer dataset.
  20. Unfortunately, there are no MeSH terms pertaining to “rare cancer”, so we used a predefined set of rare cancer terms available from NCATS. All records recognized as “neoplasm” were retained, and a record was labeled “rare cancer” only if a term from the NCATS set appeared in its combined title and abstract.
  21. After obtaining «rare cancer» and «non-rare cancer» labels for the images from the preceding text-based step, we used them to train and evaluate a VGG19 model for this binary classification task.
  23. For the «human» classification task, the textual approach performed far better than the visual approach. However, a recall of 0.71 hints that the visual classification model does learn something about retaining human images.
  24. For the neoplasm classification task, too, the textual approach performed better than the visual one, which did not give good results on this task.
  25. For the final task, a recall of 0.77 hints that the VGG19 model did learn something, as it retains «rare cancer» images better, but there is much room for improvement.
  26. Classification: Individual images ≠ full-texts
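Note 10 lists image mirroring and cropping as the augmentation used when fine-tuning VGG19. The sketch below shows those two operations in plain Python on an image stored as a list of pixel rows. The 50% mirroring probability, the crop size and the function names are assumptions for illustration; a real training pipeline would instead use its deep learning framework's augmentation utilities on image tensors.

```python
import random
from typing import Any, List

Image = List[List[Any]]  # rows of pixels; a pixel may be a scalar or an (R, G, B) tuple


def mirror(img: Image) -> Image:
    """Horizontal flip: reverse the pixel order in every row."""
    return [row[::-1] for row in img]


def random_crop(img: Image, height: int, width: int, rng: random.Random) -> Image:
    """Cut a height x width window at a random position inside the image."""
    rows, cols = len(img), len(img[0])
    if height > rows or width > cols:
        raise ValueError("crop window is larger than the image")
    top = rng.randint(0, rows - height)    # randint is inclusive on both ends
    left = rng.randint(0, cols - width)
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img: Image, height: int, width: int, rng: random.Random,
            p_mirror: float = 0.5) -> Image:
    """One augmented view: optional horizontal mirroring, then a random crop."""
    if rng.random() < p_mirror:
        img = mirror(img)
    return random_crop(img, height, width, rng)
```

Applying `augment` with a fresh draw each epoch gives the network slightly different views of the same microscopy image; the crop size would typically match the model's expected input, for example the 224 × 224 input commonly used with VGG19.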