This document summarizes a study that mined biomedical literature to create a large multimodal dataset of rare cancer studies. The researchers harvested over 15,000 images and corresponding journal articles related to rare cancers from public literature databases. They used both visual and textual classification approaches to identify images of humans, neoplastic tissues, and rare cancers. The textual approach using TF-IDF and SVMs outperformed visual CNN classifiers for all tasks. This created the first dataset aimed at automatically extracting rare cancer images to help address challenges in researching these less prevalent cancers.
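The winning text-based pipeline (TF-IDF features fed to an SVM) can be sketched roughly as follows. This is a minimal illustration with invented captions and labels, not the study's actual data or configuration:

```python
# Hypothetical sketch: TF-IDF caption features + linear SVM, as in the text-based
# classification approach described above. All captions/labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

captions = [
    "photomicrograph of neoplastic tissue, H&E stain",
    "histology of a rare sarcoma specimen",
    "bar chart of patient survival rates",
    "flow diagram of the study selection process",
]
labels = ["tissue", "tissue", "non-tissue", "non-tissue"]

# TF-IDF turns each caption into a weighted word/bigram vector; the linear SVM
# learns a separating hyperplane over those vectors.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(captions, labels)
print(clf.predict(["H&E stained tumor histology slide"])[0])
```

In the real study the same scheme would be trained on thousands of harvested captions rather than four toy strings.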
End-to-end Fine-grained Neural Entity Recognition of Patients, Interventions,... (Anjani Dhrangadhariya)
Is multitask learning worthy in PICO recognition? We explored this question in our paper of the same name (read it here: https://arodes.hes-so.ch/record/8949?ln=FR). These slides correspond to the paper and were presented at CLEF 2021 in Bucharest, Romania.
Classification of prostate cancer pathology reports using natural language pr... (Anjani Dhrangadhariya)
This document summarizes research on classifying prostate cancer pathology reports into high-grade and low-grade categories using natural language processing. The best performing model was a logistic regression model trained on paragraph vector representations of reports, achieving an ROC AUC score of 0.91. An analysis of the model's interpretations found that it strongly associated terms like "Gleason 4+5=9" with high-grade cancer and "Gleason grade 3+3" with low-grade cancer. Future work will aim to extract additional clinical information like tumor staging from the reports.
Definiens technology provides automated digital pathology image analysis to transform pathology into a quantitative science. It handles challenges like tissue variability and staining intensities to automatically identify regions of interest and quantify objects with accuracy and consistency. Definiens supports various digital image and slide formats as well as staining protocols. It has been deployed in over 1,400 applications and provides detailed quantification to support research and clinical decision making. A study using Definiens image analysis achieved statistically significant survival prediction for esophageal cancer patients compared to manual evaluation.
CLASSIFICATION OF OCT IMAGES FOR DETECTING DIABETIC RETINOPATHY DISEASE USING... (sipij)
Optical Coherence Tomography (OCT) imaging aids retinal abnormality detection by showing the tomographic retinal layers, and its micrometer resolution makes OCT images a useful tool for detecting Diabetic Retinopathy (DR). An automated technique was introduced to differentiate DR images from normal ones. 214 images were used in the experiment: 160 for classifier training and 54 for testing. Several features were extracted to feed our classifiers, including statistical features and local binary pattern (LBP) features. The experimental results demonstrated that our classifiers were able to discriminate DR retinas from normal retinas with an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of 100%. Retinal OCT images share common texture patterns, and using a powerful pattern-analysis tool such as LBP features has a significant impact on the achieved results, which outperform previously proposed methods in the literature.
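The LBP feature extraction at the heart of this approach can be sketched with plain NumPy (a minimal 8-neighbour variant on a synthetic image, not the paper's exact configuration):

```python
# Minimal LBP sketch: threshold each pixel's 8 neighbours against the centre,
# read the bits as a code in 0..255, and use the code histogram as a texture
# feature vector. Image is synthetic; the paper's parameters may differ.
import numpy as np

def lbp_histogram(img):
    c = img[1:-1, 1:-1]                       # centre pixels (borders skipped)
    # neighbour offsets, clockwise from top-left
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((nb >= c).astype(np.uint8) << bit)   # set bit if neighbour >= centre
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()                  # normalized 256-bin histogram

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
feat = lbp_histogram(img)
print(feat.shape)
```

Each image then becomes a 256-dimensional histogram that any standard classifier can consume.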
IRJET- Diversified Segmentation and Classification Techniques on Brain Tu... (IRJET Journal)
This document summarizes 20 research papers on techniques for detecting brain tumors using medical images like MRI scans. It discusses several techniques for image segmentation, feature extraction, and classification that have been used to automatically detect and diagnose brain tumors. The goal of the work is to consolidate these different techniques and provide new insights on recent approaches to brain tumor image processing. Key methods discussed include convolutional neural networks, random forest classifiers, discrete wavelet transforms, and probabilistic neural networks.
A COMPARATIVE STUDY OF MACHINE LEARNING ALGORITHMS FOR EEG SIGNAL CLASSIFICATION (sipij)
In this paper, machine learning algorithms including Linear Discriminant Analysis, Support Vector Machine (SVM), Multi-layer Perceptron, Random Forest, K-nearest Neighbour, and Autoencoder with SVM are compared, seeking a robust method that produces good classification accuracy. To this end, a method for classifying raw Electroencephalography (EEG) signals associated with imagined movement of the right hand and a relaxation state, namely Autoencoder with SVM, is proposed. The EEG dataset used in this research was created by the University of Tübingen, Germany. The best classification accuracy achieved was 70.4%, with SVM using feature engineering; our proposed method of an autoencoder in combination with SVM produced a comparable accuracy of 65% without any feature engineering. This research shows that this classification of motor movements can be used in a Brain-Computer Interface (BCI) system to mentally control a robotic device or an exoskeleton.
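The autoencoder-plus-SVM idea can be sketched on synthetic data. This is not the paper's EEG pipeline; it uses `MLPRegressor` trained to reconstruct its input as the autoencoder, then applies the learned encoder manually:

```python
# Sketch (invented data): an autoencoder learns a compressed representation of
# the raw signal; its hidden-layer activations are then fed to an SVM.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))                 # 200 fake EEG epochs, 64 samples each
y = (X[:, :32].mean(axis=1) > 0).astype(int)   # invented binary labels

# Fitting input -> input makes the MLP act as an autoencoder.
ae = MLPRegressor(hidden_layer_sizes=(16,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X, X)

# Encoder: apply the first (input -> hidden) layer of the trained network.
H = np.maximum(0, X @ ae.coefs_[0] + ae.intercepts_[0])

clf = SVC(kernel="rbf").fit(H, y)
print(H.shape, clf.score(H, y))
```

The appeal, as in the paper, is that no hand-crafted feature engineering is needed: the encoder learns the representation directly from the raw signal.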
During the past few years, brain tumor segmentation in CT has become an emerging research area in medical imaging. Brain tumor detection helps find the exact size and location of a tumor. This project proposes an efficient algorithm for tumor detection based on segmentation and morphological operators: the quality of the scanned image is first enhanced, and morphological operators are then applied to detect the tumor. The problem with biopsy is that the patient has to be hospitalized, and around 15% of results are false negatives. Scan images are read by radiologists, but this is a subjective analysis that requires experience. In the proposed work we segment the renal region and then classify the tumors as benign or malignant using ANFIS, a non-invasive automated process. This approach reduces the patient's waiting time.
IRJET- Breast Cancer Detection from Histopathology Images: A Review (IRJET Journal)
This document provides a review of techniques for detecting breast cancer from histopathology images. It discusses how histopathology examines tissue samples under a microscope to study diseases at a microscopic level. Detecting cell nuclei is an important first step, as is identifying mitosis (cell division) and metastasis (cancer spreading). The document reviews several techniques that use convolutional neural networks to automatically analyze histopathology images and detect breast cancer, including techniques for nuclei detection and segmentation. These automatic methods aim to assist pathologists by improving efficiency and reducing human error compared to manual analysis.
DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resour... (Anjani Dhrangadhariya)
PICO recognition is an information extraction task for identifying participant, intervention, comparator, and outcome information from clinical literature.
Manually identifying PICO information is the most time-consuming step for conducting systematic reviews (SR) which is already a labor-intensive process.
A lack of diversified and large, annotated corpora restricts innovation and adoption of automated PICO recognition systems.
The largest-available PICO entity/span corpus is manually annotated which is too expensive for a majority of the scientific community.
To break through the bottleneck, we propose DISTANT-CTO, a novel distantly supervised PICO entity extraction approach using the clinical trials literature, to generate a massive weakly-labeled dataset with more than a million "Intervention" and "Comparator" entity annotations.
We train distant NER (named-entity recognition) models using this weakly-labeled dataset and demonstrate that it outperforms even the sophisticated models trained on the manually annotated dataset, with a 2% F1 improvement on the Intervention entity of the PICO benchmark and more than 5% improvement when combined with the manually annotated dataset.
We investigate the generalizability of our approach and gain an impressive F1 score on another domain-specific PICO benchmark.
The approach is not only zero-cost but is also scalable for a constant stream of PICO entity annotations.
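The distant-supervision step can be illustrated with a tiny, hypothetical example: intervention names taken from a trial registry entry are matched against abstract tokens to produce weak BIO labels, with no manual annotation (all terms and sentences below are invented):

```python
# Hypothetical illustration of distant supervision for PICO entities:
# registry intervention terms -> weak token-level "Intervention" labels.
source_interventions = ["ibuprofen", "physical therapy"]  # invented registry terms

sentence = "Patients received ibuprofen or physical therapy twice daily".split()

labels = ["O"] * len(sentence)
for term in source_interventions:
    term_toks = term.split()
    # slide a window of the term's length over the sentence
    for i in range(len(sentence) - len(term_toks) + 1):
        window = [t.lower().strip(".,") for t in sentence[i:i + len(term_toks)]]
        if window == term_toks:
            labels[i] = "B-INT"                       # beginning of entity
            for j in range(i + 1, i + len(term_toks)):
                labels[j] = "I-INT"                   # inside of entity
print(list(zip(sentence, labels)))
```

Run at registry scale, exact matching of this kind (plus the fuzzier matching a real system would need) is what yields the million-plus weak entity annotations described above.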
The main objective of this work is to facilitate the identification, sharing, and reasoning about cerebral tumor observations by formalizing their semantic meaning, in order to ease their exploitation in both clinical practice and research. We focused our analysis on the VASARI terminology as a proof of concept, but we are convinced that our work can be useful in other biomedical imaging contexts.
Deep learning application to medical imaging: Perspectives as a physician (Hongyoon Choi)
Deep learning can be applied to medical imaging to directly extract biomarkers from images or enhance existing biomarkers. It can provide prognostic information beyond diagnosis, such as predicting survival outcomes. Challenges include obtaining sufficient labeled data, handling imbalanced or unlabeled data, and estimating certainty in deep learning decisions. Future work aims to address these issues and define normal populations to identify abnormal data.
IRJET - Detection and Classification of Brain Tumor (IRJET Journal)
This document presents a novel method for classifying brain MRI images as normal or abnormal using tumor detection. The method first uses wavelet transforms to extract features from images. It then applies principal component analysis to reduce the feature dimensions. The reduced features are input to a kernel support vector machine for classification. A k-fold cross validation strategy is used to enhance the generalization of the support vector machine model. The proposed system takes MRI brain images as input, detects any tumors by highlighting the affected area, and specifies tumor characteristics like dimensions and type (benign or malignant).
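The wavelet-features-then-PCA-then-kernel-SVM pipeline can be sketched end to end. The sketch below uses invented data and a hand-rolled single-level Haar transform in place of the paper's wavelet choice:

```python
# Rough pipeline sketch (invented data): wavelet coefficients -> PCA -> kernel SVM.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def haar2d(img):
    """One level of the 2-D Haar transform via averaging/differencing."""
    a = (img[:, ::2] + img[:, 1::2]) / 2      # row-wise approximation
    d = (img[:, ::2] - img[:, 1::2]) / 2      # row-wise detail
    rows = np.hstack([a, d])
    a2 = (rows[::2] + rows[1::2]) / 2         # column-wise approximation
    d2 = (rows[::2] - rows[1::2]) / 2         # column-wise detail
    return np.vstack([a2, d2])

rng = np.random.default_rng(0)
imgs = rng.normal(size=(40, 16, 16))                   # fake MRI slices
y = rng.integers(0, 2, size=40)                        # invented labels
feats = np.array([haar2d(im).ravel() for im in imgs])  # wavelet features
X = PCA(n_components=10).fit_transform(feats)          # dimensionality reduction
clf = SVC(kernel="rbf").fit(X, y)                      # kernel SVM classifier
print(X.shape)
```

PCA shrinks the 256-dimensional wavelet feature vector to 10 components before the kernel SVM, which is exactly the role dimensionality reduction plays in the method described above.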
Detecting malaria using a deep convolutional neural network (Yusuf Brima)
Experiment with Deep Residual Convolutional Neural Network to classify microscopic blood cell images (Uninfected, Parasitized)
Utilizes the ResNet architecture from Deep Residual Learning for Image Recognition (He et al., 2015).
Uses Keras with a TensorFlow backend.
Optimizing Problem of Brain Tumor Detection using Image Processing (IRJET Journal)
This document summarizes several existing methods for detecting brain tumors using magnetic resonance imaging (MRI). It discusses techniques such as image preprocessing, segmentation, feature extraction, and classification methods. Specifically, it reviews 10 different papers that propose various approaches for brain tumor detection, segmentation, and classification. These include using k-means clustering, fuzzy c-means, probabilistic neural networks, support vector machines, genetic algorithms, and sparse representation classification. The goal is to evaluate and compare different existing methods for automated brain tumor detection and analysis using MRI images.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
International Journal of Biometrics and Bioinformatics (IJBB) Volume (3) Issue... (CSCJournals)
This document summarizes a research paper that proposes a new crossover operator called Sequential Constructive Crossover (SCX) for solving the Traveling Salesman Problem (TSP) with a genetic algorithm. SCX constructs offspring from parent chromosomes by selecting better edges present in the parents while maintaining the node sequence. The performance of SCX is compared to other crossover operators, such as Edge Recombination Crossover and Generalized N-point Crossover, on benchmark TSP instances, and experimental results show that SCX finds higher-quality solutions than the other operators. The TSP is an NP-complete problem whose goal is to find the shortest route visiting all cities on a tour and returning to the starting city.
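The SCX construction can be sketched directly from the description above: from the current city, take each parent's next unvisited city (falling back to the lowest-numbered unvisited city), and keep whichever edge is cheaper. City coordinates and parent tours below are invented:

```python
# Sketch of Sequential Constructive Crossover (SCX) for the TSP, as described.
import numpy as np

def scx(p1, p2, dist):
    n = len(p1)

    def next_legit(parent, cur, visited):
        # next unvisited city after `cur` in this parent's sequence...
        i = parent.index(cur)
        for city in parent[i + 1:]:
            if city not in visited:
                return city
        # ...falling back to the first unvisited city in sequential order
        for city in range(n):
            if city not in visited:
                return city

    child = [p1[0]]
    visited = {p1[0]}
    while len(child) < n:
        cur = child[-1]
        a = next_legit(p1, cur, visited)
        b = next_legit(p2, cur, visited)
        nxt = a if dist[cur][a] <= dist[cur][b] else b   # keep the cheaper edge
        child.append(nxt)
        visited.add(nxt)
    return child

rng = np.random.default_rng(0)
pts = rng.random((6, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
p1, p2 = [0, 1, 2, 3, 4, 5], [0, 3, 5, 1, 4, 2]
child = scx(p1, p2, dist)
print(child)
```

Because every step appends an unvisited city, the offspring is always a valid tour, while the greedy edge choice is what lets SCX inherit better edges from both parents.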
Skin Cancer Detection using Image Processing in Real Time (ijtsrd)
Machine learning is a fascinating topic: it is astonishing how a small change in the evaluation values may result in an unfathomable number of outcomes. The goal of this study is to develop a model that uses image processing to identify skin cancer; the model will later be used in real life through an Android application. Sunami Dasgupta | Soham Das | Sayani Hazra Pal, "Skin Cancer Detection using Image-Processing in Real-Time", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-6, October 2021, URL: https://www.ijtsrd.com/papers/ijtsrd46384.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/46384/skin-cancer-detection-using-imageprocessing-in-realtime/sunami-dasgupta
This is an excerpt from a talk I gave last weekend at the editorial board workshop of the 4th Korean Neurocritical Care Society meeting. The original title was "AI-related research: tips for writing and reviewing papers". The request seems to have come after two reviews and one paper on deep learning in medical imaging, a CADD paper, and an ensemble paper were recently accepted. I was given a topic that is difficult for someone of my limited ability, but I am sharing it here in case it helps anyone. The conclusion is roughly that AI research is not fundamentally different, but engineering research and medical research differ, and one must understand the characteristics of AI well. (Much of this draws on the Radiology paper by Prof. Seong Ho Park of our hospital, "Methodology for Evaluation of Clinical Performance and Impact of Artificial Intelligence Technology for Medical Diagnosis and Prediction".)
Introduction to Machine Learning and Texture Analysis for Lesion Characteriza... (Kevin Mader)
Review the basic principles of machine learning.
Learn what texture analysis is and how to apply it to medical imaging.
Understand how to combine texture analysis and machine learning for lesion classification tasks.
Learn how to visualize and analyze results.
Understand how to avoid common mistakes like overfitting and incorrect model selection.
A Wavelet Based Automatic Segmentation of Brain Tumor in CT Images Using Opti... (CSCJournals)
This document summarizes a research paper that proposes a new method for automatically segmenting brain tumors in CT images. The method uses a combination of wavelet-based texture features extracted from discrete wavelet transformed sub-bands. These features are optimized using genetic algorithms and used to train probabilistic neural network and feedforward neural network classifiers to segment tumors. The proposed method is evaluated on brain CT images and shown to outperform existing segmentation methods.
Comparing prediction accuracy for machine learning and... (Alexander Decker)
This document compares the predictive accuracy of various machine learning and statistical classification algorithms on four gene expression datasets. It finds that KNN, RDA, and SVM with a linear kernel generally have lower misclassification rates than DLDA when classifying tumors using different numbers of selected genes (50, 200, 500). Classification performance is evaluated using hold-out cross-validation, and the algorithms are tested on leukemia, lymphoma, SRBCT, and prostate cancer gene expression datasets containing between 38 and 102 samples each.
IRJET - Lung Disease Prediction using Image Processing and CNN Algorithm (IRJET Journal)
This document summarizes a research paper that proposes a method for predicting lung disease using image processing and convolutional neural networks (CNNs). The method involves preprocessing chest x-ray images through steps like lung field segmentation, feature extraction, and then classifying the images as normal or abnormal using neural networks and support vector machines (SVMs). The researchers tested their approach on two datasets and were able to classify digital chest x-ray images into normal and abnormal categories with high accuracy. The goal of the research is to develop an automated system for early detection of lung cancer using chest x-rays, as early detection is key to better treatment outcomes.
The classification of different types of tumors is of great importance in cancer diagnosis and drug discovery. Cancer classification via gene expression data is known to hold keys for solving fundamental problems in the diagnosis of cancer. The recent advent of DNA microarray technology has made rapid monitoring of thousands of gene expressions possible, and with this large quantity of gene expression data, scientists have started to explore the classification of cancer using gene expression datasets. To gain a profound understanding of cancer classification, it is necessary to take a close look at the problem, the proposed solutions, and the related issues together. In this research thesis, I present a new approach to leukemia classification using deep learning, implemented with Google TensorFlow, on gene expression data.
Review of Image Watermarking Technique for Medi... (IJARIIT)
In this article, we focus on the complementary role of watermarking with respect to medical information security (integrity, authenticity …) and management. We review sample cases where watermarking has been deployed and conclude that watermarking has found a niche role in healthcare systems, as an instrument for the protection of medical information and for the secure sharing and handling of medical images. The concern of medical experts about preserving the diagnostic integrity of documents remains paramount. Medical image watermarking is an appropriate method for enhancing the security and authentication of medical data, which is crucial for further diagnosis and reference. This paper discusses the available medical image watermarking methods for protecting and authenticating medical data, focusing on algorithms that apply watermarking to the Region of Non-Interest (RONI) of the medical image while preserving the Region of Interest (ROI).
Medical Image Processing in Nuclear Medicine and Bone Arthroplasty (IOSR Journals)
This document discusses medical image processing in nuclear medicine and bone arthroplasty. It provides background on nuclear medicine imaging techniques like planar imaging, SPECT, PET and hybrid SPECT/CT and PET/CT systems. It then discusses how MATLAB can be used for medical image processing tasks in nuclear medicine like organ contouring, interpolation, filtering, segmentation, background removal, registration and volume quantification. Specific examples of nuclear medicine examinations that can be analyzed using MATLAB algorithms are also mentioned.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu... (IJECEIAES)
In many disease classification tasks an accurate gene analysis is needed, for which selecting the most informative genes is very important and requires a decision technique suited to a complex, ambiguous context. Traditional methods for selecting the most significant genes include statistical analyses such as the 2-Sample T-test (2STT), entropy, and Signal-to-Noise Ratio (SNR). This paper evaluates gene selection and classification on the basis of accurate gene selection using a structured complex decision technique (SCDT), and classifies the result using a fuzzy-cluster-based nearest neighbor classifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated with the leave-one-out cross-validation (LOOCV) metric, along with sensitivity, specificity, precision, and F1-score, against four classifiers, namely 1) Radial Basis Function (RBF), 2) Multi-layer Perceptron (MLP), 3) Feed-Forward (FF), and 4) Support Vector Machine (SVM), on three datasets: DLBCL, leukemia, and prostate tumor. The proposed SCDT & FC-NNC exhibits superior results, making it a more accurate decision mechanism.
Automatic Diagnosis of Abnormal Tumor Region from Brain Computed Tomography I... (ijcseit)
The research work presented in this paper achieves tissue classification and automatic diagnosis of abnormal tumor regions in Computed Tomography (CT) images using a wavelet-based statistical texture analysis method. Comparative studies are performed between the proposed wavelet-based texture analysis method and the Spatial Gray Level Dependence Method (SGLDM). The proposed system consists of four phases: (i) discrete wavelet decomposition, (ii) feature extraction, (iii) feature selection, and (iv) analysis of the extracted texture features by a classifier. A wavelet-based statistical texture feature set is derived from normal and tumor regions, and a Genetic Algorithm (GA) is used to select the optimal texture features from the set of extracted features. We construct a Support Vector Machine (SVM) based classifier and evaluate its performance by comparing its classification results with those of a Back-Propagation Neural Network (BPN) classifier; the results of the SVM and BPN classifiers for the texture analysis methods are evaluated using Receiver Operating Characteristic (ROC) analysis. Experimental results show that the classification accuracy of the SVM is 96% under 10-fold cross-validation. The system has been tested on a number of real CT brain images and has achieved satisfactory results.
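The GA feature-selection step in pipelines like this one can be sketched as bit-masks over the feature set evolving under selection, crossover, and mutation, with classifier cross-validation accuracy as the fitness. Everything below (data, population size, rates) is invented for illustration:

```python
# Toy GA feature-selection sketch: evolve boolean feature masks scored by
# SVM cross-validation accuracy. Data and GA settings are invented.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_feat = 12
X = rng.normal(size=(80, n_feat))
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # only features 0 and 3 carry signal

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, size=(10, n_feat)).astype(bool)
for _ in range(10):                                   # a few GA generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-5:]]            # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(5)], parents[rng.integers(5)]
        cut = rng.integers(1, n_feat)                 # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_feat) < 0.1               # bit-flip mutation
        children.append(child ^ flip)
    pop = np.array(children)

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```

The evolved mask plays the same role as the GA-selected optimal texture feature subset described above, with the SVM doubling as both fitness oracle and final classifier.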
Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies. Presentation of Anjani K. Dhrangadhariya (Institute of Information Systems, HES-SO Valais-Wallis, Sierre) at SPIE Medical Imaging 2020.
What are the Responsibilities of a Product Manager by Google PM (Product School)
Main takeaways:
-Why Product Managers are critical for research organizations
-Find out what a Product Manager at DeepMind does
-Product Management at the complex intersection of AI and healthcare
IRJET- Breast Cancer Detection from Histopathology Images: A ReviewIRJET Journal
This document provides a review of techniques for detecting breast cancer from histopathology images. It discusses how histopathology examines tissue samples under a microscope to study diseases at a microscopic level. Detecting cell nuclei is an important first step, as is identifying mitosis (cell division) and metastasis (cancer spreading). The document reviews several techniques that use convolutional neural networks to automatically analyze histopathology images and detect breast cancer, including techniques for nuclei detection and segmentation. These automatic methods aim to assist pathologists by improving efficiency and reducing human error compared to manual analysis.
DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resour...Anjani Dhrangadhariya
PICO recognition is an information extraction task for identifying participant, intervention, comparator, and outcome information from clinical literature.
Manually identifying PICO information is the most time-consuming step for conducting systematic reviews (SR) which is already a labor-intensive process.
A lack of diversified and large, annotated corpora restricts innovation and adoption of automated PICO recognition systems.
The largest-available PICO entity/span corpus is manually annotated which is too expensive for a majority of the scientific community.
To break through the bottleneck, we propose DISTANT-CTO, a novel distantly supervised PICO entity extraction approach using the clinical trials literature, to generate a massive weakly-labeled dataset with more than a million ``Intervention'' and ``Comparator'' entity annotations.
We train distant NER (named-entity recognition) models using this weakly-labeled dataset and demonstrate that it outperforms even the sophisticated models trained on the manually annotated dataset with a 2\% F1 improvement over the Intervention entity of the PICO benchmark and more than 5\% improvement when combined with the manually annotated dataset.
We investigate the generalizability of our approach and gain an impressive F1 score on another domain-specific PICO benchmark.
The approach is not only zero-cost but is also scalable for a constant stream of PICO entity annotations.
The main objective of this work is to facilitate the identification, sharing, and reasoning about cerebral tumors observations via the formalization of their semantic meanings in order to facilitate their exploitation in both the clinical practice and research. We focused our analysis on the VASARI terminology as a proof of concept, but we are convinced that our work can be useful in other biomedical imaging contexts.
Deep learning application to medical imaging: Perspectives as a physicianHongyoon Choi
Deep learning can be applied to medical imaging to directly extract biomarkers from images or enhance existing biomarkers. It can provide prognostic information beyond diagnosis, such as predicting survival outcomes. Challenges include obtaining sufficient labeled data, handling imbalanced or unlabeled data, and estimating certainty in deep learning decisions. Future work aims to address these issues and define normal populations to identify abnormal data.
IRJET - Detection and Classification of Brain TumorIRJET Journal
This document presents a novel method for classifying brain MRI images as normal or abnormal using tumor detection. The method first uses wavelet transforms to extract features from images. It then applies principal component analysis to reduce the feature dimensions. The reduced features are input to a kernel support vector machine for classification. A k-fold cross validation strategy is used to enhance the generalization of the support vector machine model. The proposed system takes MRI brain images as input, detects any tumors by highlighting the affected area, and specifies tumor characteristics like dimensions and type (benign or malignant).
Detecting malaria using a deep convolutional neural networkYusuf Brima
Experiments with a deep residual convolutional neural network to classify microscopic blood cell images (uninfected, parasitized).
Utilizes the ResNet architecture (Deep Residual Learning for Image Recognition, He et al., 2015).
Uses Keras with a TensorFlow backend.
Optimizing Problem of Brain Tumor Detection using Image ProcessingIRJET Journal
This document summarizes several existing methods for detecting brain tumors using magnetic resonance imaging (MRI). It discusses techniques such as image preprocessing, segmentation, feature extraction, and classification methods. Specifically, it reviews 10 different papers that propose various approaches for brain tumor detection, segmentation, and classification. These include using k-means clustering, fuzzy c-means, probabilistic neural networks, support vector machines, genetic algorithms, and sparse representation classification. The goal is to evaluate and compare different existing methods for automated brain tumor detection and analysis using MRI images.
http://imatge-upc.github.io/telecombcn-2016-dlcv/
Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of big annotated data and affordable GPU hardware has allowed the training of neural networks for data analysis tasks which had been addressed until now with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have shaped a brand new scenario in signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or text captioning.
International Journal of Biometrics and Bioinformatics(IJBB) Volume (3) Issue...CSCJournals
This document summarizes a research paper that proposes a new crossover operator called Sequential Constructive Crossover (SCX) for solving the Traveling Salesman Problem (TSP) using a genetic algorithm. SCX constructs offspring from parent chromosomes by selecting better edges present in the parents while maintaining the node sequence. The performance of SCX is compared to other crossover operators like Edge Recombination Crossover and Generalized N-point Crossover on benchmark TSP instances, and experimental results show that SCX finds higher quality solutions than the other operators. The TSP is an NP-complete problem where the goal is to find the shortest route to visit all cities on a tour and return to the starting city. Genetic algorithms are
Skin Cancer Detection using Image Processing in Real Timeijtsrd
Machine learning is a fascinating topic; it is astonishing how a small change in the evaluation values may result in an unfathomable number of outcomes. The goal of this study is to develop a model that uses image processing to identify skin cancer. We will later use the model in real life through an Android application. Sunami Dasgupta | Soham Das | Sayani Hazra Pal, "Skin Cancer Detection using Image-Processing in Real-Time", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-5, Issue-6, October 2021, URL: https://www.ijtsrd.com/papers/ijtsrd46384.pdf Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/46384/skin-cancer-detection-using-imageprocessing-in-realtime/sunami-dasgupta
This is an excerpt from my talk at the 4th Korean Neurocritical Care Society editorial board workshop held last weekend. The original title was "AI-related research: tips on writing and reviewing papers." The request seems to have come after I recently published two reviews and one paper on deep learning in medical imaging, a CADD paper, and an ensemble paper. It was a difficult topic for someone as inexperienced as me, but I am posting it in case it helps anyone. The conclusion is roughly that AI research is not fundamentally different, but engineering research and medical research differ, and the characteristics of AI must be well understood. (Much of it draws on the Radiology paper by Prof. Seong Ho Park at our hospital, "Methodology for Evaluation of Clinical Performance and Impact of Artificial Intelligence Technology for Medical Diagnosis and Prediction.")
Introduction to Machine Learning and Texture Analysis for Lesion Characteriza...Kevin Mader
Review the basic principles of machine learning.
Learn what texture analysis is and how to apply it to medical imaging.
Understand how to combine texture analysis and machine learning for lesion classification tasks.
Learn how to visualize and analyze results.
Understand how to avoid common mistakes like overfitting and incorrect model selection.
A Wavelet Based Automatic Segmentation of Brain Tumor in CT Images Using Opti...CSCJournals
This document summarizes a research paper that proposes a new method for automatically segmenting brain tumors in CT images. The method uses a combination of wavelet-based texture features extracted from discrete wavelet transformed sub-bands. These features are optimized using genetic algorithms and used to train probabilistic neural network and feedforward neural network classifiers to segment tumors. The proposed method is evaluated on brain CT images and shown to outperform existing segmentation methods.
Comparing prediction accuracy for machine learning andAlexander Decker
This document compares the predictive accuracy of various machine learning and statistical classification algorithms on four gene expression datasets. It finds that KNN, RDA, and SVM with a linear kernel generally have lower misclassification rates than DLDA when classifying tumors using different numbers of selected genes (50, 200, 500). Classification performance is evaluated using hold-out cross-validation, and the algorithms are tested on leukemia, lymphoma, SRBCT, and prostate cancer gene expression datasets containing between 38 and 102 samples each.
IRJET - Lung Disease Prediction using Image Processing and CNN AlgorithmIRJET Journal
This document summarizes a research paper that proposes a method for predicting lung disease using image processing and convolutional neural networks (CNNs). The method involves preprocessing chest x-ray images through steps like lung field segmentation, feature extraction, and then classifying the images as normal or abnormal using neural networks and support vector machines (SVMs). The researchers tested their approach on two datasets and were able to classify digital chest x-ray images into normal and abnormal categories with high accuracy. The goal of the research is to develop an automated system for early detection of lung cancer using chest x-rays, as early detection is key to better treatment outcomes.
The classification of different types of tumors is of great importance in cancer diagnosis and its drug discovery. Cancer classification via gene expression data is known to contain the keys for solving the fundamental problems relating to the diagnosis of cancer. The recent advent of DNA microarray technology has made rapid monitoring of thousands of gene expressions possible. With this large quantity of gene expression data, scientists have started to explore the opportunities of classification of cancer using a gene expression dataset. To gain a profound understanding of the classification of cancer, it is necessary to take a closer look at the problem, the proposed solutions, and the related issues altogether. In this research thesis, I present a new way for Leukemia classification using the latest AI technique of Deep learning using Google TensorFlow on gene expression data.
Review of Image Watermarking Technique for MediIJARIIT
In this article, we focus on the complementary role of watermarking with respect to medical information security (integrity, authenticity …) and management. We review sample cases where watermarking has been deployed. We conclude that watermarking has found a niche role in healthcare systems, as an instrument for protection of medical information, for secure sharing and handling of medical images. The concern of medical experts on the preservation of documents diagnostic integrity remains paramount. Medical image watermarking is an appropriate method used for enhancing security and authentication of medical data, which is crucial and used for further diagnosis and reference. This paper discusses the available medical image watermarking methods for protecting and authenticating medical data. The paper focuses on algorithms for application of watermarking technique on Region of Non Interest (RONI) of the medical image preserving Region of Interest (ROI).
Medical Image Processing in Nuclear Medicine and Bone ArthroplastyIOSR Journals
This document discusses medical image processing in nuclear medicine and bone arthroplasty. It provides background on nuclear medicine imaging techniques like planar imaging, SPECT, PET and hybrid SPECT/CT and PET/CT systems. It then discusses how MATLAB can be used for medical image processing tasks in nuclear medicine like organ contouring, interpolation, filtering, segmentation, background removal, registration and volume quantification. Specific examples of nuclear medicine examinations that can be analyzed using MATLAB algorithms are also mentioned.
SCDT: FC-NNC-structured Complex Decision Technique for Gene Analysis Using Fu...IJECEIAES
In the classification of many diseases, accurate gene analysis is needed, for which selecting the most informative genes is very important; this requires a decision technique suited to a complex, ambiguous context. Traditional methods for selecting the most significant genes include statistical analyses such as the 2-sample t-test (2STT), entropy, and signal-to-noise ratio (SNR). This paper evaluates gene selection and classification based on accurate gene selection using a structured complex decision technique (SCDT) and classifies the result using a fuzzy-cluster-based nearest-neighbor classifier (FC-NNC). The effectiveness of the proposed SCDT and FC-NNC is evaluated with the leave-one-out cross-validation (LOOCV) metric, along with sensitivity, specificity, precision, and F1-score, against four different classifiers, namely 1) radial basis function (RBF), 2) multi-layer perceptron (MLP), 3) feed-forward (FF), and 4) support vector machine (SVM), on three datasets: DLBCL, leukemia, and prostate tumor. The proposed SCDT & FC-NNC exhibits superior results, making it the more accurate decision mechanism.
Automatic Diagnosis of Abnormal Tumor Region from Brain Computed Tomography I...ijcseit
The research work presented in this paper is to achieve tissue classification and automatically diagnose the abnormal tumor region present in Computed Tomography (CT) images using a wavelet-based statistical texture analysis method. Comparative studies are performed between the proposed wavelet-based texture analysis method and the Spatial Gray Level Dependence Method (SGLDM). The proposed system consists of four phases: (i) discrete wavelet decomposition, (ii) feature extraction, (iii) feature selection, and (iv) analysis of the extracted texture features by a classifier. A wavelet-based statistical texture feature set is derived from normal and tumor regions. A Genetic Algorithm (GA) is used to select the optimal texture features from the set of extracted features. We construct a Support Vector Machine (SVM) based classifier and evaluate its performance by comparing its classification results with those of a Back Propagation Neural network (BPN) classifier. The results of the SVM and BPN classifiers for the texture analysis methods are evaluated using Receiver Operating Characteristic (ROC) analysis. Experimental results show that the classification accuracy of the SVM is 96% under 10-fold cross-validation. The system has been tested on a number of real CT brain images and has achieved satisfactory results.
Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies. Presentation of Anjani K. Dhrangadhariya (Institute of Information Systems, HES-SO Valais-Wallis, Sierre) at SPIE Medical Imaging 2020.
What are the Responsibilities of a Product Manager by Google PMProduct School
Main takeaways:
-Why Product Managers are critical for research organizations
-Find out what a Product Manager at DeepMind does
-Product Management at the complex intersection of AI and healthcare
This document describes using convolutional neural networks (CNNs) and residual neural networks (ResNets) to detect COVID-19 from chest X-rays and CT scans. CNNs were found to achieve 80% accuracy on X-rays and 72% accuracy on CT scans, outperforming ResNets, which achieved around 50% accuracy. The document introduces CNNs and ResNets, describes their architectures, and explains how they were applied to classify medical images as COVID-19 positive or normal. Evaluation metrics showed CNNs achieved better precision and recall on COVID-19 cases than ResNets. In conclusion, CNNs were determined to be the better algorithm for this medical image classification task.
The document discusses computer assisted screening of microcalcifications in digitized mammograms for early detection of breast cancer. It begins with an introduction to breast cancer and computer aided detection and diagnosis systems. It then provides background on areas of interest including improvement of pictorial information and machine vision. Next, it discusses microcalcifications, mammography, and mammograms. The document reviews literature on various preprocessing, feature extraction, and detection techniques. It identifies challenges in microcalcification detection including their small size and variable clusters. Finally, it outlines the plan of action for the thesis including use of the mini-MIAS mammogram database and a range of techniques to remove pectoral muscle and x-ray labels.
This document summarizes a presentation on identifying microcalcifications in digital mammograms for early detection of breast cancer. It provides background on breast cancer and microcalcifications, outlines steps in computerized breast cancer detection systems including detection and diagnosis, and reviews literature on using techniques like wavelet and contourlet transforms to enhance mammograms and identify microcalcifications for improved cancer screening. The presentation will focus on microcalcification detection and diagnosis using a contourlet transform approach to enhance mammograms by applying directional filters to contourlet subbands before reconstructing an approximation of the mammogram with enhanced microcalcifications.
[Review] High-performance medicine: the convergence of human and artificial i...Dongmin Choi
The document discusses the convergence of human and artificial intelligence in medicine. It outlines two major trends: 1) rising healthcare expenditures with no productivity growth, and 2) the generation of massive amounts of medical data that exceeds human abilities to analyze. While the integration of human and AI has barely begun, AI has the potential to solve problems in healthcare like diagnostic errors. The paper aims to summarize existing evidence for using AI in various medical fields like radiology, pathology, dermatology, ophthalmology, and cardiology. It provides examples of studies applying AI to tasks like detecting diseases in medical images and reports performance that matches or exceeds human experts. However, limitations and challenges for clinical adoption are also noted.
Challenges and opportunities for machine learning in biomedical researchFranciscoJAzuajeG
1. Machine learning faces challenges in biomedical research due to data heterogeneity, lack of labeled data, and complexity in biological patterns and networks.
2. Combining machine learning and biological network models can help address these challenges by encoding data in biologically meaningful networks and extracting network-based features for prediction.
3. Examples applying this approach to cancer datasets showed that models based on network centrality features outperformed other methods, and deep learning using these features achieved the best prediction performance across multiple neuroblastoma datasets.
This document presents an overview of a thesis project on computer-assisted screening of microcalcifications in digitized mammograms for early detection of breast cancer. The project aims to develop a system that can automatically detect microcalcifications in mammogram images to assist radiologists. The system will use techniques like image segmentation, morphological operations, filtering, and feature extraction to preprocess mammogram images and identify microcalcification clusters. A mini-MIAS database containing 322 mammogram images will be used to test and evaluate the methodology. The document outlines the background, motivation, challenges, plan of action and materials/tools for the project.
IFMIA 2019 Plenary Talk : Deep Learning in Medicine; Engineers' PerspectivesNamkug Kim
The document discusses Namkug Kim's background and research experience in medical imaging and artificial intelligence, including his work with various companies and research grants focusing on developing AI and robotic technologies for medical applications. It also provides an overview of his collaborations, conflicts of interests, and selected publications.
tranSMART Community Meeting 5-7 Nov 13 - Session 3: The TraIT user stories fo...David Peyruc
This document provides an overview of the TraIT project and existing demonstrators using tranSMART. It discusses the TraIT roadmap and user stories being implemented at the Netherlands Cancer Institute. Key points include:
- TraIT aims to support translational research through integrated data and tools across clinical, imaging, biobanking and experimental domains.
- Existing demonstrators using tranSMART include DeCoDe (colorectal cancer) and PCMM (prostate cancer).
- The roadmap involves enhancing tranSMART functionality based on user needs and integrating additional data sources.
- At NKI, tranSMART will provide an integrated research data warehouse with clinical and research data from various sources and departments.
The Medical Segmentation Decathlon provides a benchmark for evaluating the generalizability of semantic segmentation algorithms across a variety of anatomical structures and imaging modalities. The Decathlon includes 10 segmentation tasks with over 2,600 unique patient datasets. In Phase 1 of the challenge, participants developed algorithms to segment the structures and submitted results for evaluation. The top performing methods for each task are identified based on Dice scores and boundary accuracy metrics. Phase 2 will involve applying the previously developed algorithms to new datasets without modifications, to further evaluate generalizability.
Recent advances in diagnosis and treatment planning1 /certified fixed orthod...Indian dental academy
The Indian Dental Academy is the leader in continuing dental education, training dentists in all aspects of dentistry and offering a wide range of certified dental courses in different formats. The Indian Dental Academy provides dental crown & bridge, rotary endodontics, fixed orthodontics, and dental implant courses. For details please visit www.indiandentalacademy.com or call 0091-9248678078.
Radiomics and Deep Learning for Lung Cancer ScreeningWookjin Choi
The document summarizes research on using radiomics and deep learning approaches for lung cancer screening. It describes:
1) Using radiomic features like shape, texture, and intensity from lung nodules on CT scans and an SVM-LASSO model to classify nodules with 87.9% sensitivity and 78.2% specificity, outperforming the Lung-RADS system.
2) A deep learning model developed for a Kaggle competition that achieved 67.4% accuracy on nodule classification but only ranked 99th due to overfitting issues without enough data.
3) Future work could integrate quantification of nodule characteristics like spiculation with plasma biomarkers to improve diagnostic accuracy.
This document summarizes Kevin McGuinness' presentation on deep learning for computer vision. It discusses visual attention models and their ability to predict eye gaze, applications in image cropping, retrieval and classification. It also covers medical image analysis using deep learning for knee osteoarthritis grading and neonatal brain segmentation. Deep crowd analysis is examined for crowd counting. Finally, interactive deep vision for image segmentation using user interactions is presented.
1) Quantitative medicine uses large amounts of medical data and advanced analytics to determine the most effective treatment for individual patients based on their specific clinical profile and biomarkers. This approach can help reduce healthcare costs and improve outcomes compared to the traditional one-size-fits-all model.
2) However, realizing the promise of quantitative personalized medicine is challenging due to the huge quantities of diverse medical data located in dispersed systems, lack of computing capabilities, and barriers to data sharing.
3) Grid and service-oriented computing approaches are helping to address these challenges by enabling federated querying, analysis, and sharing of medical data and services across organizations through virtual integration rather than true consolidation.
[Explained] "Partial Success in Closing the Gap between Human and Machine Vis...Sou Yoshihara
The document summarizes research comparing human and machine vision across various models and datasets. Three key findings are presented: 1) The robustness gap between humans and CNNs is decreasing as newer models match or exceed human performance on most datasets. 2) However, an image-level consistency gap remains, where humans make different errors than models. 3) For many cases, human-model consistency improves when models are trained on datasets an order of magnitude larger. The research aims to benchmark progress in closing these gaps.
University of Toronto - Radiomics for Oncology - 2017Andre Dekker
This document contains the slides from a lecture on radiomics for oncology given by Andre Dekker. The lecture covers the rationale for radiomics, which is to use quantitative features extracted from medical images to help predict outcomes like tumor behavior, survival, and response to treatment using machine learning. The major workflow steps of radiomics are discussed, from image acquisition and feature extraction to modeling and validation. Key challenges like robust segmentation and feature reproducibility are also addressed. New directions for radiomics research include applications in preclinical studies, other modalities like PET and MRI, and linking radiomic features to genomic data. Overall, radiomics holds promise to help personalized medicine but large amounts of standardized data are still needed for proper validation of models.
Large amounts of heterogeneous medical data have become available in various healthcare organizations (payers, providers, pharmaceuticals). Those data could be an enabling resource for deriving insights for improving care delivery and reducing waste. The enormity and complexity of these datasets present great challenges in analyses and subsequent applications to a practical clinical environment. More details are available here http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/
Similar to Exploiting biomedical literature to mine out a large multimodal dataset of rare cancers (20)
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily lives in the past 3-5 years.
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with "Financial Odyssey," our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
Exploiting biomedical literature to mine out a large multimodal dataset of rare cancers
1. Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies
Anjani K. Dhrangadhariya et al.
MedGIFT group
University of Applied Sciences Western Switzerland (HES-SO)
Project supported by European Union Horizon 2020 grant agreement 825292
SPIE Medical Imaging 2020, 16.02.2020
2. Motivation
> Rare cancers: incidence below 15 per 100,000 per year
> Account for 25% of cancer-related deaths
> Lower prevalence = fewer patients
> Fewer tumor samples for research
> Lack of robust clinical models
Puca, Loredana, et al. "Patient derived organoids to model rare prostate cancer phenotypes." Nature Communications 9.1 (2018): 1-10.
3. Data resources — challenges
1) Private datasets
2) Limited size
3) Single center / scanner
4) Small variability
5) Some contain only images or only text
6) No, or only small subsets of, manual annotations
7) Difficult to compare results
6. Medical Subject Headings (MeSH)
• A hierarchically organized controlled vocabulary for cataloguing biomedical information
• 16 thematic categories, e.g. A = Anatomy, B = Organisms, ...
• Each term has a unique MeSH identifier (MeSH term ↔ MeSH tree code)
Lipscomb, Carolyn E. "Medical subject headings (MeSH)." Bulletin of the Medical Library Association 88.3 (2000): 265.
7. MeSH as annotation
• Manually annotated by National Library of Medicine (NLM) staff
• E.g., all studies about benign cancer are indexed under the MeSH annotation "Neoplasm"
• Serves as ground-truth annotation
• Not all PMC / PMC-OA articles have annotations
8. Visual classification
• ImageCLEF medical image annotation challenge (since 2013)
• A small subset of annotated PMC-OA is used to train CNNs
• Images are classified into 31 modalities: PET, light microscopy, CT, etc.
• State of the art: superficial modality classification with 90% accuracy on 2,000 annotated PMC-OA images
"Deep Multimodal Classification of Image Types in Biomedical Journal Figures", Andrearczyk and Müller, CLEF 2018
13. Textual approach
[Diagram] Titles + abstracts of articles with MeSH annotations (MeSH_0, MeSH_1) are used for model training & evaluation to select the best-performing model, which then labels the titles + abstracts without MeSH annotations.
14. Pipeline
[Diagram] 1. PMC-OA, all images → 2. getting DLMI images → 3. getting "human" images → 4. getting "neoplastic" images → 5. getting "rare cancer" images. At each step, title + abstract classification bridges articles with MeSH annotations vs. those without.
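The cascade above can be sketched as successive filters over PMC-OA image records. This is an illustrative stdlib sketch, not the authors' code: the record fields (`modality`, `human`, `neoplastic`, `rare`) are hypothetical stand-ins, since in the real pipeline each stage is a MeSH rule or a trained classifier.

```python
# Illustrative sketch of the 5-step filtering cascade (hypothetical fields).
def cascade(records, filters):
    """Apply each filter in turn, keeping only records that pass all of them."""
    for keep in filters:
        records = [r for r in records if keep(r)]
    return records

# Stand-in predicates for steps 2-5: DLMI, human, neoplastic, rare cancer.
filters = [
    lambda r: r["modality"] == "light_microscopy",  # step 2: DLMI images
    lambda r: r["human"],                           # step 3: human studies
    lambda r: r["neoplastic"],                      # step 4: neoplastic studies
    lambda r: r["rare"],                            # step 5: rare cancers
]

sample = [  # step 1: all PMC-OA image records (toy examples)
    {"modality": "light_microscopy", "human": True, "neoplastic": True, "rare": True},
    {"modality": "CT", "human": True, "neoplastic": True, "rare": True},
    {"modality": "light_microscopy", "human": True, "neoplastic": False, "rare": False},
]
print(len(cascade(sample, filters)))  # 1 record survives all five steps
```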
15. Getting "human" images
[Diagram] Weak labeling rule over the DLMI articles with MeSH annotations:
"human" ⇔ B01.050.150.900.649.313.988.400.112.400.400 ∈ {MeSH} and other B01 codes ∉ {MeSH}
"not human" ⇔ B01.050.150.900.649.313.988.400.112.400.400 ∉ {MeSH}
Model training and evaluation on title + abstract (80% training set / 20% test set):
Classifiers: 1. logistic regression, 2. support vector machine, 3. k-nearest neighbor
Features: 1. tf-idf, 2. word vectors, 3. paragraph vectors
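The slide's MeSH rule for the "human" label can be written down directly. A minimal sketch, assuming each article arrives as a set of MeSH tree codes; the function name `is_human` is illustrative, not the authors' code.

```python
# Illustrative sketch of the slide's weak-labeling rule, not the authors' code:
# "human"  <=>  the Homo sapiens tree code is present AND no other
#               B01 (Organisms) tree code is present.
HUMAN = "B01.050.150.900.649.313.988.400.112.400.400"

def is_human(mesh_codes):
    codes = set(mesh_codes)
    other_b01 = {c for c in codes if c.startswith("B01") and c != HUMAN}
    return HUMAN in codes and not other_b01

print(is_human({HUMAN, "C04.557"}))                  # True: human-only study
print(is_human({HUMAN, "B01.050.150.900.649.801"}))  # False: mixed organisms
print(is_human({"C04.557"}))                         # False: no human code
```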
16. Getting "human" images
[Diagram] The best-performing model, hyper-parameters, and feature vectors (SVM with tf-idf bigrams) then label the DLMI articles without MeSH annotations as "human" / "not human" from their titles + abstracts.
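The winning text features are tf-idf n-grams. Below is a minimal stdlib sketch of tf-idf over word bigrams, for illustration only; in practice one would use a library such as scikit-learn's `TfidfVectorizer` with `ngram_range=(2, 2)`.

```python
import math
from collections import Counter

# Stdlib sketch of tf-idf over word bigrams, not the authors' feature code.
def bigrams(text):
    toks = text.lower().split()
    return [" ".join(p) for p in zip(toks, toks[1:])]

def tfidf(docs):
    counts = [Counter(bigrams(d)) for d in docs]
    df = Counter(g for c in counts for g in set(c))  # document frequency
    n = len(docs)
    return [
        {g: (tf / sum(c.values())) * math.log(n / df[g]) for g, tf in c.items()}
        for c in counts
    ]

docs = ["rare cancer study", "rare cancer model", "mouse liver study"]
vecs = tfidf(docs)
# "cancer study" (1 doc) outweighs "rare cancer" (2 docs) in the first vector.
```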
17. Pipeline
[Diagram] Recap of the five-step pipeline (slide 14): 1. PMC-OA, all images → 2. getting DLMI images → 3. getting "human" images → 4. getting "neoplastic" images → 5. getting "rare cancer" images.
18. Getting "neoplastic" images
[Diagram] Weak labeling rule over the "human" DLMI articles with MeSH annotations:
"neoplastic" ⇔ C04 ∈ {MeSH}
"not neoplastic" ⇔ C04 ∉ {MeSH}
Model training and evaluation on title + abstract (80% training set / 20% test set):
Classifiers: 1. logistic regression, 2. support vector machine, 3. k-nearest neighbor
Features: 1. tf-idf, 2. word vectors, 3. paragraph vectors
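The "neoplastic" rule is a simple membership test on the C04 (Neoplasms) subtree. An illustrative sketch, with a hypothetical function name:

```python
# Illustrative sketch of the slide's rule, not the authors' code:
# "neoplastic"  <=>  some MeSH tree code lies in the C04 (Neoplasms) subtree.
def is_neoplastic(mesh_codes):
    return any(c == "C04" or c.startswith("C04.") for c in mesh_codes)

print(is_neoplastic({"C04.557.470", "B01.050"}))  # True
print(is_neoplastic({"B01.050.150"}))             # False
```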
19. Getting "non-neoplastic" images
[Diagram] The best-performing model, hyper-parameters, and feature vectors (SVM with tf-idf bigrams) label the "human" DLMI articles without MeSH annotations as "neoplastic" / "not neoplastic" from their titles + abstracts.
20. Pipeline
[Diagram] Recap of the five-step pipeline (slide 14): 1. PMC-OA, all images → 2. getting DLMI images → 3. getting "human" images → 4. getting "neoplastic" images → 5. getting "rare cancer" images.
21. Getting "rare cancer" images
• No MeSH terms exist for a "rare cancer" class
• Instead, a set of {rare cancer} terms from the National Center for Advancing Translational Sciences (NCATS) is used:
https://rarediseases.info.nih.gov/diseases/diseases-by-category/1
[Diagram] Labeling rule over the "human", "neoplastic" DLMI articles:
"rare cancer" ⇔ (title + abstract) ∩ {rare cancer} ≠ Ø
"non-rare cancer" otherwise
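The rare-cancer rule reduces to checking whether the title + abstract mentions any NCATS term. A minimal sketch with a hypothetical two-term list; the real NCATS list is far larger, and naive substring matching here stands in for whatever matching the authors actually used.

```python
# Illustrative sketch; RARE_TERMS is a tiny hypothetical stand-in for the
# NCATS rare-cancer term list.
RARE_TERMS = {"adrenocortical carcinoma", "chordoma"}

def is_rare_cancer(title_abstract):
    """Label an article "rare cancer" iff any list term appears in its text."""
    text = title_abstract.lower()
    return any(term in text for term in RARE_TERMS)

print(is_rare_cancer("A case study of sacral chordoma recurrence"))  # True
print(is_rare_cancer("Breast cancer screening outcomes"))            # False
```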
22. Visual: "rare cancer"
Model training and evaluation on "rare cancer" vs. "non-rare cancer" images:
• VGG19
• ImageNet weights
• With and without image augmentation
23. Visual: "rare cancer"
[Diagram] The trained VGG19 model (ImageNet weights, with and without image augmentation) assigns "rare cancer" / "non-rare cancer" labels to the unlabeled images.
24. Results
"human" vs. "non-human" classification
Data type | Classifier | Feature                | Precision | Recall | F1-score
Visual    | VGG19      | With data augmentation | 0.69      | 0.71   | 0.68
Textual   | SVM        | Tf-idf trigrams        | 0.89      | 0.90   | 0.90
25. Results
"neoplastic" vs. "non-neoplastic" classification
Data type | Classifier | Feature                | Precision | Recall | F1-score
Visual    | VGG19      | With data augmentation | 0.68      | 0.65   | 0.64
Textual   | SVM        | Tf-idf bigrams         | 0.99      | 0.99   | 0.99
26. Results
"rare cancer" vs. "non-rare cancer" classification
Data type | Classifier | Feature                | Precision | Recall | F1-score
Visual    | VGG19      | With data augmentation | 0.62      | 0.77   | 0.69
27. Discussion: Textual vs. Visual
Textual approach
• Outperformed the visual approach on all tasks
• Tf-idf n-grams with an SVM performed best on both tasks
Visual approach
• Correctly classified some "human" test instances (recall 0.71)
• Performed worse on "neoplastic" identification
• "rare cancer" classification reached a recall of 0.77
28. Conclusion
• First study targeting automatic rare cancer image extraction
• The approach combines visual deep learning with textual NLP
• Result: 15,028 diagnostic light microscopy (DLMI), human, rare cancer images plus the corresponding journal articles
Pipeline: 1. PMC-OA all data → 2. Getting DLMI images → 3. Getting "human" images → 4. Getting "neoplastic" images → 5. Getting "rare cancer" images
29. Thank you for your attention
More information:
http://medgift.hevs.ch
Contact:
anjani.dhrangadhariya@hevs.ch
Follow us:
https://twitter.com/MedGIFT_group
Editor's notes
2
3
4
How are these biomedical publications stored in Medline represented in PubMed?
A PubMed record consists of a title and abstract, followed by the publication images, shown as thumbnails.
There is also a list of Medical Subject Headings, or MeSH annotations, which act like keywords describing the publication.
The text, images and MeSH terms are all strung together by the unique PubMed identifier, or PMID.
You can also notice a PMCID, the unique PubMed Central identifier, which links to the full text of the publication.
All these components, the images, the text and the MeSH terms, thus have a one-to-one association with each other.
6
PubMed records are manually annotated with MeSH terms by staff at the NLM.
What is the significance of attaching MeSH terms to a PubMed record?
MeSH annotation enforces uniformity and consistency across the terminology, so that all articles about cancer are indexed under the MeSH term "Neoplasms" and all studies involving patients are annotated with the MeSH term "Humans".
MeSH terms can therefore be considered gold-standard, or ground-truth, annotations for a publication.
Not all publications in PubMed have these manually attached MeSH terms, however.
Have these PMC-OA images been used elsewhere for image analysis?
Yes, an annotated subset of PMC-OA has already been used in the ImageCLEF medical image annotation challenge, a public challenge that has been running since 2013.
This small annotated subset of 2,000 images was used to train CNNs to classify images into 31 image modality classes,
including PET, CT and light microscopy images, et cetera.
This classification approach achieved an overall 90% accuracy for modality classification.
However, it only goes as far as this superficial modality classification task.
What about going beyond generic modality classification to more specialized image sets?
So what we did to navigate towards rare cancer sets was this:
Take all the PMC-OA images and classify them with the ImageCLEF setup into the 31 modality types.
Retain all the images classified as DLMI, or diagnostic light microscopy images.
We focus only on DLMI images because they are fundamental to rare cancer diagnostics.
All the retained DLMI images are linked to their respective titles, abstracts and MeSH annotations, where available.
With this multimodal annotated dataset in hand, we propose an approach for sequential curation of article abstracts and images using MeSH terms to eventually mine out a large multimodal set of rare cancer images and full texts.
This involves three subsequent binary classification tasks where we first filter “human” from “non-human” set, followed by separating “neoplastic” from “non-neoplastic” set and finally separating “rare cancer“ from the “non-rare cancer“.
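The sequential curation described here is a chain of binary filters. A minimal sketch, where the predicate functions passed in stand in for the trained classifiers and are hypothetical names:

```python
def curate(records, filters):
    """Apply each binary filter in turn, keeping only the positives,
    so each step narrows the set produced by the previous one."""
    for keep in filters:
        records = [r for r in records if keep(r)]
    return records
```

With classifiers wrapped as predicates, `curate(dlmi_records, [is_human, is_neoplastic, is_rare_cancer])` would reproduce the three-step filtering.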
Note that at each binary classification step we compare the visual and textual approaches separately and use MeSH terms as the ground-truth labels for the datasets.
For the visual classification tasks, images from two different MeSH classes were used to train and evaluate a VGG19 model with pretrained ImageNet weights, fine-tuned with and without image augmentation.
Data augmentation: image mirroring and cropping.
Why do we use VGG?
These fine-tuned models were then used to classify unlabeled images into their respective classes.
13
Let's get back to the pipeline to further curate the previously retrieved DLMI dataset.
«human» records were first filtered out from «non-human» records in the following way.
15
Best performing model setup was used to classify the un-annotated DLMI records into “human” and “non-human”.
Then «neoplastic», i.e. tumor-related, records were separated from «non-neoplastic» records in a similar manner.
18
The best-performing model setup was used to classify the un-annotated records into "neoplasm" and "non-neoplasm".
That was the annotated text dataset; similarly, the annotated image dataset was classified using the VGG19 setup.
Finally, we separate the rare cancer dataset from the non-rare cancer dataset.
Unfortunately, there are no MeSH terms pertaining to "rare cancer", so we used a pre-defined set of rare cancer terms available from NCATS.
All the records recognized as "neoplasm" were retained, and labeled "rare cancer" only if a rare cancer term from the NCATS set was present in the title and abstract.
After getting «rare cancer» and the «non-rare cancer» labels for images from the previous text classification, we used them to train and evaluate a VGG19 model for this binary classification task.
For the «human» classification task, the textual approach performed far better than the visual approach.
However, a recall of 0.71 hints that the visual classification model does learn something about retaining human images.
For the neoplasm classification task too, the textual approach performed better than the visual one.
The visual approach did not achieve good results for this task.
For the final task, a recall of 0.77 does hint that the VGG19 model learned something, retaining the «rare cancer» images better, but there is much room for improvement.