Is Natural Language Processing Good for your Health?
25th April 2018
Nigel Collier
Theoretical and Applied Linguistics, MML
Some Preliminaries
• Knowledge graph
• A type of large-scale semantic network describing concepts and their logical relationships, e.g. WordNet, BabelNet, Yago, SNOMED CT
• Named entity
• A sequence of words that denotes a term or individual
• Entity linking (‘grounding’, ‘coding’)
• To establish the specific reference of a named entity according to an ontology
• Distributional semantics
• A system of representing meaning whereby words/phrases are understood as points in a low-dimensional geometric space. Semantics is encoded as a configuration pattern over all dimensions, e.g. word2vec embeddings (Mikolov et al. 2013) (Firth 1957)
• Deep neural networks (‘deep learning’)
• A type of artificial neural network with multiple hidden layers of units
Data Science in Health Requires the Combined Expertise of…
• Biologists
• Clinicians
• Mathematicians
• Statisticians
• Computer Scientists
• Geneticists
• Bioinformaticians
• Computational Linguists…?
Computational Models of Language are Key to Making Sense of Health Data
• Western clinical notes date back to the 5th/6th centuries BC
• Today 60 to 70% of NHS data exists only as unstructured text
• Biomedical literature, clinical trials data, lab notebooks, clinical records, diagnostic reports, news reports on disease outbreaks, social media messages, patient interviews, patient forum data …
• Represents the most contextually grounded, high-precision information about an individual’s health, attitudes and behaviours
Case Studies: Health and NLP
• Infectious disease monitoring [4,5]
• Drug safety analysis [6,7]
• Diagnosis of semantic dementia [8]
• Monitoring air quality [9]
• Analysing malpractice [10]
• Diet success [11]
• Exercise and mental health [12]
• Identifying symptoms of schizophrenia [13]
[4] Aramaki, E. et al. (2011). Twitter catches the flu: detecting influenza epidemics using Twitter. In Proceedings of the conference on empirical methods in natural language processing (pp. 1568-1576). Association for Computational Linguistics.
[5] Collier, N., Son, N. T., & Nguyen, N. M. (2011). OMG U got flu? Analysis of shared health messages for bio-surveillance. Journal of biomedical semantics, 2(5), S9.
[6] Sarker, A. et al. (2015). Utilizing social media data for pharmacovigilance: A review. Journal of biomedical informatics, 54, 202-212.
[7] Yang, C. C., et al. (2014). Postmarketing drug safety surveillance using publicly available health-consumer-contributed content in social media. ACM Transactions on Management Information Systems (TMIS), 5(1), 2.
[8] Pakhomov, S. V., Smith, G. E., Marino, S., Birnbaum, A., Graff-Radford, N., Caselli, R., ... & Knopman, D. S. (2010). A computerized technique to assess language use patterns in patients with frontotemporal dementia. Journal of neurolinguistics, 23(2), 127-144.
[9] Wang, S., Paul, M. J., & Dredze, M. (2015). Social media as a sensor of air quality and public response in China. Journal of medical Internet research, 17(3), e22.
[10] Nakhasi, A., Passarella, R., Bell, S. G., Paul, M. J., Dredze, M., & Pronovost, P. (2012, October). Malpractice and malcontent: Analyzing medical complaints in Twitter. In 2012 AAAI Fall Symposium Series.
[11] Weber, I., & Achananuparp, P. (2015). Insights from machine-learned diet success prediction. arXiv preprint arXiv:1510.04802.
[12] Dos Reis, V. L., & Culotta, A. (2015). Using matched samples to estimate the effects of exercise on mental health from Twitter. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 182-188).
[13] Patel, R. et al. (2015). Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method. BMJ open, 5(9), e007619.
Combining Language Models for Information Extraction
[Pipeline diagram] Stages shown between the raw text document (input) and the knowledge objects (output):
• Sentence segmentation
• Tokenization
• Lexical featurisation
• Entity recognition
• Trigger detection
• Relation extraction
• Event extraction
• Entity linking
• Syntactic parsing
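As a rough illustration of how such stages can be chained, the sketch below covers only three of them (sentence segmentation, tokenization and entity recognition) using naive rules. The function names, the regular expressions and the toy gazetteer are assumptions made for this example and are not taken from the talk; real systems use trained models for each stage.

```python
import re

# Toy gazetteer of entity names (hypothetical, for illustration only).
GAZETTEER = {"pneumothorax", "head spinning", "weight gain"}


def segment_sentences(text):
    """Naive segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def recognise_entities(tokens, gazetteer=GAZETTEER):
    """Greedy longest-match lookup of lowercased token n-grams in the gazetteer."""
    max_len = max((len(name.split()) for name in gazetteer), default=0)
    entities, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in gazetteer:
                entities.append(candidate)
                i += n
                break
        else:
            # No n-gram starting at position i matched; move on one token.
            i += 1
    return entities


def run_pipeline(text):
    """Chain the stages: raw text -> sentences -> tokens -> entity mentions."""
    return [recognise_entities(tokenize(s)) for s in segment_sentences(text)]


result = run_pipeline(
    "No pneumothorax. Patient reports head spinning and weight gain."
)
```

Each stage consumes the output of the previous one, so an error made early (for example a wrong sentence boundary) propagates to every later stage.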
Entity Linking: a Central Task in Information Extraction
Textual evidence for ‘JFK International’
“JetBlue begins direct service between Barnstable Airport and JFK International” [14]
Wikipedia entry for ‘JFK International’
[14] Ling, X., Singh, S., & Weld, D. S. (2015). Design challenges for entity linking. Transactions of the Association for Computational Linguistics, 3, 315-328.
Illustrating the Complexities of Entity Linking in Health
Source | Entity Mention | Target Concept (SNOMED) | Current Data Driven Solution?
Twitter | hungry | hunger | y
Twitter | gained 2kgs in weight | weight gain | y
Twitter | head spinning | dizziness | y
Twitter | rupturd his bowel | gastrointestinal perforation | y
EHR | No pneumothorax | history of pneumothorax, negative | ?
EHR | right breast cancer | breast cancer + right | n
EHR | A.FIB | atrial fibrillation | ?
Literature | peculiar changes in the dendrites of Purkinje cells | abnormal + Purkinje cell + dendrite + associated morphology | n
A Brief Overview of Entity Linking
• 1. Manually defined symbolic features – fast, clear but restricted coverage
• String matching on concept labels, e.g. “hungry” → hunger [15]
• 2. Machine translation models using symbolic features – better coverage
• Recognise variant compositions, e.g. “gained 2kgs in weight” → weight gain [16]
• 3. Distributed compositional semantic models – best coverage but opaque
• Recognise latent similarities, e.g. “head spinning” → dizziness
• Dependent to some extent on large-scale data
• Don’t account by themselves for complex utterances such as post-coordinated concepts [17]
• Vagueness? “terrible headache this morning” → sinus headache? Tension headache? Hangover?
[15] Lu, Z. et al. (2011). The gene normalization task in BioCreative III. BMC bioinformatics, 12(8), S2.
[16] Limsopatham, N., & Collier, N. (2015). Adapting phrase-based machine translation to normalise medical terms in social media messages. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1675-1680). Association for Computational Linguistics.
[17] Dhombres, F., & Bodenreider, O. (2016). Interoperability between phenotypes in research and healthcare terminologies: investigating partial mappings between HPO and SNOMED CT. Journal of biomedical semantics, 7(1), 3.
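The first approach, string matching on concept labels, can be sketched as follows. This is a minimal illustration, not the method used in [15]: it assumes approximate matching with Python's standard `difflib` module as a stand-in for hand-written matching rules, and the label list and the 0.6 cutoff are arbitrary choices for the example.

```python
from difflib import get_close_matches

# Tiny, hypothetical set of concept labels (stand-in for an ontology such as SNOMED CT).
CONCEPT_LABELS = ["hunger", "weight gain", "dizziness"]


def link_by_string_match(mention, labels=CONCEPT_LABELS, cutoff=0.6):
    """Return the label most similar to the mention by surface form, or None."""
    matches = get_close_matches(mention.lower(), labels, n=1, cutoff=cutoff)
    return matches[0] if matches else None


close_variant = link_by_string_match("hungry")        # shares most characters with "hunger"
latent_variant = link_by_string_match("head spinning")  # no label looks similar on the surface
```

Surface matching handles “hungry” but finds nothing for “head spinning”, because that mention shares almost no characters with “dizziness”. This is the restricted coverage that motivates the translation-based and distributed approaches.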
Experimental Setup: Language Models
Model | Description | Ref.
TF-IDF | Traditional statistical term matching approach | [18]
BM25 | Traditional term ranking function | [19]
SVM LTR | Supervised machine learning model (current SOTA) | [20]
DWR | Cosine similarity between word vectors for mention and concept |
SMT + DWR | Statistical word alignment model | [21]
CNN | Supervised neural network model | [22]
[18] Spärck Jones, K. (2004). IDF term weighting and IR research lessons. Journal of documentation, 60(5), 521-523.
[19] Robertson, S., Zaragoza, H., & Taylor, M. (2004, November). Simple BM25 extension to multiple weighted fields. In Proceedings of the thirteenth ACM international conference on Information and knowledge management (pp. 42-49). ACM.
[20] Leaman, R., Islamaj Doğan, R., & Lu, Z. (2013). DNorm: disease name normalization with pairwise learning to rank. Bioinformatics, 29(22), 2909-2917.
[21] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., ... & Dyer, C. (2007, June). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions (pp. 177-180). Association for Computational Linguistics.
[22] Limsopatham, N., & Collier, N. H. (2016). Normalising medical concepts in social media texts by learning semantic representation.
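The DWR baseline ranks concepts by cosine similarity between word vectors for the mention and for the concept. The sketch below shows the idea under stated assumptions: the 3-dimensional vectors are invented for illustration (real word2vec embeddings have hundreds of dimensions and are learned from text), and averaging word vectors to represent a multi-word phrase is one simple composition choice, not necessarily the one used in the experiments.

```python
import math

# Toy word vectors, invented for this example only.
WORD_VECTORS = {
    "head": [0.9, 0.1, 0.0],
    "spinning": [0.7, 0.3, 0.1],
    "dizziness": [0.8, 0.2, 0.0],
    "hunger": [0.0, 0.9, 0.2],
    "weight": [0.1, 0.2, 0.9],
    "gain": [0.0, 0.3, 0.8],
}


def phrase_vector(phrase):
    """Average the vectors of the known words in a phrase; None if none are known."""
    vectors = [WORD_VECTORS[w] for w in phrase.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_concepts(mention, concepts):
    """Return the concepts sorted from most to least similar to the mention."""
    mention_vec = phrase_vector(mention)
    if mention_vec is None:
        return []
    scored = []
    for concept in concepts:
        concept_vec = phrase_vector(concept)
        if concept_vec is not None:
            scored.append((cosine(mention_vec, concept_vec), concept))
    return [concept for _, concept in sorted(scored, reverse=True)]


ranking = rank_concepts("head spinning", ["hunger", "weight gain", "dizziness"])
```

With these toy vectors, “dizziness” ranks first for “head spinning” even though the two strings share almost no characters, which is the kind of latent similarity a surface matcher cannot capture.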
Experimental Setup: Data Sets
• We evaluate our approaches on three different datasets
Dataset | # Queries | # Target concepts | Data source
TwADR-S | 201 | 58 | Twitter messages
TwADR-L | 2,220 | 1,436 | Twitter messages
AskPatient | 8,662 | 1,036 | Blog posts from askapatient.com
Experimental Results
[Bar chart: accuracy (y-axis, 0 to 0.9) of TF-IDF, BM25, DWR, SVM LTR, SMT+DWR and CNN on the TwADR-S, TwADR-L and AskPatient datasets]
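Accuracy for this kind of task is commonly read as the fraction of queries whose top-ranked concept equals the gold-standard concept. The sketch below assumes that definition, which the slides do not spell out; the predictions and gold labels are invented toy values, not results from the experiments.

```python
def accuracy(predicted, gold):
    """Fraction of queries whose predicted concept equals the gold concept."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one query is required")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Invented toy data: one of four mentions is linked to the wrong concept.
gold = ["hunger", "weight gain", "dizziness", "gastrointestinal perforation"]
predicted = ["hunger", "weight gain", "hunger", "gastrointestinal perforation"]
score = accuracy(predicted, gold)
```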
Conclusion
• NLP can contribute to medical discovery through the intelligent use of existing data. This shortens the time to insight.
• I’ve proposed and tested various ways of linking entities to knowledge bases using machine learning
• I’ve used distributed semantic representations to infer latent similarities between mentions and concepts
• Future research:
• Generating explanations
• Handling compositionality
• Scaling up to larger knowledge bases
• Get involved:
• Data challenge? → CLEF eHealth Lab + BioCreative
• Connect? → EPSRC Healtex Network + Alan Turing Institute
• Publish? e.g. ACL + EMNLP + BioNLP + SocialNLP
• Software? Apache cTAKES + GATE
Thank you!
Slides available at: https://www.slideshare.net/nigel_collier
https://sites.google.com/site/nhcollier/
nhc30@cam.ac.uk
ORCID: 0000-0002-7230-4164
Twitter: @nigelhcollier

Más contenido relacionado

La actualidad más candente

ARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDS
ARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDSARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDS
ARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDSIJCSES Journal
 
Current issues - International Journal of Computer Science and Engineering Su...
Current issues - International Journal of Computer Science and Engineering Su...Current issues - International Journal of Computer Science and Engineering Su...
Current issues - International Journal of Computer Science and Engineering Su...IJCSES Journal
 
Depression Detection in Tweets using Logistic Regression Model
Depression Detection in Tweets using Logistic Regression ModelDepression Detection in Tweets using Logistic Regression Model
Depression Detection in Tweets using Logistic Regression Modelijtsrd
 
Multi-objective NSGA-II based community detection using dynamical evolution s...
Multi-objective NSGA-II based community detection using dynamical evolution s...Multi-objective NSGA-II based community detection using dynamical evolution s...
Multi-objective NSGA-II based community detection using dynamical evolution s...IJECEIAES
 
Depression and anxiety detection through the Closed-Loop method using DASS-21
Depression and anxiety detection through the Closed-Loop method using DASS-21Depression and anxiety detection through the Closed-Loop method using DASS-21
Depression and anxiety detection through the Closed-Loop method using DASS-21TELKOMNIKA JOURNAL
 
TOP 10 Cited Computer Science & Information Technology Research Articles From...
TOP 10 Cited Computer Science & Information Technology Research Articles From...TOP 10 Cited Computer Science & Information Technology Research Articles From...
TOP 10 Cited Computer Science & Information Technology Research Articles From...AIRCC Publishing Corporation
 

La actualidad más candente (7)

ARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDS
ARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDSARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDS
ARTIFICIAL NEURAL NETWORKS FOR MEDICAL DIAGNOSIS: A REVIEW OF RECENT TRENDS
 
Current issues - International Journal of Computer Science and Engineering Su...
Current issues - International Journal of Computer Science and Engineering Su...Current issues - International Journal of Computer Science and Engineering Su...
Current issues - International Journal of Computer Science and Engineering Su...
 
Depression Detection in Tweets using Logistic Regression Model
Depression Detection in Tweets using Logistic Regression ModelDepression Detection in Tweets using Logistic Regression Model
Depression Detection in Tweets using Logistic Regression Model
 
Dr. Edward Velasco - “Intelligent Use” of Electronic Data to Enhance Public H...
Dr. Edward Velasco - “Intelligent Use” of Electronic Data to Enhance Public H...Dr. Edward Velasco - “Intelligent Use” of Electronic Data to Enhance Public H...
Dr. Edward Velasco - “Intelligent Use” of Electronic Data to Enhance Public H...
 
Multi-objective NSGA-II based community detection using dynamical evolution s...
Multi-objective NSGA-II based community detection using dynamical evolution s...Multi-objective NSGA-II based community detection using dynamical evolution s...
Multi-objective NSGA-II based community detection using dynamical evolution s...
 
Depression and anxiety detection through the Closed-Loop method using DASS-21
Depression and anxiety detection through the Closed-Loop method using DASS-21Depression and anxiety detection through the Closed-Loop method using DASS-21
Depression and anxiety detection through the Closed-Loop method using DASS-21
 
TOP 10 Cited Computer Science & Information Technology Research Articles From...
TOP 10 Cited Computer Science & Information Technology Research Articles From...TOP 10 Cited Computer Science & Information Technology Research Articles From...
TOP 10 Cited Computer Science & Information Technology Research Articles From...
 

Similar a Is NLP Good for Health

Understanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsUnderstanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsAshis Chanda
 
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseasesNlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseaseseSAT Publishing House
 
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseasesNlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseaseseSAT Journals
 
Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016Nicola Amoroso
 
BreastScreening: On the Use of Multi-Modality in Medical Imaging Diagnosis
BreastScreening: On the Use of Multi-Modality in Medical Imaging DiagnosisBreastScreening: On the Use of Multi-Modality in Medical Imaging Diagnosis
BreastScreening: On the Use of Multi-Modality in Medical Imaging DiagnosisInstituto Superior Técnico
 
Semantic Similarity Measures between Terms in the Biomedical Domain within f...
 Semantic Similarity Measures between Terms in the Biomedical Domain within f... Semantic Similarity Measures between Terms in the Biomedical Domain within f...
Semantic Similarity Measures between Terms in the Biomedical Domain within f...Editor IJCATR
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSMSCW Mysore
 
Recent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health RecordRecent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health Recordkingstdio
 
A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...
A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...
A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...Mohammad Shakirul islam
 
Depression prognosis using natural language processing and machine learning ...
Depression prognosis using natural language processing and  machine learning ...Depression prognosis using natural language processing and  machine learning ...
Depression prognosis using natural language processing and machine learning ...IJECEIAES
 
Estimating the Statistical Significance of Classifiers used in the Predictio...
Estimating the Statistical Significance of Classifiers used in the  Predictio...Estimating the Statistical Significance of Classifiers used in the  Predictio...
Estimating the Statistical Significance of Classifiers used in the Predictio...IOSR Journals
 
Evotec - How can Knowledge Graphs support Druh Discovery
Evotec - How can Knowledge Graphs support Druh DiscoveryEvotec - How can Knowledge Graphs support Druh Discovery
Evotec - How can Knowledge Graphs support Druh DiscoveryNeo4j
 
ShortStory_bioCaster.pptx
ShortStory_bioCaster.pptxShortStory_bioCaster.pptx
ShortStory_bioCaster.pptxDeviPriyaRavi2
 
Top 1 cited paper cybernetics (ijci)
Top 1 cited paper cybernetics (ijci)Top 1 cited paper cybernetics (ijci)
Top 1 cited paper cybernetics (ijci)IJCI JOURNAL
 
Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.Maria Karampela
 
Accessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health DataAccessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health DataSofia Ouhbi
 

Similar a Is NLP Good for Health (20)

Understanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsUnderstanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methods
 
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseasesNlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseases
 
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseasesNlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseases
 
Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016Curriculum_Amoroso_EN_28_07_2016
Curriculum_Amoroso_EN_28_07_2016
 
BreastScreening: On the Use of Multi-Modality in Medical Imaging Diagnosis
BreastScreening: On the Use of Multi-Modality in Medical Imaging DiagnosisBreastScreening: On the Use of Multi-Modality in Medical Imaging Diagnosis
BreastScreening: On the Use of Multi-Modality in Medical Imaging Diagnosis
 
Semantic Similarity Measures between Terms in the Biomedical Domain within f...
 Semantic Similarity Measures between Terms in the Biomedical Domain within f... Semantic Similarity Measures between Terms in the Biomedical Domain within f...
Semantic Similarity Measures between Terms in the Biomedical Domain within f...
 
Quality of Life Technologies: From Cure to Care
Quality of Life Technologies: From Cure to CareQuality of Life Technologies: From Cure to Care
Quality of Life Technologies: From Cure to Care
 
LECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICSLECTURE NOTES ON BIOINFORMATICS
LECTURE NOTES ON BIOINFORMATICS
 
I so p 9.10.2017
I so p 9.10.2017I so p 9.10.2017
I so p 9.10.2017
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Recent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health RecordRecent Advances in Deep Learning Techniques for Electronic Health Record
Recent Advances in Deep Learning Techniques for Electronic Health Record
 
A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...
A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...
A Novel Approach for Tomato Diseases Classification Based on Deep Convolution...
 
03 Guerra, Rudy
03 Guerra, Rudy03 Guerra, Rudy
03 Guerra, Rudy
 
Depression prognosis using natural language processing and machine learning ...
Depression prognosis using natural language processing and  machine learning ...Depression prognosis using natural language processing and  machine learning ...
Depression prognosis using natural language processing and machine learning ...
 
Estimating the Statistical Significance of Classifiers used in the Predictio...
Estimating the Statistical Significance of Classifiers used in the  Predictio...Estimating the Statistical Significance of Classifiers used in the  Predictio...
Estimating the Statistical Significance of Classifiers used in the Predictio...
 
Evotec - How can Knowledge Graphs support Druh Discovery
Evotec - How can Knowledge Graphs support Druh DiscoveryEvotec - How can Knowledge Graphs support Druh Discovery
Evotec - How can Knowledge Graphs support Druh Discovery
 
ShortStory_bioCaster.pptx
ShortStory_bioCaster.pptxShortStory_bioCaster.pptx
ShortStory_bioCaster.pptx
 
Top 1 cited paper cybernetics (ijci)
Top 1 cited paper cybernetics (ijci)Top 1 cited paper cybernetics (ijci)
Top 1 cited paper cybernetics (ijci)
 
Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.Accessing and Sharing Electronic Personal Health Data.
Accessing and Sharing Electronic Personal Health Data.
 
Accessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health DataAccessing and Sharing Electronic Personal Health Data
Accessing and Sharing Electronic Personal Health Data
 

Último

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 sciencefloriejanemacaya1
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 

Último (20)

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Boyles law module in the grade 10 science
Boyles law module in the grade 10 scienceBoyles law module in the grade 10 science
Boyles law module in the grade 10 science
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 

Is NLP Good for Health

  • 1. Is Natural Language Processing Good for your Health? 25th April 2018 Nigel Collier Theoretical and Applied Linguistics, MML
  • 2. Some Preliminaries • Knowledge graph: • A type of large-scale semantic network describing concepts and their logical relationships, e.g. WordNet, BabelNet, Yago, SNOMED CT • Named Entity • A sequence of words that denote a term or individual Entity linking (‘grounding’, ‘coding’) • To establish the specific reference of a named entity according to an ontology • Distributional semantics • A system of representing meaning whereby words/phrases are understood as points in a low dimensional geometric space. Semantics is encoded as a configuration pattern over all dimensions. e.g. word2vec embeddings (Mikolov et a. 2013) (Firth 1957) • Deep neural networks (‘deep learning’) • A type of artificial neural network with multiple hidden layers of units
  • 3. Data Science in Health Requires the Combined Expertise of… Biologists, Clinicians, Mathematicians, Statisticians, Computer Scientists, Geneticists, Bioinformaticians, Computational Linguists…?
  • 4. Computational Models of Language are Key to Making Sense of Health Data • Western clinical notes date back to the 5th/6th centuries BC • Today 60 to 70% of NHS data exists only as unstructured text • Biomedical literature, Clinical trials data, Lab notebooks, Clinical records, Diagnostic reports, News reports on disease outbreaks, Social media messages, Patient interviews, Patient forum data … • Represents the most contextually grounded, high-precision information about an individual’s health, attitudes and behaviours
  • 5. Case Studies: Health and NLP • Infectious disease monitoring [4,5] • Drug safety analysis [6,7] • Diagnosis of semantic dementia [8] • Monitoring air quality [9] • Analysing malpractice [10] • Diet success [11] • Exercise and mental health [12] • Identifying symptoms of schizophrenia [13]
      [4] Aramaki, E. et al. (2011). Twitter catches the flu: detecting influenza epidemics using Twitter. In Proceedings of the conference on empirical methods in natural language processing (pp. 1568-1576). Association for Computational Linguistics.
      [5] Collier, N., Son, N. T., & Nguyen, N. M. (2011). OMG U got flu? Analysis of shared health messages for bio-surveillance. Journal of biomedical semantics, 2(5), S9.
      [6] Sarker, A. et al. (2015). Utilizing social media data for pharmacovigilance: A review. Journal of biomedical informatics, 54, 202-212.
      [7] Yang, C. C., et al. (2014). Postmarketing drug safety surveillance using publicly available health-consumer-contributed content in social media. ACM Transactions on Management Information Systems (TMIS), 5(1), 2.
      [8] Pakhomov, S. V., Smith, G. E., Marino, S., Birnbaum, A., Graff-Radford, N., Caselli, R., ... & Knopman, D. S. (2010). A computerized technique to assess language use patterns in patients with frontotemporal dementia. Journal of neurolinguistics, 23(2), 127-144.
      [9] Wang, S., Paul, M. J., & Dredze, M. (2015). Social media as a sensor of air quality and public response in China. Journal of medical Internet research, 17(3), e22.
      [10] Nakhasi, A., Passarella, R., Bell, S. G., Paul, M. J., Dredze, M., & Pronovost, P. (2012, October). Malpractice and malcontent: Analyzing medical complaints in Twitter. In 2012 AAAI Fall Symposium Series.
      [11] Weber, I., & Achananuparp, P. (2015). Insights from machine-learned diet success prediction. arXiv preprint arXiv:1510.04802.
      [12] Dos Reis, V. L., & Culotta, A. (2015). Using matched samples to estimate the effects of exercise on mental health from Twitter. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (pp. 182-188).
      [13] Patel, R. et al. (2015). Negative symptoms in schizophrenia: a study in a large clinical sample of patients using a novel automated method. BMJ open, 5(9), e007619.
  • 6. Combining Language Models for Information Extraction [Pipeline diagram from "raw text document" to "knowledge objects". Stages shown: Sentence segmentation, Tokenization, Lexical featurisation, Syntactic parsing, Entity recognition, Trigger detection, Relation extraction, Event extraction, Entity linking]
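The first two stages named on slide 6, sentence segmentation and tokenization, can be sketched in a few lines. This is only a minimal illustration using regular expressions; the talk does not say which tools were used, and the function names `split_sentences` and `tokenize` are hypothetical.

```python
import re

def split_sentences(text):
    # Naive rule (an assumption, not the talk's method): a sentence ends
    # at '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # A token is a run of word characters (optionally joined by an
    # apostrophe or hyphen) or a single punctuation mark.
    return re.findall(r"\w+(?:['-]\w+)*|[^\w\s]", sentence)

text = "No pneumothorax. Patient gained 2kgs in weight!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
```

Real clinical and social media text breaks rules this simple (abbreviations such as "A.FIB" contain full stops), which is one reason later stages such as entity recognition are usually learned rather than hand-written.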
  • 7. Entity Linking: a Central Task in Information Extraction • Textual evidence for ‘JFK International’: “JetBlue begins direct service between Barnstable Airport and JFK International” [14] • [Image: Wikipedia entry for ‘JFK International’]
      [14] Ling, X., Singh, S., & Weld, D. S. (2015). Design challenges for entity linking. Transactions of the Association for Computational Linguistics, 3, 315-328.
  • 8. Illustrating the Complexities of Entity Linking in Health
      Source | Entity mention | Target concept (SNOMED) | Current data-driven solution?
      Twitter | hungry | hunger | y
      Twitter | gained 2kgs in weight | weight gain | y
      Twitter | head spinning | dizziness | y
      Twitter | rupturd his bowel | gastrointestinal perforation | y
      EHR | No pneumothorax | history of pneumothorax, negative | ?
      EHR | right breast cancer | breast cancer + right | n
      EHR | A.FIB | atrial fibrillation | ?
      Literature | peculiar changes in the dendrites of Purkinje cells | abnormal + Purkinje cell + dendrite + associated morphology | n
  • 9. A Brief Overview of Entity Linking • 1. Manually defined symbolic features: fast, clear but restricted coverage • String matching on concept labels, e.g. “hungry” → hunger [15] • 2. Machine translation models using symbolic features: better coverage • Recognise variant compositions, e.g. “gained 2kgs in weight” → weight gain [16] • 3. Distributed compositional semantic models: best coverage but opaque • Recognise latent similarities, e.g. “head spinning” → dizziness • Dependent to some extent on large-scale data • Do not account by themselves for complex utterances such as post-coordinated concepts [17] • Vagueness? “terrible headache this morning” → sinus headache? tension headache? hangover?
      [15] Zhiyong Lu, et al. The gene normalization task in BioCreative III. BMC Bioinformatics, 12(8):S2, 2011.
      [16] Nut Limsopatham and Nigel Collier. Adapting phrase-based machine translation to normalise medical terms in social media messages. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1675–1680. Association for Computational Linguistics, 2015.
      [17] Ferdinand Dhombres and Olivier Bodenreider. Interoperability between phenotypes in research and healthcare terminologies: investigating partial mappings between HPO and SNOMED CT. J. Biomedical Semantics, 7(1):3, 2016.
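Approach 1 on slide 9, string matching on concept labels, can be illustrated with a small lookup. This is a sketch under stated assumptions: the `LABELS` lexicon is invented for the example (a real system would load labels and synonyms from a terminology such as SNOMED CT), and `normalise` and `link_by_label` are hypothetical helper names.

```python
# Hypothetical mini-lexicon: lower-cased surface labels mapped to concept names.
LABELS = {
    "hunger": "hunger",
    "hungry": "hunger",
    "weight gain": "weight gain",
    "dizziness": "dizziness",
}

def normalise(mention):
    # Lower-case and collapse runs of whitespace before lookup.
    return " ".join(mention.lower().split())

def link_by_label(mention):
    # Return the linked concept, or None when no label matches exactly.
    return LABELS.get(normalise(mention))

linked = link_by_label("  Hungry ")
missed = link_by_label("head spinning")
```

The second call returns `None`: "head spinning" shares no string with the label "dizziness", which is the restricted coverage the slide attributes to this family of methods.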
  • 10. Experimental Setup: Language Models
      Model | Description | Ref.
      TF-IDF | Traditional statistical term matching approach | [18]
      BM25 | Traditional term ranking function | [19]
      SVM LTR | Supervised machine learning model (current SOTA) | [20]
      DWR | Cosine similarity between word vectors for mention and concept | (none given)
      SMT + DWR | Statistical word alignment model | [21]
      CNN | Supervised neural network model | [22]
      [18] Spärck Jones, K. (2004). IDF term weighting and IR research lessons. Journal of documentation, 60(5), 521-523.
      [19] Robertson, S., Zaragoza, H., & Taylor, M. (2004, November). Simple BM25 extension to multiple weighted fields. In Proceedings of the thirteenth ACM international conference on Information and knowledge management (pp. 42-49). ACM.
      [20] Leaman, R., Islamaj Doğan, R., & Lu, Z. (2013). DNorm: disease name normalization with pairwise learning to rank. Bioinformatics, 29(22), 2909-2917.
      [21] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., ... & Dyer, C. (2007, June). Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions (pp. 177-180). Association for Computational Linguistics.
      [22] Limsopatham, N., & Collier, N. H. (2016). Normalising medical concepts in social media texts by learning semantic representation.
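The DWR row describes ranking concepts by cosine similarity between word vectors for the mention and the concept. A rough sketch of that idea follows. Everything specific here is an assumption: the three-dimensional `VECTORS` are made-up toy values rather than trained embeddings, and averaging word vectors to represent a phrase is one common choice that the slides do not confirm.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
VECTORS = {
    "head": [0.9, 0.1, 0.0],
    "spinning": [0.7, 0.3, 0.1],
    "dizziness": [0.8, 0.2, 0.0],
    "weight": [0.0, 0.9, 0.2],
    "gain": [0.1, 0.8, 0.3],
    "hunger": [0.1, 0.2, 0.9],
}

def embed(phrase):
    # Represent a phrase as the mean of its known word vectors (assumption).
    vecs = [VECTORS[w] for w in phrase.lower().split() if w in VECTORS]
    if not vecs:
        return None
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_concepts(mention, concepts):
    # Score every concept label against the mention, best match first.
    m = embed(mention)
    if m is None:
        return []
    scored = [(c, cosine(m, embed(c))) for c in concepts if embed(c) is not None]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank_concepts("head spinning", ["hunger", "weight gain", "dizziness"])
```

With these toy vectors "dizziness" ranks first even though it shares no characters with "head spinning", which is the kind of latent similarity that string matching cannot capture.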
  • 11. Experimental Setup: Data Sets • We evaluate our approaches on three different datasets
      Dataset | # Queries | # Target concepts | Data source
      TwADR-S | 201 | 58 | Twitter messages
      TwADR-L | 2,220 | 1,436 | Twitter messages
      AskPatient | 8,662 | 1,036 | Blog posts from askapatient.com
  • 12. Experimental Results [Bar chart: accuracy (axis from 0 to 0.9) of TF-IDF, BM25, DWR, SVM LTR, SMT+DWR and CNN on the TwADR-S, TwADR-L and AskPatient datasets]
  • 13. Conclusion • NLP can contribute to medical discovery through the intelligent use of existing data. This shortens the time to insight. • I’ve proposed and tested various ways of linking entities to knowledge bases using machine learning • I’ve used distributed semantic representations to infer latent similarities between mentions and concepts • Future research: • Generating explanations • Handling compositionality • Scaling up to larger knowledge bases • Get involved: • Data challenge? → CLEF eHealth Lab + BioCreative • Connect? → EPSRC Healtex Network + Alan Turing Institute • Publish? → e.g. ACL + EMNLP + BioNLP + SocialNLP • Software? → Apache cTAKES + GATE
  • 14. Thank you! Slides available at: https://www.slideshare.net/nigel_collier https://sites.google.com/site/nhcollier/ nhc30@cam.ac.uk ORCID: 0000-0002-7230-4164 Twitter: @nigelhcollier