Understanding medical concepts and codes through
NLP methods
Presenter:
Ashis Kumar Chanda
Ph.D. student, CIS Dept.
Contents
● Introduction
● Related works
● Project 1:
■ Improving medical concept representations with external knowledge
● Project 2:
■ Jointly learning medical code and concept representations
● Conclusions and future work
Introduction
● What are medical concepts?
● Medical concepts are medical terms, abbreviations, or short-form words.
■ Ex: heart attack, breast cancer, tumor, ‘cp’ for chest pain, or drug names.
● What are medical codes?
● Standard codes for representing diagnoses, procedures, or drugs.
● Different medical organizations provide their own standard code formats.
■ Ex: 1749 is an ICD-9 code for breast cancer.
■ Ex: 96409 is a CPT code for chemotherapy.
Introduction
● What do Electronic Health Records (EHRs) look like?
● An EHR dataset has both structured data (e.g., lab values, medical codes) and
unstructured data (physicians’ notes).
[Figure: unstructured data (clinical note)]
Structured data (medical code events):
Code     Code description
1749     breast cancer
96409    chemotherapy
…        …
Introduction
● What is NLP?
● Natural Language Processing, or NLP, is defined as the automatic
manipulation of natural language, like speech and text.
● Finding machine-readable representations for words and documents.
■ Ex: Understanding the ‘severity’ of a patient’s condition from a physician’s note.
■ Ex: Understanding the semantic similarity between ‘kidney’ and ‘renal’.
■ Ex: Finding patient phenotypes or clusters from clinical note descriptions.
Related works
● Previous research used EHRs for patient phenotyping [1], health risk
prediction [2, 3], cohort selection [4], and visual explorations [5, 6].
● Understanding the text written in medical notes is an essential step
for such studies.
● Many frequency-based methods, such as BOW, TF-IDF [9], PMI [10], and
GloVe [7], have been developed to represent documents and sentences.
● Recent studies focus on neural-network-based methods.
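As a rough illustration of the frequency-based methods listed above, here is a minimal TF-IDF sketch. It assumes one common weighting (term frequency times log(N / document frequency)); the function name `tf_idf` and the toy notes are made up for illustration and do not come from the cited work.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Toy clinical-note fragments (hypothetical data).
notes = [
    "patient reports chest pain".split(),
    "patient denies chest pain".split(),
    "renal function is normal".split(),
]
weights = tf_idf(notes)
```

Terms that occur in fewer notes (such as "renal" here) receive a higher weight than terms shared across notes (such as "patient").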
Skip-gram Model
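● The skip-gram model scans each sentence and maximizes the log-likelihood of the
context words observed within a window around each scanned (target) word.
[Figure: context window around the target word w_t, spanning w_{t-2} to w_{t+2}]
● The likelihood of observing the context word w_i for the target word w_t is: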
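Assuming the standard softmax formulation of skip-gram [8, 12], with an input vector v_w and an output vector u_w for every word w in the vocabulary V (the symbol names are our choice), this likelihood can be written as:

```latex
p(w_i \mid w_t) = \frac{\exp\left(u_{w_i}^{\top} v_{w_t}\right)}{\sum_{w \in V} \exp\left(u_{w}^{\top} v_{w_t}\right)}
```

The numerator scores how well the context word matches the target word, and the denominator normalizes over the whole vocabulary.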
How would we learn this probability?
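One way to answer this question is to treat the word vectors as parameters and raise the log-likelihood of observed (target, context) pairs by gradient steps. The sketch below assumes a full softmax and plain SGD; practical word2vec implementations typically use negative sampling or hierarchical softmax instead [8]. The toy sentence, vector size, learning rate, and all function names are arbitrary choices for illustration.

```python
import math
import random

def skipgram_pairs(tokens, window=2):
    """Collect (target, context) pairs within `window` positions of each word."""
    pairs = []
    for t, target in enumerate(tokens):
        for i in range(max(0, t - window), min(len(tokens), t + window + 1)):
            if i != t:
                pairs.append((target, tokens[i]))
    return pairs

def softmax_probs(target, v_in, u_out):
    """p(w | target) for every vocabulary word w."""
    scores = {w: sum(a * b for a, b in zip(u, v_in[target]))
              for w, u in u_out.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sgd_step(target, context, v_in, u_out, lr=0.1):
    """One gradient step on -log p(context | target)."""
    probs = softmax_probs(target, v_in, u_out)
    v_t = list(v_in[target])  # copy so updates use the pre-step vector
    grad_v = [0.0] * len(v_t)
    for w, p in probs.items():
        err = p - (1.0 if w == context else 0.0)
        for k in range(len(v_t)):
            grad_v[k] += err * u_out[w][k]
            u_out[w][k] -= lr * err * v_t[k]
    for k in range(len(v_t)):
        v_in[target][k] -= lr * grad_v[k]

def avg_log_likelihood(pairs, v_in, u_out):
    return sum(math.log(softmax_probs(t, v_in, u_out)[c])
               for t, c in pairs) / len(pairs)

# Toy sentence (hypothetical data) and small random initial vectors.
tokens = "patient reports chest pain and shortness of breath".split()
pairs = skipgram_pairs(tokens, window=2)
rng = random.Random(0)
vocab = sorted(set(tokens))
dim = 8
v_in = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}
u_out = {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}

before = avg_log_likelihood(pairs, v_in, u_out)
for _ in range(100):
    for t, c in pairs:
        sgd_step(t, c, v_in, u_out)
after = avg_log_likelihood(pairs, v_in, u_out)
```

After training, `after` should exceed `before`, meaning the observed context words have become more likely under the model.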
Project 1: Improving medical concept
representations with external knowledge
Problem: Learning medical concept representations
● We can run skip-gram on medical notes to learn concept representations.
● However, many important, meaningful concepts appear only rarely in notes,
so their representations are learned from few examples.
● External knowledge can help improve these representations.
How can we integrate this knowledge? Modified skip-gram model
Results: Qualitative analysis
● For a given medical concept, we check the 10 nearest neighbors based on the cosine similarity in
the learned vector space.
Top 10 nearest neighbor concepts of “bipolar disorder”
Our model: bipolar disorder, depression, anxiety
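The nearest-neighbor check described above can be sketched as follows. The three-dimensional vectors are hand-made toy values, not learned embeddings, and the function names are illustrative assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, vectors, k=10):
    """Return the k concepts most similar to `query`, excluding the query itself."""
    q = vectors[query]
    scored = [(cosine(q, v), name) for name, v in vectors.items() if name != query]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Hypothetical toy vectors; real ones would come from the trained model.
vectors = {
    "bipolar disorder": [0.9, 0.1, 0.0],
    "depression": [0.8, 0.2, 0.1],
    "anxiety": [0.7, 0.3, 0.0],
    "fracture": [0.0, 0.1, 0.9],
}
neighbors = nearest_neighbors("bipolar disorder", vectors, k=2)
```

With these toy values, the two closest concepts to "bipolar disorder" are "depression" and "anxiety", while "fracture" is far away.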
Project 2: Jointly learning medical code
and concept representations
T. Bai, A. K. Chanda, B. L. Egleston, S. Vucetic, Joint learning of representations of medical
concepts and words from EHR data, in: IEEE International Conference on Bioinformatics and
Biomedicine, BIBM, 2017.
Problem
● EHRs contain structured data, such as diagnostic codes and laboratory tests,
as well as unstructured clinical notes.
● The joint skip-gram model learns medical code and word representations together.
● Four types of (context, target) pairs are considered when learning representations
with the skip-gram model: code to word, code to code, word to word, and word
to code.
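The four pair types can be illustrated with a small sketch. The slides do not say how codes and words are aligned, so this assumes, purely for illustration, that every code in a visit is paired with every other code and every note word of that visit, while words are paired with neighboring words inside a window. The function name, the code tokens such as `ICD9_1749`, and the note text are hypothetical.

```python
def joint_pairs(codes, words, window=2):
    """Build pairs of all four types for one visit; (a, b) reads 'a to b'."""
    pairs = {"code to code": [], "code to word": [],
             "word to word": [], "word to code": []}
    for c in codes:
        pairs["code to code"] += [(c, other) for other in codes if other != c]
        pairs["code to word"] += [(c, w) for w in words]
    for t, w in enumerate(words):
        lo, hi = max(0, t - window), min(len(words), t + window + 1)
        pairs["word to word"] += [(w, words[i]) for i in range(lo, hi) if i != t]
        pairs["word to code"] += [(w, c) for c in codes]
    return pairs

# One hypothetical visit: two codes plus a short note.
visit_codes = ["ICD9_1749", "CPT_96409"]
visit_words = "breast cancer treated with chemotherapy".split()
pairs = joint_pairs(visit_codes, visit_words)
```

Feeding all four lists to the same skip-gram objective places codes and words in one shared vector space, so a code can have words among its nearest neighbors and vice versa.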
Conclusions and future plans:
● Improving medical concept representations using a contextual model.
● Clinical BERT is a recent contextual model.
● The skip-gram model could also be applied to other fields.
■ Web mining: treating a user’s web clicks as a bag of words.
■ Business transactions: a user’s item purchase history is also a kind of sequence.
■ Human activity: human activity logs also form a sequence.
References:
1. Halpern, Y., Horng, S., Choi, Y., Sontag, D.: Electronic medical record phenotyping using the anchor and learn framework. Journal of the American Medical Informatics Association 23(4), 731–740 (2016)
2. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Medical concept representation learning from electronic health records and its application on heart failure prediction. arXiv preprint arXiv:1602.03686 (2016)
3. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association 24(2), 361–370 (2016)
4. A. B. Nattinger, P. W. Laud, R. Bajorunaite, R. A. Sparapani, and J. L. Freeman, “An algorithm for the use of Medicare claims data to identify women with incident breast cancer.” Health Services Research, 39(6p1):1733–1750, 2004.
5. D. Gotz, F. Wang, and A. Perer. A methodology for interactive mining and visual analysis of clinical event patterns using electronic health record data. Journal of biomedical informatics, 48:148–159, 2014
6. J. Krause, N. Razavian, E. Bertini, and D. Sontag. Visual exploration of temporal data in electronic medical records. In AMIA, 2015
7. Pennington, J.; Socher, R.; and Manning, C. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 1532–1543
8. T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, CoRR abs/1301.3781. arXiv:1301.3781.
9. Ramos, J., 2003, December. Using tf-idf to determine word relevance in document queries. In Proceedings of the first instructional conference on machine learning (Vol. 242, pp. 133-142).
10. P. D. Turney and P. Pantel. From frequency to meaning: Vector space models of semantics. J. Artif. Intell. Res., 37:141–188, 2010. doi: 10.1613/jair.2934
11. Bojanowski P, Grave E, Joulin A, Mikolov T. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics. 2017 Dec;5:135-46.
12. X. Rong, word2vec parameter learning explained, CoRR abs/1411.2738. arXiv:1411.2738. URL http://arxiv.org/abs/1411.2738
13. T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. "EHR phenotyping via jointly embedding medical concepts and words into a unified vector space". BMC Medical Informatics and Decision Making, vol. 18, 2018.
14. S. Vucetic, A. K. Chanda, S. Zhang, T. Bai, A. Maiti. "Peer assessment of CS doctoral programs shows strong correlation with faculty citations". Communications of the ACM, vol. 61, pp. 70–76, 2018.
15. T. Bai, A. K. Chanda, S. Vucetic, B. L. Egleston. "Joint learning of representations of medical concepts and words from EHR data". IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 764–769, 2017.
16. Aronson, A. R., and Lang, F.-M. 2010. An overview of metamap: historical perspective and recent advances. Journal of the American Medical Informatics Association 17(3):229–236.
17. Bodenreider, O. 2004. The unified medical language system (umls): integrating biomedical terminology. Nucleic acids research 32(suppl 1):D267–D270
18. Johnson, A. E.; Pollard, T. J.; Shen, L.; Li-wei, H. L.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Celi, L. A.; and Mark, R. G. 2016. Mimic-iii, a freely accessible critical care database. Scientific data 3:160035.
19. Mullenbach, J.; Wiegreffe, S.; Duke, J.; Sun, J.; and Eisenstein, J. 2018. Explainable prediction of medical codes from clinical text. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018
20. Pei, Jian, et al. "Prefixspan: Mining sequential patterns efficiently by prefix-projected pattern growth." Proceedings 17th international conference on data engineering. IEEE, 2001.
21. D. J. Gligorijevic, J. Stojanovic, and Z. Obradovic, “Modeling healthcare quality via compact representations of electronic health records.” Transactions on Computational Biology and Bioinformatics, 2016
Thank you all!
Contact me: ashis@temple.edu