Slides from talk:
NLP tales in Biomedicine
Auckland MeetUp group, June 2014
http://www.meetup.com/Natural-Language-Processing-in-NZ/events/184030662/
Mining text to answer biomedical questions is a fascinating applied research area. The biomedical domain is one of the first 'big data' domains. It attracts people from the domain itself who are passionate about answering pressing scientific questions, as well as computer scientists and linguists who see a domain with great standards, resources and numerous applications.
During this talk I will give you a brief overview of different NLP problems in the biomedical domain and I'll make comparisons to mainstream NLP applications (e.g., search) and other, more commercial domains (e.g., voice of customer). My aim is to introduce you to a domain with state-of-the-art solutions, free high-quality resources and well-developed methodologies. If I inspire anyone to work on challenging biomedical problems, that will be a bonus!
3. Information of interest
Gene/protein-specific information for database annotation
Gene names:
tinman, lilliputian, dreadlocks, lush, cheap date, methuselah, Van Gogh, maggie, brainiac, grim,
reaper, cleopatra, swiss cheese, ken and barbie, kenny, out cold, lava lamp, hamlet,
sonic hedgehog, werewolf, half pint, fucK, drop dead, chardonnay, agnostic, I’m not dead yet…
7. Examples of information of interest
Proteins: their sub-cellular location, their structure, the conditions of their expression, their interactions, disease associations…
Disease – Drug: interactions, adverse effects, secondary indications…
Other entities: organs/tissues, metabolites/chemicals, phenotypes…
Detecting: methodologies & findings in experimental papers, paradigm shifts…
Systems for specific: diseases, pathways, drug targets, organisms…
8. Don Swanson’s ABC model:
dietary fish oil
  causes
reduction of: blood viscosity, platelet aggregability, vascular reactivity
  ameliorates
Raynaud’s disease
- Swanson, D. R. (1986). Fish oil, Raynaud's syndrome and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1): 7-18.
- Swanson, D. R. (1987). Two medical literatures that are logically but not bibliographically connected. Journal of the American Society for Information Science 38: 228-233.
Literature-based Discovery: Text mining!
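The chain on this slide can be sketched in a few lines of code. This is only a toy illustration, not Swanson's actual procedure: the relation triples are hand-written from the slide rather than mined from text, and the function name `abc_candidates` is hypothetical. The idea is to follow links from a start concept through intermediate concepts to targets that the start concept is not directly linked to.

```python
from collections import defaultdict

# Toy relations written by hand from the slide; a real system would
# extract such triples from the literature (assumption for illustration).
RELATIONS = [
    ("dietary fish oil", "causes", "reduced blood viscosity"),
    ("dietary fish oil", "causes", "reduced platelet aggregability"),
    ("dietary fish oil", "causes", "reduced vascular reactivity"),
    ("reduced blood viscosity", "ameliorates", "Raynaud's disease"),
    ("reduced platelet aggregability", "ameliorates", "Raynaud's disease"),
    ("reduced vascular reactivity", "ameliorates", "Raynaud's disease"),
]


def abc_candidates(relations, start):
    """Map each indirectly reachable target to the concepts that link it to `start`.

    A target is reported only if it is two hops away from `start` and has
    no direct relation with it, i.e. the connection is implicit.
    """
    outgoing = defaultdict(set)
    for subject, _predicate, obj in relations:
        outgoing[subject].add(obj)

    direct = outgoing.get(start, set())
    candidates = defaultdict(set)
    for link in direct:
        for target in outgoing.get(link, set()):
            if target != start and target not in direct:
                candidates[target].add(link)
    return {target: sorted(links) for target, links in candidates.items()}


hypotheses = abc_candidates(RELATIONS, "dietary fish oil")
for target, links in hypotheses.items():
    print(f"dietary fish oil -> {target} via {', '.join(links)}")
```

The more independent linking concepts support a target, the more interesting the hypothesis; ranking by that count would be a natural next step.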
12. Search
Search query: SAF LTR
Looking for: interactions between SAF and viral LTR elements
(SAF is a transcription factor, LTR stands for ‘long terminal repeat’)
but also:
SAF: Single And Free
LTR: Long Term Relationship
It is better to use domain-specific resources on occasions like this.
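A minimal sketch of why the choice of resource matters for this query. The two lexicons below are hypothetical and hold only the readings mentioned on this slide; a real search system would draw on curated biomedical vocabularies instead. The function name `expand_query` is also just an illustration.

```python
# Hypothetical lexicons containing only the readings from the slide.
BIOMEDICAL_LEXICON = {
    "SAF": "SAF (transcription factor)",
    "LTR": "long terminal repeat",
}
GENERAL_LEXICON = {
    "SAF": "Single And Free",
    "LTR": "Long Term Relationship",
}


def expand_query(query, lexicon):
    """Expand each query token that the lexicon knows; keep the rest unchanged."""
    return [lexicon.get(token.upper(), token) for token in query.split()]


biomedical_reading = expand_query("SAF LTR", BIOMEDICAL_LEXICON)
general_reading = expand_query("SAF LTR", GENERAL_LEXICON)
print(biomedical_reading)
print(general_reading)
```

The same two tokens lead to entirely different searches depending on which lexicon interprets them.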
33. Summary
Entities – Relationships/Interactions
Resources: Databases, Ontologies, Corpora…
Networks: Systems Biology, Translational Medicine, Literature-based Discovery
End Users – Search
Social Biomedicine
Citation analysis
… and this is just a 30 min introduction…