Analysing Entity Type Variation
      across Biomedical Subdomains
Claudiu Mihăilă, Riza Theresa Batista-Navarro, Sophia Ananiadou



                                      Claudiu Mihăilă
                                   National Centre for Text Mining
                                    School of Computer Science
                                     University of Manchester

                                          26 May 2012
BioTxtM 2012




  Introduction
  • Named entities
        o Atomic elements, classified into various categories (protein,
          gene, disease, treatment, metabolite etc.)
        o [Figure: the sentence below annotated with the labels Pro, Organism, Transcription, +Reg and Theme]
In contrast to the phenotype of the pta ackA double mutant, pbgP transcription was reduced in the pmrD mutant.








Introduction
• Corpora








Methodology
• Full-text open-access journal articles from UKPMC
• 20 subdomains; 400 single broad-subject-termed articles
      o Allergy & Immunology
      o Biology
      o Cell Biology
      o Communicable Diseases
      o Critical Care
      o Environmental Health
      o Genetics
      o Health Services Research
      o Medical Informatics
      o Medicine
      o Microbiology
      o Neoplasms
      o Neurology
      o Pharmacology
      o Physiology
      o Public Health
      o Pulmonary Medicine
      o Rheumatology
      o Tropical Medicine
      o Virology








Methodology
• NE source: A_Silver = A_UKPMC ∪ A_Oscar ∪ A_NeMine
• [Diagram: Corpus (the 20 subdomains) → UKPMC, OSCAR, NeMine → Annotation]
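A minimal sketch of combining the output of the three taggers into one silver annotation set. The `Annotation` tuple, the character offsets and the sample mentions are all hypothetical, and the slide does not say how overlapping or conflicting spans were resolved, so this sketch simply takes the set union and drops exact duplicates.

```python
from typing import Iterable, NamedTuple, Set


class Annotation(NamedTuple):
    """A single named-entity mention (hypothetical representation)."""
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention
    entity_type: str  # e.g. "Protein", "Chemical molecule"


def silver_annotation(*sources: Iterable[Annotation]) -> Set[Annotation]:
    """Return the set union of the annotations from every source."""
    merged: Set[Annotation] = set()
    for source in sources:
        merged.update(source)
    return merged


# Made-up annotations for one document, one list per tool.
ukpmc = [Annotation(0, 4, "Protein")]
oscar = [Annotation(10, 17, "Chemical molecule")]
nemine = [Annotation(0, 4, "Protein"), Annotation(25, 31, "Symptom")]

# The Protein mention found by two tools is kept only once.
silver = silver_annotation(ukpmc, oscar, nemine)
```

Using a set keeps the merge order-independent; any preference between tools on conflicting spans would need an explicit rule on top of this.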








Methodology
• Entity types per source, merged into the silver annotation
      o NeMine: Gene, Protein, Disease, Drug, Metabolite, Bacteria, Diagnostic process, General phenomenon, Indicator, Natural phenomenon, Organ, Pathologic function, Symptom, Therapeutic process
      o UKPMC: Gene, Protein, Disease, Drug, Metabolite, Gene|Protein
      o OSCAR: Chemical molecule, Chemical adjective, Enzyme, Reaction




Methodology
• Feature vectors


       Document d
Entity type          Count   Relative frequency
Enzyme                   2     0.45%
Chemical molecule       71    14.85%
Disease                  8     1.67%
Drug                    12     2.51%
Gene                    15     3.13%
Gene|Protein           155    32.43%
Metabolite               3     0.62%
Protein                188    39.33%
Reaction                24     5.02%
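A feature vector of this kind can be sketched as each entity-type count divided by the document's total entity count. The function name and dictionary layout below are assumptions for illustration; the slide does not state whether any further normalisation was applied.

```python
from typing import Dict


def relative_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw entity-type counts into fractions of the document total."""
    total = sum(counts.values())
    if total == 0:
        # A document without any entity mentions gets an all-zero vector.
        return {entity_type: 0.0 for entity_type in counts}
    return {entity_type: n / total for entity_type, n in counts.items()}


# Counts for document d as listed on the slide.
document_d = {
    "Enzyme": 2,
    "Chemical molecule": 71,
    "Disease": 8,
    "Drug": 12,
    "Gene": 15,
    "Gene|Protein": 155,
    "Metabolite": 3,
    "Protein": 188,
    "Reaction": 24,
}

features = relative_frequencies(document_d)
```

Normalising by the document total makes long and short articles comparable, since only the mix of entity types remains.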






Methodology








Methodology








Methodology
• Chi-squared statistics
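The formula on this slide did not survive extraction, so the exact statistic used is not known here. As a hedged illustration only, the sketch below computes a standard Pearson chi-squared value on a 2×2 table (one entity type versus all other types, subdomain A versus subdomain B) and repeats it for every entity type. The function names and the sample counts are hypothetical.

```python
from typing import Dict


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared for the contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # An empty row or column carries no evidence of a difference.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def chi_squared_vector(counts_a: Dict[str, int],
                       counts_b: Dict[str, int]) -> Dict[str, float]:
    """One chi-squared value per entity type for a pair of subdomains."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    vector = {}
    for entity_type in sorted(set(counts_a) | set(counts_b)):
        in_a = counts_a.get(entity_type, 0)
        in_b = counts_b.get(entity_type, 0)
        vector[entity_type] = chi_squared_2x2(
            in_a, total_a - in_a, in_b, total_b - in_b
        )
    return vector


# Made-up aggregate counts for two subdomains.
cell_biology = {"Protein": 900, "Disease": 50, "Drug": 50}
public_health = {"Protein": 100, "Disease": 700, "Drug": 200}
pair_vector = chi_squared_vector(cell_biology, public_health)
```

A larger value for an entity type means its share differs more between the two subdomains; identical distributions give zero for every type.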








Methodology
• Frobenius norm




                   1247.0725
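For reference, the Frobenius norm of a vector is the square root of the sum of its squared entries, which for a vector is the same as the Euclidean norm. The sketch below is a generic implementation with a hypothetical function name; it is not meant to reproduce the value shown on the slide, whose inputs are not given.

```python
import math
from typing import Iterable


def frobenius_norm(values: Iterable[float]) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for v in values))


# Collapsing a vector of per-feature scores into one number per pair.
example_scores = [3.0, 4.0]
example_norm = frobenius_norm(example_scores)
```

Applied to a vector of per-entity-type chi-squared values, this yields a single score for a subdomain pair, so pairs can be ranked by how strongly their entity-type distributions differ.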








Feature evaluation
• Good features for
     o   Cell Biology
     o   Pharmacology
     o   Health Services Research
     o   Public Health

• Not-so-good features for
     o   Medical Informatics
     o   Medicine
     o   Microbiology
     o   Neoplasms
     o   Neurology
                   Frobenius norm of χ² vectors for each pair.




Feature evaluation
• Mean Chi-Squared for every feature over all pairs








Classifier selection
                       Classifier                   Top result count
                       J48                             0      0%
                       JRip                            4      2.10%
                       Logistic                        2      1.05%
                       Random Tree                     0      0%
                       Random Forest                  86     45.26%
                       SMO                             0      0%
                       AdaBoost + J48                  6      3.15%
                       AdaBoost + JRip                 7      3.68%
                       AdaBoost + Decision Stump      16      8.42%
                       AdaBoost + Logistic             0      0%
                       AdaBoost + Random Tree          0      0%
                       AdaBoost + Random Forest       68     35.78%
                       AdaBoost + SMO                  1      0.52%
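The top-result counts in this table sum to 190, which matches the number of unordered pairs that can be formed from 20 subdomains. Assuming one classification task per such pair (the slide does not spell this out), the pairs and the percentage column can be sketched as follows. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

SUBDOMAINS = [
    "Allergy & Immunology", "Biology", "Cell Biology",
    "Communicable Diseases", "Critical Care", "Environmental Health",
    "Genetics", "Health Services Research", "Medical Informatics",
    "Medicine", "Microbiology", "Neoplasms", "Neurology", "Pharmacology",
    "Physiology", "Public Health", "Pulmonary Medicine", "Rheumatology",
    "Tropical Medicine", "Virology",
]


def subdomain_pairs(subdomains: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair of distinct subdomains."""
    return list(combinations(subdomains, 2))


def share_of_pairs(top_count: int, n_pairs: int) -> float:
    """Percentage of pairs on which a classifier had the top result."""
    return 100.0 * top_count / n_pairs


pairs = subdomain_pairs(SUBDOMAINS)
random_forest_share = share_of_pairs(86, len(pairs))
```

With 190 pairs, the 86 top results for Random Forest correspond to about 45.26%, in line with the table.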




Classifier evaluation
• Dissimilar subdomains
     o   Cell Biology
     o   Pharmacology
     o   Health Services Research
     o   Public Health

• Similar subdomains
     o   Medical Informatics
     o   Medicine
     o   Microbiology
     o   Neoplasms
     o   Neurology
                     Random Forest F-score for each pair.




Conclusions
• To remember
     o Significant semantic variation of biomedical sublanguages
     o Distinguishable bio-subdomains using only NE types
     o Caution needed when adapting NLP tools to subdomains
• To do
     o Extension to bio-events
     o Combination with lexical, syntactical, discourse features
     o Extension to other domains








Thank you!




        http://misteringo.deviantart.com/art/Bunnies-Scream-Again-79745974
