Full-texts representation with MeSH, co-citations network reranking
BiTeM/SIBtex group
J Gobeill (me), A Gaudinat, E Pasche and P Ruch
University of Applied Sciences,
Swiss Institute of Bioinformatics,
Hospitals and University of Geneva
The BiTeM/SIBtex group
• Text Mining and Bibliomics (P Ruch)
  Strong focus on clinical and biological data
  heg (training librarians) and SIB (assisting biocurators)
• Long history of participation in TREC campaigns
  Genomics, Chemical IR, Medical Records…
• Translational medicine projects (EU FP7 Programme)
  Khresmoi: multimodal medical search engine
  MD-Paedigree: retrieval of similar cases for clinicians
The CDS Track 2014
• Clinical Decision Support: « retrieval of biomedical articles relevant for answering generic clinical questions about medical records »
  Ex. query: « 25-year-old woman with fatigue, hair loss, weight gain, and cold intolerance for 6 months »
  Collection: subset of PubMed Central
Strategies for TREC CDS 2014
Document Representation
1. Classical document representation with text
2. Document representation with MeSH
3. Target-specific semantic enrichment with MeSH
Reranking
4. Boosting based on article types
5. Exploitation of the co-citations network
IR performed with Okapi BM25
BiTeM official results
[Figure: official results chart; two entries are labelled "our baseline"]
Creating a baseline
1. Classical document representation with text
Text index
Search engine
1. Classical document representation with text
• Two different indexing levels:
  • Document
  • Section
  Run2 vs run4: document > section (+65%)
• Query representation (R-Prec):
  • Numbers removed (no age)
  • Only description: 0.169
  • Only summaries: 0.170
  • Both: 0.185 (+10%)
  Signal/noise ratio: better with more information
[Figure labels: Document, Sections]
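The query representation above can be sketched as follows. This is only a minimal illustration, assuming each topic provides a description and a summary field; the function name `build_query` and the letters-only tokenisation are hypothetical choices, not the group's actual preprocessing.

```python
import re


def build_query(description: str, summary: str) -> str:
    """Merge both topic fields and keep alphabetic tokens only."""
    text = f"{description} {summary}".lower()
    # Keeping only runs of letters drops numbers such as the patient's age
    # ("25-year-old" becomes "year old") along with punctuation.
    return " ".join(re.findall(r"[a-z]+", text))


# The description is the example query from the CDS track slide; the summary
# string is invented for this illustration.
query = build_query(
    "25-year-old woman with fatigue, hair loss, weight gain, "
    "and cold intolerance for 6 months",
    "Young woman with fatigue and cold intolerance",
)
print(query)
```

Using both fields repeats the shared terms, which is one simple way to read "better with more information".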
Creating a complementary view
2. Document representation with MeSH
MeSH index
Search engine
MeSH for PMC 2649306
D008569 Memory Disorders
D001921 Brain
D001284 Atrophy
D001706 Biopsy
D005911 Gliosis
2. Document representation with MeSH
• Two possible sources:
  • Collected from MEDLINE when there is a PMID
  • Extracted from documents with a categorizer (strict mapping)
• Two possible integrations between original text and MeSH:
  • Building separate indexes, then combining runs
  • Merging both representations into one unique document
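A strict-mapping categorizer of the kind mentioned above could look roughly like this. It is a toy sketch: the MeSH IDs and preferred terms are taken from the slides, but the dictionary itself, the synonym entries and the function name `map_mesh` are assumptions made for illustration.

```python
import re

# Toy dictionary: lower-cased surface form -> (MeSH ID, preferred term).
# IDs and preferred terms come from the slides; the synonym entries
# ("memory loss", "astrocytosis") are assumptions made for this example.
TOY_MESH = {
    "memory disorders": ("D008569", "Memory Disorders"),
    "memory loss": ("D008569", "Memory Disorders"),
    "brain": ("D001921", "Brain"),
    "atrophy": ("D001284", "Atrophy"),
    "biopsy": ("D001706", "Biopsy"),
    "gliosis": ("D005911", "Gliosis"),
    "astrocytosis": ("D005911", "Gliosis"),
}


def map_mesh(text: str) -> list[tuple[str, str]]:
    """Return the concepts whose surface form occurs verbatim in the text."""
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    padded = f" {normalized} "  # padding enforces whole-word matches
    found = []
    for surface, concept in TOY_MESH.items():
        if f" {surface} " in padded and concept not in found:
            found.append(concept)
    return found


summary = (
    "62-year-old man with progressive memory loss and involuntary leg "
    "movements. Brain MRI reveals cortical atrophy, and cortical biopsy "
    "shows vacuolar gray matter changes with reactive astrocytosis."
)
for mesh_id, term in map_mesh(summary):
    print(mesh_id, term)
```

Exact string matching keeps precision high when synonyms are listed, but any short or ambiguous surface form in the dictionary will fire on unrelated text.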
Example of MeSH mapping
<topic number="8">
<summary>62-year-old man with progressive memory loss and involuntary leg movements. Brain MRI reveals cortical atrophy, and cortical biopsy shows vacuolar gray matter changes with reactive astrocytosis.</summary>
MeSH concepts found:
D008568 Memory
D008569 Memory Disorders
D007866 Leg
D009068 Movement
D001921 Brain
D001284 Atrophy
D001706 Biopsy
D005911 Gliosis
Some good (power of synonyms)
Some broad
Some missing (too ambiguous):
D013035: Muscular Spasm?
D002540: Cerebral Cortex?
D008279: MH = Magnetic Resonance Imaging?
Medical Research Institute?
Moderate Renal Insufficiency?
Top 15 MeSH in benchmark
MEDLINE MeSH in docs: Humans, Animals, Female, Male, Adult, Middle Aged, Mice, Aged, Adolescent, Molecular Sequence Data, Rats, Young Adult, Time Factors, Child, Signal Transduction
Extracted MeSH in docs: Cells, Ficus (because of « fig »), Patients, Time, Genes, Therapeutics, Methods, Role, Humans, Disease, Volition, Mice, Attention, DNA, Population
Extracted MeSH in topics: Women, History, Pain, Blood, Physical Examination, Female, Blood Pressure, Pressure, Dyspnea, Family, Thorax, Urine, Fever, Male, Emergencies
Results for MeSH representation
• Best R-Prec 0.143 for MeSH representation (vs 0.211 for text)
  o MeSH concepts collected from MEDLINE not useful (best R-Prec 0.028)
  o Only 53% of documents had MeSH terms in MEDLINE
• Complementarity for finding relevant documents (thanks to qrel):
  • Low complementarity
  • Combination: 0.211 -> 0.213
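The scores on this slide are R-Prec values. As a reminder of what that measures, here is a small sketch of the standard definition (precision at rank R, where R is the number of relevant documents for the query); the document identifiers in the example are made up.

```python
def r_precision(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    top_r = ranked_ids[:r]
    hits = sum(1 for doc_id in top_r if doc_id in relevant_ids)
    return hits / r


# Hypothetical run and relevance judgments for one query.
run = ["doc_a", "doc_b", "doc_c", "doc_d"]
relevant = {"doc_a", "doc_c", "doc_x"}
print(r_precision(run, relevant))  # two of the top three are relevant
```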
3. Target-specific semantic enrichment with MeSH
Favoring target types
MeSH for PMC 2649306
D008569 Memory Disorders
D001921 Brain
D001284 Atrophy
D005911 Gliosis
D001706 Biopsy
Markers added: MeSHtargetDiagnosis, MeSHtargetDiagnosis, MeSHtargetTest
Do relevant documents for diagnosis deal more with diagnosis?
3. Target-specific semantic enrichment with MeSH
• In UMLS, each MeSH term has Semantic Types (ex: T060 Diagnostic Procedure)
  Focus on targets (diagnosis, treatments and tests)
• Specific words (ex: « MeSHtargetDiag ») are added in docs and queries

Target      % docs that have at least 1   Average number in documents
Test        83 %                          16
Diagnosis   86 %                          41
Treatment   86 %                          24

Small improvement only for section indexing
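The enrichment step can be pictured with the sketch below. It is an assumption-heavy illustration: T060 (Diagnostic Procedure) is the semantic type cited on the slide, while the two other types, the type-to-target assignment, the `MeSHtargetTreatment` marker name and the input dictionary are all hypothetical.

```python
# Hypothetical mapping from UMLS semantic type to a target marker word.
# Only T060 appears on the slide; the rest is assumed for this example.
TARGET_BY_SEMANTIC_TYPE = {
    "T060": "MeSHtargetTest",        # Diagnostic Procedure
    "T047": "MeSHtargetDiagnosis",   # Disease or Syndrome
    "T061": "MeSHtargetTreatment",   # Therapeutic or Preventive Procedure
}


def enrich(text: str, semantic_types_per_term: dict[str, list[str]]) -> str:
    """Append one marker word per (term, target semantic type) pair."""
    markers = []
    for _term, sem_types in semantic_types_per_term.items():
        for sem_type in sem_types:
            marker = TARGET_BY_SEMANTIC_TYPE.get(sem_type)
            if marker is not None:
                markers.append(marker)
    return " ".join([text, *markers])


# Toy input: the semantic types below are illustrative values, not UMLS lookups.
doc_terms = {
    "D001706": ["T060"],
    "EXAMPLE_TERM": ["T047"],
    "OTHER_TERM": ["T999"],  # not a target type, so no marker is added
}
print(enrich("cortical biopsy shows gliosis", doc_terms))
```

Because the markers are ordinary index terms, the same function can be applied to queries, so a diagnosis query matches documents carrying many diagnosis markers.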
In the qrel…

Set                                                Aver. Diagnosis MeSH   Aver. Test MeSH   Aver. Treatment MeSH
All collection                                     41                     16                24
Relevant for diagnosis (1|2 for queries 1..10)     108                    41                41
Relevant for test (1|2 for queries 11..20)         107                    41                33
Relevant for treatment (1|2 for queries 21..30)    114                    47                52

All relevant documents:
  o Are quite similar, with no distinction between targets
  o But have 2 to 3 times more target MeSH terms
  o ... but it's also the case for documents with 0 in the qrel
4. Boosting based on article types
Promoting some article types
Are some article types more likely to be relevant?
4. Boosting based on article types

Article type       Distribution
                   in docs   in qrel   in our runs
research-article   74.3 %    52.2 %    37.9 %
case-report        4.0 %     20.4 %    41.5 %
review-article     6.9 %     17.9 %    10.9 %
Other              2.6 %     3.2 %     3.6 %
brief-report       1.1 %     1.5 %     0.9 %

• Strategy: to promote review and case-based articles (boosting)
• Intuition was good… but the strategy failed!
• In reality… the IR engine already promoted these types!
Top 5
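To make the intuition behind this strategy concrete, the short sketch below divides each article type's share in the qrel by its share in the collection, using the percentages from the table. The ratio is my own illustrative measure, not one reported on the slides.

```python
# Percentages copied from the article-type table: (in docs, in qrel).
DISTRIBUTION = {
    "research-article": (74.3, 52.2),
    "case-report": (4.0, 20.4),
    "review-article": (6.9, 17.9),
    "Other": (2.6, 3.2),
    "brief-report": (1.1, 1.5),
}


def over_representation(dist: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Share in the qrel divided by share in the collection, per type."""
    return {
        article_type: in_qrel / in_docs
        for article_type, (in_docs, in_qrel) in dist.items()
    }


ratios = over_representation(DISTRIBUTION)
for article_type, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{article_type:18s} {ratio:.1f}x")
```

Case reports come out at roughly five times their collection share (20.4 / 4.0) and reviews at roughly 2.6 times, while research articles are under-represented, which matches the reasoning for promoting the first two types.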
5. Exploitation of the co-citations network
Promoting citations
Are citations of retrieved documents relevant?
5. Exploitation of the co-citations network
• E is the set of retrieved documents
• RSV_e is the Retrieval Status Value of doc e
• We boost each citation of doc e by + α × RSV_e
• 50% of documents cite another one in the collection (avg 3.8 cits)
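The boosting rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions: boosts are computed from the original scores in a single pass, a cited document that was not retrieved starts from a score of 0, and the function names, document identifiers and citation lists are invented.

```python
def boost_citations(
    rsv: dict[str, float],
    citations: dict[str, list[str]],
    alpha: float = 0.1,
) -> dict[str, float]:
    """Add alpha * RSV_e to every document cited by a retrieved document e."""
    boosted = dict(rsv)
    for doc_e, rsv_e in rsv.items():  # E: the set of retrieved documents
        for cited in citations.get(doc_e, []):
            # Assumption: cited documents absent from the run start from 0.
            boosted[cited] = boosted.get(cited, 0.0) + alpha * rsv_e
    return boosted


def rerank(scores: dict[str, float]) -> list[str]:
    """Document identifiers sorted by decreasing score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical run and citation lists.
rsv = {"A": 10.0, "B": 8.0, "C": 7.5}
citations = {"A": ["C"], "B": ["C", "D"]}
boosted = boost_citations(rsv, citations, alpha=0.1)
print(rerank(boosted))
```

In this toy run, document C is cited by both A and B and moves ahead of B, while D enters the ranking only through the boost it receives from B.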
Results
• With α = 0.1, slight improvement
  • +10% for R-Prec
  • +20% for infNDCG
• In TREC Chem 2010 Prior Art task, +150% for MAP
Conclusions 
“what is important is to have fought well”
Conclusions
• A lot of strategies, but not much better than the Terrier baseline
• Section indexing: never again
• MeSH not complementary… Better when inferred by a k-NN?
• Relevant docs talk about test, diag and treatment all together.
• Maybe we have to start working from the baseline run…

BiTeM / SIBTex @ TREC CDS 2014
