BiTeM/SIBtex @ TREC CDS 2014
1. Full-text representation with MeSH, co-citation network reranking
BiTeM/SIBtex group
J Gobeill (me), A Gaudinat, E Pasche and P Ruch
University of Applied Sciences,
Swiss Institute of Bioinformatics,
Hospitals and University of Geneva
2. The BiTeM/SIBtex group
• Text Mining and Bibliomics (P Ruch)
Strong focus on clinical and biological data
HEG (training librarians) and SIB (assisting biocurators)
• Long history of participation in TREC campaigns
Genomics, Chemical IR, Medical Records…
• Translational medicine projects (EU FP7 Programme)
Khresmoi: multimodal medical search engine
MD-Paedigree: retrieval of similar cases for clinicians
3. The CDS Track 2014
• Clinical Decision Support: « retrieval of biomedical articles relevant for answering generic clinical questions about medical records »
Ex. query: « 25-year-old woman with fatigue, hair loss, weight gain, and cold intolerance for 6 months »
Collection: subset of PubMed Central
11. Top 15 MeSH in benchmark
MEDLINE MeSH in docs: Humans, Animals, Female, Male, Adult, Middle Aged, Mice, Aged, Adolescent, Molecular Sequence Data, Rats, Young Adult, Time Factors, Child, Signal Transduction
Extracted MeSH in docs: Cells, Ficus (because of «fig»), Patients, Time, Genes, Therapeutics, Methods, Role, Humans, Disease, Volition, Mice, Attention, DNA, Population
Extracted MeSH in topics: Women, History, Pain, Blood, Physical Examination, Female, Blood Pressure, Pressure, Dyspnea, Family, Thorax, Urine, Fever, Male, Emergencies
12. Results for MeSH representation
• Best R-Prec 0.143 for MeSH representation (vs 0.211 for text)
o MeSH concepts collected from MEDLINE not useful (best R-Prec 0.028)
o Only 53% of documents had MeSH terms in MEDLINE
• Complementarity for finding relevant documents (measured with the qrel):
o Low complementarity
o Combination with text: R-Prec 0.211 -> 0.213
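To make the two measurements on this slide concrete, here is a minimal sketch of R-Precision (precision over the top R results, where R is the number of relevant documents for the topic) and of one way to combine a text run with a MeSH run. The slides do not say how the combination was done: the max-normalized linear fusion, the helper names `max_normalize` and `linear_fusion`, and the default `weight=0.5` are all hypothetical.

```python
from typing import Dict, List, Set


def r_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Precision over the top R results, R = number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale a run's scores into [0, 1] by dividing by its top score."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return dict(scores)
    return {doc: s / top for doc, s in scores.items()}


def linear_fusion(text_scores: Dict[str, float],
                  mesh_scores: Dict[str, float],
                  weight: float = 0.5) -> List[str]:
    """Hypothetical fusion: weight * text + (1 - weight) * MeSH, normalized."""
    text, mesh = max_normalize(text_scores), max_normalize(mesh_scores)
    fused = {doc: weight * text.get(doc, 0.0) + (1 - weight) * mesh.get(doc, 0.0)
             for doc in set(text) | set(mesh)}
    # Descending fused score; ties broken by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

Normalizing first matters because text and MeSH runs produce scores on different scales; without it, the run with larger raw scores would dominate the sum regardless of `weight`. Complementarity shows up as a gain in R-Prec for the fused ranking over the text ranking alone.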
13. Favoring target types
MeSH for PMC 2649306:
D008569 Memory Disorders
D001921 Brain
D001284 Atrophy
D005911 Gliosis
D001706 Biopsy
Target labels attached to these terms: MeSHtargetDiagnosis, MeSHtargetDiagnosis, MeSHtargetTest
Do relevant documents for diagnosis deal more with diagnosis?
3. Target-specific semantic enrichment with MeSH
14. 3. Target-specific semantic enrichment with MeSH
• In UMLS, each MeSH term has Semantic Types (ex: T060 Diagnostic Procedure)
Focus on targets (diagnosis, treatments and tests)
• Specific words (ex: «MeSHtargetDiag») are added in docs and queries
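The enrichment step can be sketched as follows: map each MeSH term's UMLS Semantic Types to a target and append one artificial word per target, so that the same tokens added to queries can match those added to documents. This is an illustration under stated assumptions, not the group's code. Only T060 (Diagnostic Procedure) comes from the slide; the other semantic types in `SEMTYPE_TO_TARGET`, every target assignment, and the helper names are hypothetical. The token spelling follows the `MeSHtargetDiagnosis`/`MeSHtargetTest` labels of slide 13.

```python
from typing import Dict, List

# Hypothetical mapping: only T060 appears on the slide; the other semantic
# types and all target assignments are illustrative guesses.
SEMTYPE_TO_TARGET: Dict[str, str] = {
    "T047": "Diagnosis",  # Disease or Syndrome
    "T059": "Test",       # Laboratory Procedure
    "T060": "Test",       # Diagnostic Procedure
    "T061": "Treatment",  # Therapeutic or Preventive Procedure
    "T121": "Treatment",  # Pharmacologic Substance
}


def target_tokens(mesh_semtypes: Dict[str, List[str]]) -> List[str]:
    """One MeSHtarget<Target> token per (MeSH term, target) pair."""
    tokens = []
    for term in sorted(mesh_semtypes):
        targets = {SEMTYPE_TO_TARGET[st] for st in mesh_semtypes[term]
                   if st in SEMTYPE_TO_TARGET}
        tokens.extend("MeSHtarget" + t for t in sorted(targets))
    return tokens


def enrich(text: str, mesh_semtypes: Dict[str, List[str]]) -> str:
    """Append target tokens so the IR engine indexes them like ordinary words."""
    tokens = target_tokens(mesh_semtypes)
    return " ".join([text] + tokens) if tokens else text
```

Because each qualifying MeSH term contributes one token, a document with many diagnosis-related terms gets a high term frequency for `MeSHtargetDiagnosis`, which a standard ranking function will reward when the query carries the same token.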
Target | % docs that have at least 1 | Average number in documents
Test | 83 % | 16
Diagnosis | 86 % | 41
Treatment | 86 % | 24
Small improvement, only for section indexing
15. In the qrel…
Set | Aver. Diagnosis MeSH | Aver. Test MeSH | Aver. Treatment MeSH
All collection | 41 | 16 | 24
Relevant for diagnosis (judged 1 or 2 for queries 1..10) | 108 | 41 | 41
Relevant for test (judged 1 or 2 for queries 11..20) | 107 | 41 | 33
Relevant for treatment (judged 1 or 2 for queries 21..30) | 114 | 47 | 52
All relevant documents:
o Are quite similar, with no distinction between targets
o But have 2 to 3 times more target MeSH terms
o ... but it's also the case for documents judged 0 in the qrel
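The averages in the table above can be reproduced with a small helper: select the documents judged 1 or 2 for a block of queries, then average the number of target tokens per document. This is a sketch under assumptions: the qrel is modeled as a hypothetical `{topic: {doc_id: grade}}` dictionary, each document as a list of its target tokens, and the function names are illustrative.

```python
from statistics import mean
from typing import Dict, Iterable, List, Set

TARGETS = ("Diagnosis", "Test", "Treatment")


def relevant_docs(qrels: Dict[int, Dict[str, int]], topics: Iterable[int],
                  grades: Iterable[int] = (1, 2)) -> Set[str]:
    """Documents judged with one of `grades` for at least one of `topics`."""
    wanted = set(grades)
    return {doc for topic in topics
            for doc, grade in qrels.get(topic, {}).items() if grade in wanted}


def target_profile(doc_tokens: Dict[str, List[str]],
                   doc_ids: Iterable[str]) -> Dict[str, float]:
    """Average number of MeSHtarget<Target> tokens per document in a set."""
    ids = list(doc_ids)
    if not ids:
        return {t: 0.0 for t in TARGETS}
    return {t: mean(doc_tokens[d].count("MeSHtarget" + t) for d in ids)
            for t in TARGETS}
```

Comparing `target_profile` for `relevant_docs(qrels, range(1, 11))` against the whole collection gives the "2 to 3 times more" ratio; running it on documents judged 0 (`grades=(0,)`) is the control that shows the effect is not specific to relevance.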
16. 4. Boosting based on article types
Promoting some article types
Are some article types more likely to be relevant?
17. Distribution of article types
Article type | in docs | in qrel | in our runs
research-article | 74.3 % | 52.2 % | 37.9 %
case-report | 4.0 % | 20.4 % | 41.5 %
review-article | 6.9 % | 17.9 % | 10.9 %
Other | 2.6 % | 3.2 % | 3.6 %
brief-report | 1.1 % | 1.5 % | 0.9 %
4. Boosting based on article types
• Strategy: promote review and case-based articles (boosting)
• The intuition was good…
• In reality…
the IR engine already promoted these types!
and the strategy failed!
Top 5
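The boosting strategy and the check that explains its failure can be sketched together: multiply the score of review and case-report articles, and compare the article-type distribution of a run with that of the qrel before boosting. This is a minimal sketch, assuming a multiplicative boost; the factor 1.2 in `TYPE_BOOST` and both function names are hypothetical, since the slides do not report how the boost was applied.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical boost factors; the slides do not report the values used.
TYPE_BOOST: Dict[str, float] = {"review-article": 1.2, "case-report": 1.2}


def boost_by_type(scores: Dict[str, float], article_type: Dict[str, str],
                  boosts: Dict[str, float] = TYPE_BOOST) -> List[str]:
    """Multiply each score by its article-type boost, then re-rank."""
    boosted = {doc: s * boosts.get(article_type.get(doc, "other"), 1.0)
               for doc, s in scores.items()}
    return sorted(boosted, key=lambda doc: (-boosted[doc], doc))


def type_distribution(docs: Iterable[str],
                      article_type: Dict[str, str]) -> Dict[str, float]:
    """Percentage of each article type among the given documents."""
    counts = Counter(article_type.get(doc, "other") for doc in docs)
    total = sum(counts.values())
    return {t: 100.0 * n / total for t, n in counts.items()} if total else {}
```

Running `type_distribution` on the top of a run and on the relevant documents is the diagnostic the table provides: case reports are already at 41.5 % in the runs versus 20.4 % in the qrel, so a further boost pushes the run away from the relevant distribution rather than toward it.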
18. 5. Exploitation of the co-citation network
Promoting citations
Are citations of retrieved documents relevant?
19. 5. Exploitation of the co-citation network
• E is the set of retrieved documents
• RSV_e is the Retrieval Status Value of doc e
• We boost each citation of doc e by + α × RSV_e
• 50% of documents cite another one in the collection (avg. 3.8 citations)
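The boosting rule above translates almost directly into code: for every retrieved document e in E, each document it cites receives + α × RSV_e. A minimal sketch, assuming a hypothetical `citations` dictionary from a document to the in-collection documents it cites, that boosts are computed from the original RSVs (no cascading), and that a cited document absent from E starts from 0 and can thus enter the ranking. The function names are illustrative; α = 0.1 is the value reported on the results slide.

```python
from typing import Dict, List


def citation_boost(rsv: Dict[str, float], citations: Dict[str, List[str]],
                   alpha: float = 0.1) -> Dict[str, float]:
    """For each retrieved doc e, add alpha * RSV_e to every doc that e cites."""
    boosted = dict(rsv)  # cited documents outside E start from 0 (assumption)
    for e, score_e in rsv.items():  # iterate over E using the original RSVs
        for cited in citations.get(e, []):
            boosted[cited] = boosted.get(cited, 0.0) + alpha * score_e
    return boosted


def rerank(scores: Dict[str, float]) -> List[str]:
    """Descending score, ties broken by document id."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

A document cited by several highly ranked documents accumulates several boosts, which is what makes the rule a form of citation-network voting; a small α keeps the original text score dominant.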
20. Results
• With α = 0.1, slight improvement
• + 10% for R-Prec
• + 20% for infNDCG
• In TREC Chem 2010 Prior Art task, + 150% for MAP
22. Conclusions
• A lot of strategies, but not much better than the Terrier baseline
• Section indexing: never again
• MeSH not complementary… Better when inferred by a k-NN?
• Relevant docs talk about tests, diagnosis and treatment altogether.
• Maybe we have to start working from the baseline run…