An ontology driven module for accessing chronic pathology literature- CHRONIOUS-Swws2011
1. An ontology driven module
for accessing chronic
pathology literature
Riccardo Albertoni,
Institute for Applied Mathematics and Information Technology
C.N.R., Genova, Italy
Joint work with Stephan Kiefer, Jochen Rauch, Marco Attene,
Franca Giannini, Simone Marini, Luc Schneider, Carlos
Mesquita, Xin Xing
20 October 2011 SWWS-OTM2011 Copyright 2011 CHRONIOUS -- IMATI-CNR
2. Motivation, problem area
Chronic diseases are the leading causes of death and disability
for a large number of people in most industrialized nations.
Chronic diseases have a deep impact on the costs borne by today’s society:
1. Costs of medical care in relation to diagnosis and treatment of
disease
2. Loss of human resources caused by morbidity or premature death
3. Intangible costs, which capture the psychological dimensions of illness,
including pain and anxiety
New technologies for acquiring and analyzing vital signals
are emerging
3. Motivation, problem area
New ways of monitoring and treating chronic
patients are becoming possible
No established guidelines are available yet
Knowledge in the domain is rapidly evolving
Tools are needed for indexing and retrieving
well-focused documentation in step with
continuously evolving knowledge
4. Context: the Project
An Open, Ubiquitous and Adaptive Chronic Disease Management
Platform for Chronic Obstructive Pulmonary Disease (COPD) and
Chronic Kidney Disease (CKD)
FP7-ICT-2007-1-216461 CHRONIOUS, February 2008 – January
2012 (48 months) http://www.chronious.eu/
5. Context: The CHRONIOUS
project
6. Literature search module:
design requirements
Using explicit medical terminology (e.g., controlled vocabularies)
Specific to the considered pathologies, but also terminology allowing different
levels of granularity in search (searching by coarse- and fine-grained concepts)
Terminology that is as modular and extensible as possible
CKD and COPD are a kind of test bed for CHRONIOUS, but other chronic diseases
should eventually be pluggable
Knowledge in these domains evolves, and so do the related terminologies; we should
support keeping terminologies up to date
Offering multilingual capabilities
Search must be possible in different languages, at least when well-established
translations of terminologies are available
7. The developed system
[Architecture diagram. Document upload: automatic upload tool, manual upload tool. Document processing: format transformation, NLP, concept associator, indexer. Knowledge resources: CKD ontology and COPD ontology (OWL/RDF), MeSH thesaurus (SKOS/RDF), ontology enrichment tool, mapping API. Document search: conceptual search, metadata search, free-text search, NLP.]
8. Terminology to index scientific
literature
Medical Subject Headings (MeSH) is a well-known controlled
vocabulary used for indexing articles from MEDLINE/PubMed
It is not specialized enough to cover the COPD and CKD domains in depth
Ontologies have been defined to cover COPD and CKD in more depth (OWL, by
IFOMIS)
[Diagram, ontology layers: COPD, CKD Ontologies; Middle Layer Ontology for Clinical Care (MLOCC); Open Biological and Biomedical Ontology (OBO) Foundry: Basic Formal Ontology (BFO) + Relation Ontology (RO) + Foundational Model of Anatomy Ontology (FMA)]
However, MeSH is still required in CHRONIOUS
The search is not always made at the same level of granularity; keyword
search often moves back and forth from coarse to very disease-specialized
concepts
Multilingual support: some “certified” translations are available, for example in
Italian, Portuguese, Spanish
Terminological de facto standard: clinicians expect MeSH to be included
How to combine ontologies and MeSH in CHRONIOUS?
9. The adopted approach
RDF URIs as a kind of lingua franca
MeSH (provided by the US National Library of Medicine) has been encoded
in SKOS/RDF (W3C)
Italian, Portuguese and Spanish translations of MeSH
(provided by national authorities) have been encoded in
SKOS/RDF
We kept RDF IDs consistent with the original MeSH descriptor
identifiers
A semi-automatic mapping between MeSH in SKOS and
the developed ontologies
A script compares MeSH terms with the lexical representations of
concepts from the OWL ontologies
The suggested mappings are validated in a two-stage process by
ontology engineers and clinicians
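The lexical comparison step can be illustrated with a minimal Python sketch. This is not the project's script: the normalisation rules, the `difflib` similarity measure, the 0.85 threshold and all identifiers and sample labels below are assumptions chosen for illustration. It only produces candidate pairs, which would still go through the two-stage validation.

```python
import re
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lower-case a label, split camelCase, and sort its words.

    Sorting makes inverted headings such as
    'Pulmonary Disease, Chronic Obstructive' comparable with
    natural-order labels.
    """
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", label)
    words = re.findall(r"[a-z0-9]+", spaced.lower())
    return " ".join(sorted(words))


def suggest_mappings(mesh_terms, ontology_labels, threshold=0.85):
    """Return (mesh_id, concept_id, score) candidates for human validation."""
    suggestions = []
    for mesh_id, mesh_label in mesh_terms.items():
        mesh_norm = normalize(mesh_label)
        for concept_id, concept_label in ontology_labels.items():
            score = SequenceMatcher(
                None, mesh_norm, normalize(concept_label)
            ).ratio()
            if score >= threshold:
                suggestions.append((mesh_id, concept_id, round(score, 2)))
    # Best-scoring candidates first.
    return sorted(suggestions, key=lambda s: -s[2])


# Hypothetical identifiers and labels, for illustration only.
mesh_terms = {
    "mesh:0001": "Pulmonary Disease, Chronic Obstructive",
    "mesh:0002": "Spirometry",
}
ontology_labels = {
    "copd:ChronicObstructivePulmonaryDisease": "ChronicObstructivePulmonaryDisease",
    "copd:Spirometry": "Spirometry",
    "ckd:Dialysis": "Dialysis",
}

for suggestion in suggest_mappings(mesh_terms, ontology_labels):
    print(suggestion)
```

A string-similarity threshold trades recall for reviewer effort: a lower value surfaces more candidates for the ontology engineers and clinicians to reject.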
10. Natural Language Processing
Based on the General Architecture for Text Engineering (GATE) framework
Open-source Java suite, originally developed at the University of Sheffield beginning
in 1995
used worldwide by a wide community of scientists, companies, teachers and students
for all sorts of natural language processing tasks, including information extraction in
many languages
Default processes applied to extract headwords of a text:
Sentence splitter, Tokeniser, Part-of-speech tagger, Morphological analyzer
Modules included for the Ontology Enrichment Tool and the Indexer
OntoRootGazetteer: A GATE plug-in that produces ontology aware annotations for
extracted terms;
Shallow Parser: it identifies word groups such as “chronic diseases” and “lung
function”;
RegEx-Pattern Matcher: it matches the lemma of a token against word patterns defined as
regular expressions;
Thesaurus Matcher: it matches the lemma of a token to a domain thesaurus; a JAPE
resource has been developed to access MeSH and the mapped ontology concepts
through the MeSH Mapping API
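To make the flow from raw text to thesaurus hits concrete, here is a toy Python sketch of the same stages: sentence splitting, tokenisation, lemmatisation, and matching of one- and two-word groups against a thesaurus. It does not use GATE or JAPE. The plural-stripping `lemma` function, the `THESAURUS` entries and the concept identifiers are simplifying assumptions, not the project's resources.

```python
import re

# Hypothetical mini-thesaurus: normalised phrase -> concept identifier.
THESAURUS = {
    "chronic disease": "concept:ChronicDisease",
    "lung function": "concept:LungFunction",
    "inhaler": "concept:Inhaler",
}


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenise(sentence):
    """Lower-case the sentence and keep alphabetic tokens only."""
    return re.findall(r"[A-Za-z]+", sentence.lower())


def lemma(token):
    """Crude stand-in for a morphological analyser: strip a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def annotate(text, max_len=2):
    """Return (sentence_index, phrase, concept) for every thesaurus hit."""
    hits = []
    for i, sentence in enumerate(split_sentences(text)):
        lemmas = [lemma(t) for t in tokenise(sentence)]
        pos = 0
        while pos < len(lemmas):
            for n in range(max_len, 0, -1):  # prefer the longest match
                phrase = " ".join(lemmas[pos:pos + n])
                if phrase in THESAURUS:
                    hits.append((i, phrase, THESAURUS[phrase]))
                    pos += n
                    break
            else:
                pos += 1  # no match starting at this token
    return hits


for hit in annotate("Chronic diseases reduce lung function. Inhalers can help."):
    print(hit)
```

Matching on lemmas rather than surface forms is what lets “Chronic diseases” and “Inhalers” reach the singular thesaurus entries.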
11. Ontology enrichment
Semi-automatic process: candidate concepts are rated according to
Corpus relevance: determined by the candidate's average Term Frequency-Inverse
Document Frequency (TF-IDF) value with respect to the whole document
corpus;
Concept co-occurrences: the average distance in the text between the
candidate concept and a concept within the corpus is calculated as a
benchmark;
Domain relevance: matching with a common dictionary (WordNet), a domain
thesaurus (MeSH) and regular expression patterns;
Subclass-of relations: extraction of vertical relations from linguistic patterns or
dictionary hypernyms.
Candidate concepts are marked as “new”, “to validate”, “postponed”,
“accepted” or “rejected” by ontology engineers and clinicians
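The corpus-relevance criterion can be sketched in Python as follows. The slides do not give the exact TF-IDF variant, so this assumes relative term frequency, a natural-log inverse document frequency, and an average taken over every document in the corpus (documents without the term contribute zero). The sample corpus and function names are hypothetical.

```python
import math


def average_tfidf(term, corpus):
    """Average TF-IDF of `term` over all documents in `corpus`.

    `corpus` is a list of token lists. Documents that do not contain
    the term contribute 0 to the average.
    """
    n_docs = len(corpus)
    doc_freq = sum(1 for doc in corpus if term in doc)
    if n_docs == 0 or doc_freq == 0:
        return 0.0
    idf = math.log(n_docs / doc_freq)
    total = 0.0
    for doc in corpus:
        if doc:  # skip empty documents to avoid division by zero
            total += (doc.count(term) / len(doc)) * idf
    return total / n_docs


def rank_candidates(candidates, corpus):
    """Sort candidate terms by decreasing average TF-IDF."""
    scored = [(term, average_tfidf(term, corpus)) for term in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical tokenised corpus, for illustration only.
corpus = [
    ["spirometry", "test", "copd"],
    ["copd", "patient"],
    ["kidney", "dialysis", "patient", "dialysis"],
]

for term, score in rank_candidates(["copd", "dialysis", "patient"], corpus):
    print(f"{term}: {score:.3f}")
```

In this sample, “dialysis” ranks first because it is frequent in one document and absent from the others, which is the behaviour TF-IDF is meant to reward.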
14. Search results:
(black box testing) initial
comparison with PubMed
[Figure: F-measure plots. Panels: “Inhaler Device”, “Inhaler device”, “PostBronchodilatorSpirometry”. Horizontal axis: number of considered/retrieved documents. Vertical axis: F-measure.]
Glass box testing has also been performed to ensure that the ontologies
represent the right concepts
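For readers unfamiliar with the metric on the vertical axis, the sketch below computes an F-measure as a function of the number of retrieved documents considered. It assumes the balanced F-measure (F1), which the slides do not state explicitly, and a ranked result list with unique document identifiers. The sample identifiers are made up and do not reproduce the reported results.

```python
def f_measure_at(retrieved, relevant, k):
    """Balanced F-measure (F1) over the first k retrieved documents."""
    top_k = retrieved[:k]
    if not top_k or not relevant:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def f_measure_curve(retrieved, relevant):
    """F-measure for every cutoff from 1 to the length of the result list."""
    return [
        (k, f_measure_at(retrieved, relevant, k))
        for k in range(1, len(retrieved) + 1)
    ]


# Hypothetical ranked results and relevance judgements.
retrieved = ["d1", "d7", "d3", "d9"]
relevant = {"d1", "d3", "d5"}

for k, score in f_measure_curve(retrieved, relevant):
    print(k, round(score, 3))
```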
15. Conclusions
The CHRONIOUS search system can become a specialized
indexing and search system for the hospital:
It can manage internal hospital documents to be
indexed into the database
It can cover other medical domains and languages
using MeSH
It is already specialized in COPD and CKD by using
specific ontologies
It provides tools for ontology maintenance and is thus well
suited to domains characterized by rapidly evolving
knowledge
16. Critical and open issues
for future work
User notifications
• about changes in Enrichment Tool data (e.g. if new
documents with extracted candidate concepts are
available)
• supporting the collaboration among clinicians and
ontology engineers
Re-indexing of documents
• what happens when there is a new ontology version?
• some incremental indexing should be provided
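One possible direction for incremental indexing, offered only as a sketch and not as something the project implemented, is to diff two ontology versions and re-process only the documents touched by the change. The data layout below (concept id mapped to a set of labels, an inverted index from concept to document ids, plain substring matching for new labels) and all names are assumptions.

```python
def changed_concepts(old_version, new_version):
    """Concept ids whose label set was added, removed or modified."""
    ids = set(old_version) | set(new_version)
    return {cid for cid in ids if old_version.get(cid) != new_version.get(cid)}


def documents_to_reindex(concept_index, documents, old_version, new_version):
    """Select documents that need re-indexing after an ontology update.

    concept_index: concept id -> set of document ids annotated with it
    documents:     document id -> lower-cased full text
    """
    affected = set()
    for cid in changed_concepts(old_version, new_version):
        # Documents already annotated with a changed or removed concept.
        affected |= concept_index.get(cid, set())
        # Documents mentioning a label that is new in this version.
        new_labels = new_version.get(cid, set()) - old_version.get(cid, set())
        for doc_id, text in documents.items():
            if any(label.lower() in text for label in new_labels):
                affected.add(doc_id)
    return affected


# Hypothetical ontology versions, index and documents.
old_version = {"c1": {"inhaler"}, "c2": {"spirometry"}}
new_version = {
    "c1": {"inhaler", "inhaler device"},
    "c2": {"spirometry"},
    "c3": {"dialysis"},
}
concept_index = {"c1": {"d1"}, "c2": {"d2"}}
documents = {
    "d1": "use of an inhaler",
    "d2": "spirometry results",
    "d3": "dialysis schedule",
    "d4": "general note",
}

print(sorted(documents_to_reindex(concept_index, documents, old_version, new_version)))
```

The label scan still reads every document text, but the expensive NLP and concept association would be rerun only for the returned subset.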
Editor's Notes
In this respect, the European Community has funded the CHRONIOUS project, which includes a Europe-wide consortium (14 partners) and aims at developing a …