ABSTRACT: Ontologies can provide a conceptualization of a domain, leading to a common vocabulary for communities of researchers and to important standards that facilitate computation, software interoperability and data reuse. The most successful ontologies, especially those developed by diverse communities over long periods of time, are typically large and complex. To address this complexity, ontology authoring and browsing tools must provide cognitive support that improves comprehension of the many concepts and relationships in ontologies. Ontology tools must also support collaboration, because ontology design and use are centered on community consensus.
In this talk, I will describe how standardized ontologies are developed and used in the biomedical and clinical domains to aid in scientific and medical discoveries. Specifically, I will present how the US National Center for Biomedical Ontology has designed the BioPortal ontology library (and associated technologies) to promote the use of standardized ontologies and tools. I will review how BioPortal and other ontology tools use established and novel visualization and collaboration approaches to improve ontology authoring and data curation activities. I will also discuss an ambitious project by the World Health Organization that leverages the use of social media to broaden participation in the development of the next version of the International Classification of Diseases. To conclude, I will discuss the challenges and opportunities that arise from using ontologies to bridge communities that manage and curate important information resources.
SLE 2012 Keynote: Cognitive and Social Challenges of Ontology Use in the Biomedical Domain
1. Cognitive and Social Challenges of Ontology Use in the Biomedical Domain
SLE 2012: 5th International Conference on Software Language Engineering
Dresden, Germany
Margaret-Anne Storey
The CHISEL Group, University of Victoria
12. Ontology languages
• Choice of language and choice of reasoning engine
• Tradeoff between expressiveness, reasoning power, tractability and human understanding
• May need an inference engine to give real-time feedback while authoring an ontology
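The tradeoff can be made concrete with a deliberately small language. A minimal sketch, assuming an ontology restricted to subclass (is-a) axioms: because inference over this fragment is cheap, a tool could recompute entailments after every edit, report what the edit newly implies, and reject an edit that would create a cycle. The class `SubclassReasoner`, its methods and its messages are hypothetical and are not the API of any real reasoner; more expressive languages such as OWL DL require far more expensive reasoning.

```python
from collections import defaultdict


class SubclassReasoner:
    """Toy reasoner that only understands is-a (subclass) axioms.

    Limiting the language to this fragment keeps inference cheap enough
    to rerun after every edit: expressiveness is traded for speed.
    """

    def __init__(self) -> None:
        # Maps each class to its directly asserted parent classes.
        self.parents: dict[str, set[str]] = defaultdict(set)

    def ancestors(self, cls: str) -> set[str]:
        """All inferred superclasses of cls (transitive closure)."""
        seen: set[str] = set()
        stack = list(self.parents[cls])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents[node])
        return seen

    def add_subclass(self, child: str, parent: str) -> str:
        """Assert child is-a parent and return immediate feedback."""
        # Reject the edit if it would make a class its own ancestor.
        if child == parent or child in self.ancestors(parent):
            return f"rejected: {child} is-a {parent} would create a cycle"
        before = self.ancestors(child)
        self.parents[child].add(parent)
        gained = sorted(self.ancestors(child) - before)
        if not gained:
            return f"ok: no new inferences for {child}"
        return f"ok: new ancestors of {child}: {', '.join(gained)}"


reasoner = SubclassReasoner()
print(reasoner.add_subclass("Heart", "Organ"))
print(reasoner.add_subclass("Organ", "AnatomicalStructure"))
print(reasoner.add_subclass("AnatomicalStructure", "Heart"))  # rejected
```

The feedback is computed per edit, which is what makes it usable interactively; the price is that the language cannot express anything beyond a class hierarchy.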
18. Challenges?
Cognitive issues:
– Complexity, scale
– Evolution
– Inclusion of “upper ontologies”, or parts of other ontologies
Social issues:
– One size does not fit all
– Multiple authors
– Input from broader set of stakeholders
20. Foundational Model of Anatomy (FMA)
• Comprehensive ontology of human anatomy
• Over 120K terms and 2.1M relationship instances (168 relationship types)
• One of the largest and best-developed ontologies in biomedicine; multipurpose
Slide by Mark Musen.
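The slide distinguishes terms, relationship instances and relationship types. A toy sketch of that distinction over a handful of hypothetical (subject, relationship, object) triples; the class names and the `is_a`/`part_of` labels are simplified placeholders, not FMA identifiers or FMA relation names.

```python
from collections import Counter

# Hypothetical toy triples (subject, relationship type, object). The names
# are simplified placeholders, not FMA identifiers or FMA relation names.
TRIPLES = [
    ("Heart", "is_a", "Organ"),
    ("Left_ventricle", "part_of", "Heart"),
    ("Right_ventricle", "part_of", "Heart"),
    ("Heart", "part_of", "Cardiovascular_system"),
]


def summarize(triples):
    """Count distinct terms, relationship instances and relationship types."""
    terms = {s for s, _, _ in triples} | {o for _, _, o in triples}
    per_type = Counter(rel for _, rel, _ in triples)
    return {
        "terms": len(terms),
        "relationship_instances": len(triples),  # one per triple
        "relationship_types": len(per_type),  # distinct relation labels
        "instances_per_type": dict(per_type),
    }


print(summarize(TRIPLES))
```

Each triple is one relationship instance, while a relationship type is a distinct label reused across many instances; the FMA figures above correspond to roughly 2.1M triples drawn from 168 labels.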
22. Gene Ontology (GO)
• Aims to unify the representation of gene and gene product attributes across all species
• Used to annotate genes and gene products, and to assimilate and disseminate annotation data
• Contains over 24,500 terms applicable to a wide variety of biological organisms
• A standard tool in bioinformatics
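GO annotations follow the so-called true path rule: a gene annotated to a term is implicitly annotated to all of that term's ancestors. A minimal sketch of that propagation over a toy hierarchy; the term names and links are illustrative only and do not reproduce actual GO structure or identifiers, and `propagate`/`genes_for` are hypothetical helpers.

```python
# Toy DAG: term -> parent terms. GO terms may have several parents.
# Names and links are illustrative, not actual GO structure or IDs.
PARENTS = {
    "mitochondrial_translation": ["translation", "mitochondrial_gene_expression"],
    "translation": ["gene_expression"],
    "mitochondrial_gene_expression": ["gene_expression"],
    "gene_expression": ["biological_process"],
    "biological_process": [],
}


def propagate(direct):
    """Expand direct gene -> terms annotations to all ancestor terms."""
    expanded = {}
    for gene, terms in direct.items():
        closed, stack = set(), list(terms)
        while stack:
            term = stack.pop()
            if term not in closed:
                closed.add(term)
                stack.extend(PARENTS.get(term, []))
        expanded[gene] = closed
    return expanded


def genes_for(term, direct):
    """Genes annotated to term either directly or via a descendant term."""
    return sorted(g for g, ts in propagate(direct).items() if term in ts)


annotations = {"geneA": ["mitochondrial_translation"], "geneB": ["translation"]}
print(genes_for("gene_expression", annotations))  # ['geneA', 'geneB']
```

This is what lets annotations made at very specific terms be retrieved by queries at more general ones, across species.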
24. International Classification of Diseases (ICD)
• An enumeration of diseases that forms the basis for medical claims and reimbursements
• A “legacy” terminology that has its roots in 19th-century epidemiology
• Created initially by biostatisticians with a pressing need to compare death statistics across European countries
Slide by Mark Musen.
25. ICD is used for lots of (too many?) things!
• ICD is used to code all patient encounters with the health-care system for:
– Billing and reimbursement
– Institutional planning
– Disease surveillance and public health
– Quality assurance
– Economic modeling
• ICD was never intended to make the distinctions relevant to all these tasks!
• Nevertheless, it is widely used!
Slide by Mark Musen.
26. ICD: An excerpt…
724 Unspecified disorders of the back
724.0 Spinal stenosis, other than cervical
724.00 Spinal stenosis, unspecified region
724.01 Spinal stenosis, thoracic region
724.02 Spinal stenosis, lumbar region
724.09 Spinal stenosis, other
724.1 Pain in thoracic spine
724.2 Lumbago
724.3 Sciatica
724.4 Thoracic or lumbosacral neuritis
724.5 Backache, unspecified
724.6 Disorders of sacrum
724.7 Disorders of coccyx
724.70 Unspecified disorder of coccyx
724.71 Hypermobility of coccyx
724.79 Coccygodynia
724.8 Other symptoms referable to back
724.9 Other unspecified back disorders
Slide by Mark Musen.
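In this ICD-9 excerpt the hierarchy is encoded in the code strings themselves: 724.02 sits under 724.0, which sits under 724. A minimal sketch that recovers parents from codes, assuming only the plain numeric NNN, NNN.N and NNN.NN pattern seen here (V and E codes and other edge cases are not handled); the function names are hypothetical.

```python
from typing import Dict, List, Optional


def icd9_parent(code: str) -> Optional[str]:
    """Parent implied by an ICD-9 code string (NNN, NNN.N or NNN.NN only)."""
    if "." not in code:
        return None  # three-digit category: top level of this excerpt
    category, digits = code.split(".")
    # NNN.N -> NNN ; NNN.NN -> NNN.N
    return category if len(digits) == 1 else f"{category}.{digits[0]}"


def build_tree(codes: List[str]) -> Dict[str, List[str]]:
    """Group codes under their derived parents, preserving input order."""
    children: Dict[str, List[str]] = {}
    for code in codes:
        parent = icd9_parent(code)
        if parent is not None:
            children.setdefault(parent, []).append(code)
    return children


excerpt = ["724", "724.0", "724.00", "724.01", "724.02", "724.09", "724.1",
           "724.7", "724.70", "724.71"]
tree = build_tree(excerpt)
print(tree["724.0"])  # ['724.00', '724.01', '724.02', '724.09']
```

Because the parent is derived from the code itself, every code in this scheme gets exactly one parent.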
27. ICD-9 (1977): A handful of codes for traffic accidents
Slide by Mark Musen.
28. ICD-10 (1999): 587 codes for such accidents
V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
Slide by Mark Musen.
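The growth in codes comes from combining independent axes. In the W65.40 and X35.44 examples, the fourth character encodes the place of occurrence and the fifth the activity, so the number of codes is the product of the axis sizes. A sketch of that enumeration using only the axis values visible on this slide; the full ICD-10 value sets are larger and are not reproduced here, and V31.22 is left out because its fourth character follows a different scheme.

```python
from itertools import product

# Category, place and activity values are taken from the example codes on
# this slide only; the full ICD-10 value sets for each axis are larger.
CATEGORIES = {
    "W65": "Drowning and submersion while in bath-tub",
    "X35": "Victim of volcanic eruption",
}
PLACES = {"4": "street and highway"}
ACTIVITIES = {
    "0": "while engaged in sports activity",
    "4": "while resting, sleeping, eating or engaging in other vital activities",
}


def enumerate_codes():
    """Yield (code, label) for every combination of the three axes."""
    for (cat, cat_label), (place, place_label), (act, act_label) in product(
        CATEGORIES.items(), PLACES.items(), ACTIVITIES.items()
    ):
        yield f"{cat}.{place}{act}", f"{cat_label}, {place_label}, {act_label}"


codes = dict(enumerate_codes())
print(len(codes))  # 2 categories x 1 place x 2 activities = 4
```

Every combination becomes a code whether or not it is plausible, which is how entries such as drowning in a bath-tub on a street and highway arise.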
29. ICD revision process in the 20th Century…
• International and national revision conferences
• 1–5-person multidisciplinary delegations at international conferences
• Manual curation
• Output: paper copy
• Negotiation process: decibel method of discussion
• ICD drafts translated into 27 languages
See http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2950305/
30. ICD-11 revision: key aspects
• Content model
• Topic Advisory Groups – vertical and horizontal
• Classification experts (ontology development)
• iCAT: web based collaborative authoring tool
• Use cases – evaluating ICD-11 in use
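A content model gives every entity the same structured template, so an authoring tool can check which properties are still missing. A minimal sketch, assuming a hypothetical and heavily simplified record; the field names are illustrative only and are not the actual ICD-11 content model schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityDraft:
    """Hypothetical, simplified content-model record for one ICD entity.

    The fields are illustrative only; they are not the actual ICD-11
    content model schema.
    """

    entity_id: str
    title: str
    definition: str = ""
    parents: List[str] = field(default_factory=list)
    body_system: str = ""
    synonyms: List[str] = field(default_factory=list)

    def missing_properties(self) -> List[str]:
        """Required properties an author still has to fill in."""
        required = {
            "title": self.title,
            "definition": self.definition,
            "parents": self.parents,
            "body_system": self.body_system,
        }
        return [name for name, value in required.items() if not value]


draft = EntityDraft("hypothetical-001", "Sciatica", parents=["hypothetical-parent"])
print(draft.missing_properties())  # ['definition', 'body_system']
```

A check like this is the kind of structure that lets many contributors, as in the Topic Advisory Groups, fill in entities consistently.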
31. Deliverables
• Print versions, fit for purpose, in multiple languages
• Web portal to access, browse and maintain it
– Input from the crowd
• Classification in a formalized language
34. Protégé ontology authoring environment
• Ontology contents need to be processed and interpreted by computers
• Interactive tools can assist developers in ontology authoring (e.g. Protégé)
39. National Center for Biomedical Ontology
Goal: develop innovative technology and methods that allow scientists to record, manage, and disseminate biomedical information and knowledge in both human-readable and machine-processable form
57. Data from STRIDE
• 1.8 million pediatric and adult patients with clinical and demographic data (1994–present)
• 19 million clinical encounters (1994–present)
[Figure: further STRIDE record counts of 35 million, 22 million, 2.9 million, 1.2 million, 7 million, 137 million and 10 million; the record-type labels are not recoverable]
Slide by Nigam Shah.
58. Making EMRs Unreasonably Effective
[Figure: annotation workflow for clinical text]
• Input: text clinical notes
• The BioPortal knowledge graph is used to create clean lexicons of diseases, procedures and drugs (terms with frequencies and syntactic types)
• Term recognition tool: NCBO Annotator, run as an annotation workflow
• Negation detection is applied to recognized terms (e.g. “no T4”)
• For each patient (P1 … Pn), recognized terms (T1, T2, …) and ICD9 codes form a temporal series of tags
• These series define a cohort of interest for further analysis
Slide by Nigam Shah.
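The workflow on this slide can be sketched in a few lines, with heavy simplifications: a hard-coded lexicon replaces the BioPortal-derived lexicons, word-boundary matching replaces the NCBO Annotator (a web service this sketch does not call), and a toy NegEx-style check for preceding cue words stands in for negation detection. Negated mentions are simply dropped. All names, cue words and the cohort rule are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lexicon (term -> semantic group). On the slide, lexicons are
# derived from BioPortal and terms are found by the NCBO Annotator service,
# which this sketch does not call.
LEXICON = {"sciatica": "disease", "back pain": "disease",
           "ibuprofen": "drug", "mri": "procedure"}
NEGATION_CUES = {"no", "denies", "without"}  # toy NegEx-style cue words


def tag_note(text):
    """Return (term, negated) pairs for lexicon terms found in one note."""
    tags = []
    for sentence in re.split(r"[.;]", text.lower()):
        for term in LEXICON:
            match = re.search(rf"\b{re.escape(term)}\b", sentence)
            if match:
                # Negated if a cue word appears earlier in the same sentence.
                preceding = sentence[: match.start()].split()
                tags.append((term, bool(NEGATION_CUES.intersection(preceding))))
    return tags


def build_timelines(notes):
    """notes: (patient_id, ISO date, text) tuples -> sorted (date, term) lists.

    Negated mentions are dropped in this sketch.
    """
    timelines = defaultdict(list)
    for patient, date, text in notes:
        for term, negated in tag_note(text):
            if not negated:
                timelines[patient].append((date, term))
    return {p: sorted(events) for p, events in timelines.items()}


def cohort(timelines, first, then):
    """Patients with a mention of `first` dated before a mention of `then`."""
    selected = set()
    for patient, events in timelines.items():
        first_dates = [d for d, t in events if t == first]
        then_dates = [d for d, t in events if t == then]
        if first_dates and then_dates and min(first_dates) < max(then_dates):
            selected.add(patient)
    return selected


notes = [
    ("P1", "2010-01-05", "Patient reports back pain. No sciatica."),
    ("P1", "2010-03-02", "Sciatica confirmed on MRI."),
    ("P2", "2010-02-01", "Denies back pain; started ibuprofen."),
]
timelines = build_timelines(notes)
print(cohort(timelines, "back pain", "sciatica"))  # {'P1'}
```

ISO date strings are used so that plain string sorting yields chronological order.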
73. Towards collaborative ontology visualization as a service
• Preserve easy-to-use visualizations of ontologies
• Enable flexible visual exploration and analysis of biomedical ontologies and data
• Support collaboration in visual exploration and analysis of biomedical ontologies and data
• Enable presentation of analysis artifacts on the web
84. Concluding remarks
• Ontologies are finally coming of age (D. McGuinness)
• With adoption, novel tools will emerge
• Users are now savvy with search, visualization and analytics
• Anticipated benefits for translational research
“Developers do not innovate tools, users do”
85. Selected References
Gruber, T.R.: A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2), 199–220 (1993)
ICD-11: http://www.youtube.com/user/whoicd11
Ernst, N.A., Storey, M.A., Allen, P.: Cognitive support for ontology modeling. International Journal of Human-Computer Studies 62(5), 553–577 (2005)
Fu, B., Grammel, L., Storey, M.A.: BioMixer: A Web-based Collaborative Ontology Visualization Tool. 3rd International Conference on Biomedical Ontology (ICBO 2012) (2012)
Gruber, T.R.: Towards Principles for the Design of Ontologies Used for Knowledge Sharing. In: Guarino, N., Poli, R. (eds.) Formal Ontology in Conceptual Analysis and Knowledge Representation. vol. 43, pp. 907–928. Kluwer Academic Publishers (1993)
Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., Giannopoulou, E.: Ontology visualization methods: a survey. ACM Computing Surveys 39(4) (2007)
Musen, M.A., Noy, N.F., Shah, N.H., Chute, C.G., Storey, M.A., Smith, B., the NCBO Team: The National Center for Biomedical Ontology. Journal of the American Medical Informatics Association (in press) (2012), http://bmir.stanford.edu/file_asset/index.php/1729/BMIR-2011-1468.pdf
Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D.L., Storey, M.A., Chute, C.G., Musen, M.A.: BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Research 37(Web Server issue) (Jul 2009)
Smith, B.: Ontology (Science). Nature Precedings (i) (Jul 2008)
Tudorache, T., Falconer, S., Noy, N., Nyulas, C., Üstün, T., Storey, M.A., Musen, M.: Ontology development for the masses: creating ICD-11 in WebProtégé. In: Knowledge Engineering and Management by the Masses, EKAW 2010. pp. 74–89. Springer (2010)
Editor's Notes
Useful for independent explorations and comparisons. Could include video of demo here.
Could label the data graph (make it clear that it is our internal data structure, which runs on the client, that the actual data can be different on each client, and that we build it by calling the REST services).
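The note on the data graph describes a client-side structure built by calling REST services, whose contents can differ from client to client. A minimal sketch of that idea, assuming a hypothetical `fetch_neighbors` callable that stands in for a REST request; its name and return shape are not an actual BioPortal endpoint, and the class is illustrative rather than the tool's real data structure.

```python
from typing import Callable, Dict, List, Set

# Stand-in for a REST call returning a node's neighbor IDs. The name and the
# return shape are hypothetical, not an actual BioPortal endpoint.
FetchNeighbors = Callable[[str], List[str]]


class ClientDataGraph:
    """Client-local graph that grows lazily from REST responses.

    Every client holds its own instance, so two clients exploring the
    same ontology can end up with different subsets of the data.
    """

    def __init__(self, fetch_neighbors: FetchNeighbors) -> None:
        self._fetch = fetch_neighbors
        self.edges: Dict[str, Set[str]] = {}
        self._expanded: Set[str] = set()

    def expand(self, node: str) -> Set[str]:
        """Fetch a node's neighbors at most once and merge them in."""
        if node not in self._expanded:
            self.edges.setdefault(node, set())
            for neighbor in self._fetch(node):
                self.edges[node].add(neighbor)
                self.edges.setdefault(neighbor, set())
            self._expanded.add(node)
        return self.edges[node]


calls: List[str] = []


def fake_fetch(node: str) -> List[str]:
    calls.append(node)  # record each simulated REST request
    return {"Heart": ["Left_ventricle", "Right_ventricle"]}.get(node, [])


alice, bob = ClientDataGraph(fake_fetch), ClientDataGraph(fake_fetch)
alice.expand("Heart")
alice.expand("Heart")  # served from the local graph, no second request
print(calls)  # ['Heart']
print("Heart" in bob.edges)  # False: Bob's graph is independent
```

Injecting the fetch function keeps the graph logic independent of any particular service, which also makes it easy to test offline.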