A wish list for tools supporting modularity in bio-ontology engineering, based on the requirements of the ChEBI ontology. Presented at the Workshop on Modular Ontologies (WoMO 2011) in Ljubljana.
Modularity requirements in bio-ontologies: a case study of ChEBI
1. Janna Hastings, Colin Batchelor, Stefan Schulz, Christoph Steinbeck
Modularity requirements in bio-ontologies: a case study of ChEBI
Workshop on Modular Ontologies, ESSLLI, 12 August 2011
EBI is an Outstation of the European Molecular Biology Laboratory.
2. ChEBI: an ontology of biologically interesting chemicals
[Figure: the ChEBI ontology. Chemical entity branch: chemical substance, molecular entity, group, carbonyl compound, carboxylic acid, carboxy group. Role branch: biological role, application, chemical role, pharmaceutical, solvent, antibacterial drug, cyclooxygenase inhibitor. Example entity: cefpodoxime (CHEBI:606443), linked to both branches via has part and has role relations.]
2 22.02.2012
3. Bio-ontologies are modular by design: domain and granularity
[Figure: the ChEBI ontology divided by domain (chemistry) and granularity. Upper-level types: material entities (molecular entities; substances) and functions and roles of chemical entities.]
4. They are characterised by large sizes and low expressivity
Currently exported in EL++
August 2011: 29769 classes in total
Chemical entities (29132)
Roles (596)
Subatomic particles (41)
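As a quick arithmetic check, the three class counts on this slide add up to the stated total, and the breakdown of the chemical entity figure given in the speaker notes (28875 descendants of chemical entity plus 257 chemical entities not classified under it) reproduces 29132. The variable names are illustrative only.

```python
# Class counts from the slide (August 2011).
counts = {
    "chemical entities": 29132,
    "roles": 596,
    "subatomic particles": 41,
}
total = sum(counts.values())
print(total)  # 29769

# Breakdown of the chemical entity count from the speaker notes.
descendants_of_chemical_entity = 28875
unclassified_chemical_entities = 257
chemical_entities = descendants_of_chemical_entity + unclassified_chemical_entities
print(chemical_entities)  # 29132
```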
5. Classification practices in chemistry lead to high levels of multiple inheritance
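Multiple inheritance turns the class hierarchy into a directed acyclic graph rather than a tree. The sketch below, using a small hypothetical hierarchy (the class names are placeholders, not ChEBI data), stores each class's direct is_a parents, lists the classes that have more than one parent, and collects all ancestors of a class by following every parent link.

```python
# Hypothetical toy hierarchy: class -> set of direct is_a parents.
is_a = {
    "X": {"A", "B"},   # X inherits from two parents
    "A": {"root"},
    "B": {"C"},
    "C": {"root"},
    "Y": {"A"},
}


def classes_with_multiple_parents(is_a):
    """Classes asserted under more than one direct parent."""
    return sorted(cls for cls, parents in is_a.items() if len(parents) > 1)


def ancestors(is_a, cls):
    """All transitive superclasses of cls, following every parent link."""
    seen = set()
    stack = list(is_a.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, ()))
    return seen


print(classes_with_multiple_parents(is_a))  # ['X']
print(sorted(ancestors(is_a, "X")))  # ['A', 'B', 'C', 'root']
```

Because a class can reach the upper level along several paths, any module that contains it has to pull in every one of those paths.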
6. ChEBI is growing bigger … and more expressive
Image credit: Jonathan J. Dickau
7. Increased expressivity to enable automatic classification
hydrocarbon equivalentTo molecule and has_atom only (carbon atom or hydrogen atom)
peptide cation equivalentTo peptide and has_charge some double[> 0.0]
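The two definitions above can be read as membership tests. The sketch below is a closed-world approximation written in plain Python, not what an OWL reasoner does: a reasoner works under the open-world assumption, so it can only infer membership of a class defined with "only" if the molecule's atoms are explicitly closed off. The `Molecule` record and its `is_peptide` flag are hypothetical stand-ins for ChEBI classes and relations.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    atoms: list[str]          # element symbols; assumed to be the complete list
    charge: float = 0.0
    is_peptide: bool = False  # hypothetical stand-in for membership of 'peptide'


def is_hydrocarbon(m: Molecule) -> bool:
    # molecule and has_atom only (carbon atom or hydrogen atom)
    return all(atom in {"C", "H"} for atom in m.atoms)


def is_peptide_cation(m: Molecule) -> bool:
    # peptide and has_charge some double[> 0.0]
    return m.is_peptide and m.charge > 0.0


methane = Molecule("methane", ["C", "H", "H", "H", "H"])
ethanol = Molecule("ethanol", ["C", "C", "O"] + ["H"] * 6)
toy_peptide = Molecule("hypothetical peptide", ["C", "N", "O", "H"], charge=1.0, is_peptide=True)

print(is_hydrocarbon(methane), is_hydrocarbon(ethanol))  # True False
print(is_peptide_cation(toy_peptide))  # True
```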
8. carboxylic acid equivalentTo molecule and has_functional_group some carboxy group
tricarboxylic acid equivalentTo molecule and has_functional_group exactly 3 carboxy group
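The difference between the two definitions above is an existential restriction ("some") versus a qualified cardinality restriction ("exactly 3"); the latter falls outside OWL 2 EL. A minimal closed-world sketch, assuming each molecule comes with a complete list of its functional group occurrences (the `Molecule` record is hypothetical), shows what each definition checks:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    functional_groups: list[str]  # one entry per occurrence; assumed complete


def is_carboxylic_acid(m: Molecule) -> bool:
    # molecule and has_functional_group some carboxy group
    return "carboxy group" in m.functional_groups


def is_tricarboxylic_acid(m: Molecule) -> bool:
    # molecule and has_functional_group exactly 3 carboxy group
    return Counter(m.functional_groups)["carboxy group"] == 3


citric_acid = Molecule("citric acid", ["carboxy group"] * 3 + ["hydroxy group"])
acetic_acid = Molecule("acetic acid", ["carboxy group"])

print(is_carboxylic_acid(citric_acid), is_tricarboxylic_acid(citric_acid))  # True True
print(is_carboxylic_acid(acetic_acid), is_tricarboxylic_acid(acetic_acid))  # True False
```

Under open-world semantics a reasoner could not conclude "exactly 3" from three asserted groups alone; the molecule's description would also need an upper bound.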
10. Reasoning is required for classification and consistency validation
No definitional cycles, e.g. A part_of B part_of C part_of A
Enforcing disjointness, e.g. Chemical Entity disjoint_from Role …; Group disjoint_from Molecule …
No disallowed combinations of relations, e.g. A has_part B together with A conjugate_base_of B
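The three checks on this slide can be sketched over plain Python data structures. This is an illustration on hypothetical toy data, not how a DL reasoner implements them: cycles are found with a depth-first search, disjointness is checked against precomputed superclass sets (a reasoner would derive these from DisjointClasses axioms), and disallowed relation pairs are matched against asserted triples.

```python
# Hypothetical toy data (not taken from ChEBI).
part_of = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"E"}}

# Each class mapped to the set of all its (inferred) superclasses, itself included.
types_of = {
    "class P": {"class P", "group", "molecule", "chemical entity"},
    "class Q": {"class Q", "molecule", "chemical entity"},
}
disjoint_pairs = [("chemical entity", "role"), ("group", "molecule")]

# Relation assertions as (subject, relation, object) triples.
triples = [("A", "has_part", "B"), ("A", "conjugate_base_of", "B")]
disallowed = [("has_part", "conjugate_base_of")]


def find_cycle(graph):
    """Return one cycle in a directed graph as a list of nodes, or None (DFS with colours)."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            state = colour.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None


def disjointness_violations(types_of, disjoint_pairs):
    """(class, a, b) for every class that falls under both members of a disjoint pair."""
    return sorted(
        (cls, a, b)
        for cls, types in types_of.items()
        for a, b in disjoint_pairs
        if a in types and b in types
    )


def disallowed_combinations(triples, disallowed):
    """(subject, r1, r2, object) where both r1 and r2 link the same subject and object."""
    asserted = set(triples)
    return sorted(
        (s, r1, r2, o)
        for (s, r, o) in asserted
        for r1, r2 in disallowed
        if r == r1 and (s, r2, o) in asserted
    )


print(find_cycle(part_of))  # ['A', 'B', 'C', 'A']
print(disjointness_violations(types_of, disjoint_pairs))  # [('class P', 'group', 'molecule')]
print(disallowed_combinations(triples, disallowed))  # [('A', 'has_part', 'conjugate_base_of', 'B')]
```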
11. [Chart: reasoning time in seconds plotted against the number of fully defined classes.]
12. Modularity and large ontologies
smaller modules = faster classification
13. A USEFUL module for maintenance
… is delineated by topic
… is comprehensible and easy to work with
… is self-contained for reasoning tasks
15. Self-contained modules include all axioms needed for classification and consistency checking
[Figure: the kinds of axioms a self-contained module must include: upper level, properties, constraints (e.g. disjointness), parts, hierarchy.]
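One naive way to make a module self-contained in the sense of this slide is to start from seed classes and close over the hierarchy (up to the upper level) and over parts, then keep every disjointness axiom whose classes all ended up in the module. The sketch below does exactly that on a hypothetical, simplified toy ontology. It is a plain reachability closure, not locality-based module extraction (as offered, for example, by the OWL API), and carries none of its logical guarantees.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    is_a: dict[str, set[str]] = field(default_factory=dict)      # class -> direct parents
    has_part: dict[str, set[str]] = field(default_factory=dict)  # class -> part classes
    disjoint: set[frozenset[str]] = field(default_factory=set)   # pairwise disjointness axioms


def extract_module(onto: Ontology, seeds: set[str]) -> Ontology:
    """Close the seeds over is_a (hierarchy, upper level) and has_part (parts)."""
    classes: set[str] = set()
    stack = list(seeds)
    while stack:
        cls = stack.pop()
        if cls in classes:
            continue
        classes.add(cls)
        stack.extend(onto.is_a.get(cls, ()))
        stack.extend(onto.has_part.get(cls, ()))
    return Ontology(
        is_a={c: set(onto.is_a[c]) for c in classes if c in onto.is_a},
        has_part={c: set(onto.has_part[c]) for c in classes if c in onto.has_part},
        # constraints: keep disjointness axioms whose classes are all in the module
        disjoint={ax for ax in onto.disjoint if ax <= classes},
    )


# Hypothetical, simplified toy ontology (not the real ChEBI hierarchy).
onto = Ontology(
    is_a={
        "carboxylic acid": {"molecule"},
        "molecule": {"molecular entity"},
        "carboxy group": {"group"},
        "group": {"molecular entity"},
        "molecular entity": {"chemical entity"},
        "solvent": {"role"},
    },
    has_part={"carboxylic acid": {"carboxy group"}},
    disjoint={frozenset({"group", "molecule"}), frozenset({"chemical entity", "role"})},
)

module = extract_module(onto, {"carboxylic acid"})
print(sorted(module.is_a))
print(module.disjoint)  # {frozenset({'group', 'molecule'})} (element order may vary)
```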
16. Ontology segmentation tools don’t work very well on ChEBI … yet
Topic blind
Modules too small or too big
Out of memory
Long processing times
No tool support for recombined viewing/querying
18. The MIREOT mechanism requires manual selection of module content and manual update of ontology changes
[Figure: MIREOT workflow, with boxes labelled 'Build ontology', 'Choose terms' and 'Extract module', connected by 'links'.]
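To illustrate why MIREOT needs manual updating, the sketch below records, for each chosen term, its IRI, its source ontology IRI and the superclass it is placed under in the importing ontology, copies the term's current label into a local module, and later flags terms whose copy no longer matches the source. The `MireotReference` record, the helper functions and the example.org IRIs are hypothetical; this is not the MIREOT tooling itself.

```python
from dataclasses import dataclass

EX = "http://example.org/onto#"  # hypothetical source ontology namespace


@dataclass(frozen=True)
class MireotReference:
    term_iri: str
    source_ontology_iri: str
    target_parent: str  # superclass chosen in the importing ontology


def extract_mireot_module(refs, source_labels):
    """Copy the chosen terms, with their current labels, into a local module."""
    return {
        r.term_iri: {"label": source_labels[r.term_iri], "parent": r.target_parent}
        for r in refs
    }


def stale_terms(module, source_labels):
    """Copied terms whose label changed in, or that disappeared from, the source."""
    return sorted(
        iri for iri, entry in module.items() if source_labels.get(iri) != entry["label"]
    )


source_v1 = {EX + "term1": "carboxylic acid", EX + "term2": "solvent"}
refs = [MireotReference(EX + "term1", "http://example.org/onto", "material entity")]
module = extract_mireot_module(refs, source_v1)

source_v2 = {EX + "term1": "carboxylic acids"}  # the source changes later
print(stale_terms(module, source_v1))  # []
print(stale_terms(module, source_v2))  # ['http://example.org/onto#term1']
```

Nothing in this scheme refreshes the module automatically: someone has to run the comparison and re-extract.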
19. We need modular ontology views
[Figure: automatic module extraction from ontology O, based on selection criteria (topic, editing), yields view V1; V1 is edited, validated and written back to the source.]
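The view cycle on this slide (extract by topic, edit, validate, write back) can be sketched with two small classes. Everything here is a hypothetical illustration: the source ontology is reduced to a dictionary of class records, and "validation" is only a check that every parent of an edited class exists, standing in for the reasoner-based validation the slide refers to.

```python
from dataclasses import dataclass, field

# class name -> (topic, frozenset of parent classes)
Record = tuple[str, frozenset[str]]


@dataclass
class Ontology:
    classes: dict[str, Record] = field(default_factory=dict)


@dataclass
class View:
    source: Ontology
    topic: str
    edits: dict[str, Record] = field(default_factory=dict)

    def contents(self) -> dict[str, Record]:
        """Source classes on this topic, overlaid with pending edits."""
        base = {c: rec for c, rec in self.source.classes.items() if rec[0] == self.topic}
        return {**base, **self.edits}

    def edit(self, name: str, parents: set[str]) -> None:
        self.edits[name] = (self.topic, frozenset(parents))

    def validate(self) -> list[str]:
        """Edited classes whose parents are unknown (stand-in for reasoning)."""
        known = set(self.source.classes) | set(self.edits)
        return sorted(name for name, (_, parents) in self.edits.items() if not parents <= known)

    def write_back(self) -> None:
        problems = self.validate()
        if problems:
            raise ValueError(f"unknown parents for: {problems}")
        self.source.classes.update(self.edits)
        self.edits.clear()


onto = Ontology({
    "chemical entity": ("chemistry", frozenset()),
    "carboxylic acid": ("chemistry", frozenset({"chemical entity"})),
    "role": ("roles", frozenset()),
})
view = View(onto, "chemistry")
print(sorted(view.contents()))  # ['carboxylic acid', 'chemical entity']
view.edit("tricarboxylic acid", {"carboxylic acid"})
view.write_back()
print("tricarboxylic acid" in onto.classes)  # True
```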
20. Views can be imported and are then automatically updated
[Figure: module extraction (topic, editing) from ontology O1 (e.g. chemistry) yields view V1; views are imported into ontology O2 (e.g. biology) and updated automatically.]
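Automatic update of an imported view can be sketched with the observer pattern: the source ontology O1 notifies subscribers of every change, and the importing ontology O2 keeps its copy of the topic view V1 in sync instead of holding a frozen snapshot. The class and method names below are hypothetical, and the ontologies are reduced to a mapping from class name to topic.

```python
from typing import Callable, Optional


class SourceOntology:
    """O1 (e.g. chemistry): owns the classes and notifies subscribers of changes."""

    def __init__(self) -> None:
        self.classes: dict[str, str] = {}  # class name -> topic
        self._subscribers: list[Callable[[str, Optional[str]], None]] = []

    def subscribe(self, callback: Callable[[str, Optional[str]], None]) -> None:
        self._subscribers.append(callback)

    def _notify(self, name: str, topic: Optional[str]) -> None:
        for callback in self._subscribers:
            callback(name, topic)

    def add(self, name: str, topic: str) -> None:
        self.classes[name] = topic
        self._notify(name, topic)

    def remove(self, name: str) -> None:
        self.classes.pop(name, None)
        self._notify(name, None)  # None signals deletion


class ImportingOntology:
    """O2 (e.g. biology): imports the topic view V1 of O1 and keeps it in sync."""

    def __init__(self, source: SourceOntology, topic: str) -> None:
        self.topic = topic
        self.imported = {c for c, t in source.classes.items() if t == topic}
        source.subscribe(self._on_change)

    def _on_change(self, name: str, topic: Optional[str]) -> None:
        if topic == self.topic:
            self.imported.add(name)
        else:  # deleted, or moved to another topic
            self.imported.discard(name)


o1 = SourceOntology()
o1.add("carboxylic acid", "chemistry")
o1.add("role", "roles")
o2 = ImportingOntology(o1, "chemistry")
o1.add("tricarboxylic acid", "chemistry")
o1.remove("carboxylic acid")
print(o2.imported)  # {'tricarboxylic acid'}
```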
21. How do we facilitate the development of tools for modular ontology engineering?
22. Thank you
Acknowledgements: BBSRC (funding)
Speaker notes
29769 classes in the latest OWL file release. Of these, 28875 are descendants of chemical entity, 596 are roles, 41 are subatomic particles, and 257 are chemical entities not classified under chemical entity; the real count for chemical entities is therefore 29132.
Higher expressivity is not necessarily required for question answering, since the inferred hierarchy can be exported to OWL EL for that purpose.
In this talk I approach the topic from a software engineering perspective. Modularity is a tool for designing complex systems while focusing on local organisation.
Tools are needed that can perform modularization of existing ontologies for ease of maintenance, and then recombine the modules for query answering. Terms shared between modules should be represented only once. A good way of thinking about it: modular VIEWS on the overall ontology. Also needed: the ability to extract modules for import into other ontologies.