2. Topics for today
The Research Symbiosis
Some Integration Projects Leveraging Ontologies
– A more complete research profile: integrating research resources and person information
– Improving queries across multiple biospecimen repositories
– Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
4. We’ve all been here before:
Ontologies can help us do better.
OMIM query | Number of records
"large bone" | 1032
"enlarged bone" | 207
"big bones" | 22
"huge bones" | 4
"massive bones" | 39
"hyperplastic bones" | 12
"hyperplastic bone" | 44
"bone hyperplasia" | 173
"increased bone growth" | 836
5. Why not just map to ontology terms?
Class A | Class B | Mapped? | Useful?
FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
VIVO: legal decision | Cognitive Atlas: decision | Yes | No
PlantOntology: Pith | MouseAnatomy: medulla | Yes | No
TaxRank: domain | NCI: protein domain | Yes | No
ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
FMA: tibia | FlyAnatomy: tibia | Yes | No
FMA: colon | GAZ: Colón, Panama | Yes | No
Quality: male | ChEBI: maleate(2-) | Yes | No
Mapping requires manual work to perform and maintain; string matching can produce spurious mappings; and the semantics and provenance of mappings are not always clear.
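The string-matching problem is easy to reproduce. This minimal Python sketch is an illustration, not any particular mapping tool: it proposes a mapping whenever two labels share a normalized word token (accents stripped, lowercased). The labels come from the table above; synonyms and real class identifiers are deliberately left out.

```python
import re
import unicodedata
from itertools import product
from typing import List, Set, Tuple

# (ontology, label) pairs taken from the table above; synonyms and IDs omitted.
SOURCE = [("FMA", "tibia"), ("FMA", "colon"), ("ZfishAnat", "hypophysis"),
          ("TaxRank", "domain"), ("VIVO", "legal decision")]
TARGET = [("FlyAnatomy", "tibia"), ("GAZ", "Colón, Panama"),
          ("MouseAnatomy", "pituitary"), ("NCI", "protein domain"),
          ("Cognitive Atlas", "decision")]


def tokens(label: str) -> Set[str]:
    """Strip accents, lowercase, and split a label into word tokens."""
    decomposed = unicodedata.normalize("NFKD", label)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return set(re.findall(r"[a-z]+", plain.lower()))


def naive_map(source, target) -> List[Tuple[str, str]]:
    """Propose a mapping whenever two labels share at least one token."""
    return [
        (f"{s_ont}: {s_label}", f"{t_ont}: {t_label}")
        for (s_ont, s_label), (t_ont, t_label) in product(source, target)
        if tokens(s_label) & tokens(t_label)
    ]


for a, b in naive_map(SOURCE, TARGET):
    print(f"{a} -> {b}")
```

Run as written, it proposes four mappings (tibia, colon, domain, decision), none of them useful. It misses hypophysis/pituitary, the one pair that is useful.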
7. CTSAconnect: A Linked Open Data approach to represent clinical and research expertise, activities, and resources
CTSA 10-001: 100928SB23
PROJECT #: 00921-0001
9. About VIVO
Primarily focused on people, activities, and outcomes typically associated with research networking
Eager to represent more diverse components of expertise, across domains
– e.g., exhibits, performances, specifics about research
Had worked with core facilities at Cornell to represent labs, equipment, and services
Started collaborating with eagle-i to go further with research resources
11. www.ctsaconnect.org CTSAconnect
Reveal Connections. Realize Potential.
And so the “CTSAconnect” project was born.
OK, so it is perhaps not a very informative name for an effort to consolidate the representation of researchers, research activities, and research resources, but what else are we going to call it?
ARG! The Agents, Resources, and Grants ontology
12. ISF Content and modularization
Figure: source applications and ISF modules
– eagle-i: research resources
– VIVO: person profiling
– ShareCenter: discussions, requests, shared documents
ISF modules: Contact, Organizations, Affiliations, Services, Events, Clinical, Expertise, Reagents, Organisms, Credentials
13. ISF Modularization
Constraints
• Different ontology modeling principles
• Active ongoing development of eagle-i and VIVO applications
• Investments in existing RDF datasets and the need for stable targets
Benefits
• Flexibility in what modules to populate at a given site
• Extensibility as needs and feedback influence future evolution
14. Protégé refactoring plugin
Annotation view shows annotations that are approved or pending approval.
Module view shows pending axiom changes per module, can save the changes with a log comment, and can generate a spreadsheet summary.
18. Building translational teams
We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing.
It is hard to identify and connect complementary basic and clinical expertise across disciplines.
19. Bringing together clinical expertise and basic science expertise
Figure: representation of a clinician's expertise extracted from ICD-9 codes; a basic researcher with similar expertise based on MeSH terms; resources, e.g., a resource related to autoimmune disease
22. OHSU’s Biolibrary and Search Engine
Data aggregated from two repositories:
– Department of Pathology repository (600K)
– Knight Cancer Institute repository (16K)
A web-based search engine over de-identified data
Our group is applying semantic informatics to improve:
– Data format and quality
– Data integration across the two repositories
– Search capabilities
Funded by the Medical Research Foundation of Oregon
23. Opportunities for improving the Biolibrary data
Limited anatomical data:
– Cancer registry table has 300+ anatomical entities
– Pathology table has only 86
– 99% of pathology reports (600K) have no anatomical codes
– No anatomical relationships
– Coded sites are not as specific as descriptions in the pathology reports
27. Extracting ontology concepts
Pathology reports were the main focus
– Main source of data in the current system
– Contain richer information
NLP tools were used to identify concepts
Existing ontology resources were used to add semantics
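The slide does not name the NLP tools used, so the following is only a generic illustration of dictionary-based concept recognition. It is a minimal Python sketch that assumes a tiny hypothetical lexicon mapping terms to made-up `EX:` concept IDs, and it applies greedy longest-match lookup over word tokens from a pathology-style sentence.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon: lowercase term -> made-up concept ID ("EX:" prefix).
LEXICON: Dict[str, str] = {
    "pancreas": "EX:0001",
    "chronic pancreatitis": "EX:0002",
    "pancreatitis": "EX:0003",
    "perineural invasion": "EX:0004",
    "bile duct": "EX:0005",
    "cholecystitis": "EX:0006",
}
MAX_WORDS = max(len(term.split()) for term in LEXICON)


def annotate(text: str) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of lexicon terms over word tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON:
                found.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no term starts at this word
    return found


print(annotate("Bile duct tissue with chronic inflammation; chronic pancreatitis."))
```

Longest-match matters here: "chronic pancreatitis" is tagged as one concept instead of falling back to the shorter "pancreatitis".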
29. Structured data vs. pathology report (about 7K cases)
Figure: available structured data from one case
However, the pathology report also includes:
• Low-grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis
30. Adding Logical Relationships
About 400 anatomical entities were mapped to the Foundational Model of Anatomy (FMA), and an additional 300 to SNOMED
Used the is_a and part_of relations
Re-represented the data in a semantic, computable format
Allows for semantic queries
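The benefit of is_a and part_of shows up as a transitive query. In this minimal Python sketch the edges and specimen codes are invented for illustration and are not extracted from FMA, SNOMED, or the Biolibrary. A query for an anatomical entity also returns specimens coded with its subtypes and parts.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Illustrative edges (child, relation, parent); not taken from FMA or SNOMED.
EDGES: List[Tuple[str, str, str]] = [
    ("head of pancreas", "part_of", "pancreas"),
    ("body of pancreas", "part_of", "pancreas"),
    ("pancreatic duct", "part_of", "pancreas"),
    ("main pancreatic duct", "is_a", "pancreatic duct"),
]

# Hypothetical specimens, each coded with one anatomical entity.
SPECIMENS: Dict[str, str] = {
    "S1": "head of pancreas",
    "S2": "main pancreatic duct",
    "S3": "gallbladder",
    "S4": "pancreas",
}


def descendants(term: str, relations=("is_a", "part_of")) -> Set[str]:
    """Return `term` plus every term that reaches it through the given relations."""
    children = defaultdict(set)
    for child, rel, parent in EDGES:
        if rel in relations:
            children[parent].add(child)
    seen, queue = {term}, deque([term])
    while queue:  # breadth-first walk down the graph
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def specimens_for(term: str) -> List[str]:
    """Specimens coded with the term, a subtype of it, or a part of it."""
    targets = descendants(term)
    return sorted(sid for sid, site in SPECIMENS.items() if site in targets)


print(specimens_for("pancreas"))         # ['S1', 'S2', 'S4']
print(specimens_for("pancreatic duct"))  # ['S2']
```

An exact-match query on "pancreas" would return only S4. Following is_a and then part_of (main pancreatic duct is_a pancreatic duct, which is part_of pancreas) relies on the usual inference that a subtype of a part is itself a part.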
31. Considerations
Concept mapping helps with document retrieval, but a mapped concept does not necessarily imply a fact:
– Negation
– Differential diagnosis
– Past case history
Researchers will likely need facts aggregated from multiple sources to support real research queries
Information extraction options are being explored as part of this work
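A heavily simplified rule, in the spirit of NegEx-style trigger matching, shows why a concept mention is not a fact. It looks for trigger phrases shortly before the mention. The trigger lists and the five-word window are hypothetical choices for this Python sketch. Real systems use curated trigger sets and scope-termination rules.

```python
import re

# Small, hypothetical trigger lists; real systems use much larger curated sets.
NEGATION_TRIGGERS = ["no evidence of", "negative for", "without", "no"]
UNCERTAINTY_TRIGGERS = ["rule out", "differential diagnosis includes", "possible"]
HISTORY_TRIGGERS = ["history of", "previously diagnosed"]
WINDOW = 5  # how many words before the mention are inspected


def classify_mention(sentence: str, concept: str) -> str:
    """Label a concept mention as affirmed, negated, uncertain, historical, or absent."""
    text = sentence.lower()
    pos = text.find(concept.lower())
    if pos == -1:
        return "absent"
    window = " ".join(text[:pos].split()[-WINDOW:])
    for label, triggers in (("negated", NEGATION_TRIGGERS),
                            ("uncertain", UNCERTAINTY_TRIGGERS),
                            ("historical", HISTORY_TRIGGERS)):
        if any(re.search(rf"\b{re.escape(t)}\b", window) for t in triggers):
            return label
    return "affirmed"


for s in ["Extensive perineural invasion is present.",
          "No evidence of perineural invasion."]:
    print(s, "->", classify_mention(s, "perineural invasion"))
```

The fixed window ignores sentence structure. That limitation is one reason richer information extraction is needed before mentions can be treated as facts.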
36. Ontologies as a tool for unification
Figure: ontologies unifying disease-phenotype databases, a disease phenotype ontology, expression data, gene function data, a cell and tissue ontology, and GO annotations
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556
37. Yet problems remain
Figure: incomplete data; ontologies that are not connected; missing and incorrect annotations; multiple overlapping ontologies; annotations that miss the important biology
38. Ontologies built for one species will not work for others
http://fme.biostr.washington.edu:8080/FME/index.html
http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html
39. Uberon: a multi-species anatomy ontology
• Contents:
– Over 8,000 classes (terms)
– Multiple relationships, including subclass, part-of and develops-from
• Scope: metazoa (animals)
– Current focus is chordates
– Federated approach for other taxa
• Uberon classes are generic / species-neutral
– ‘mammary gland’: you can use this class for any mammal!
– ‘lung’: you can use this class for any vertebrate (that has lungs)
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
http://genomebiology.com/2012/13/1/R5
40. Bridging anatomy ontologies
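Figure: Uberon at the center of species-specific anatomy ontologies (ZFA, MA, FMA, EHDAA2, EMAPA), alongside SNOMED, NCIt, GO, and CL
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5.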
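A bridge to species-neutral classes is meant to fix cases like the hypophysis/pituitary row of the slide 5 mapping table. This minimal Python sketch assumes a hand-written bridge from (ontology, label) pairs to Uberon class labels and uses invented annotation records. Real bridges are expressed over class identifiers, not label dictionaries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical bridge: species-specific class -> species-neutral Uberon class (labels only).
BRIDGE: Dict[Tuple[str, str], str] = {
    ("ZFA", "hypophysis"): "pituitary gland",
    ("MA", "pituitary"): "pituitary gland",
    ("FMA", "tibia"): "tibia",
    ("MA", "tibia"): "tibia",
    # In this sketch, FlyAnatomy 'tibia' (an insect leg segment) is left unbridged.
}

# Invented annotations: (record, ontology, class label).
ANNOTATIONS: List[Tuple[str, str, str]] = [
    ("zebrafish-rec-1", "ZFA", "hypophysis"),
    ("mouse-rec-7", "MA", "pituitary"),
    ("human-rec-3", "FMA", "tibia"),
    ("fly-rec-2", "FlyAnatomy", "tibia"),
]


def by_uberon(annotations) -> Dict[str, List[str]]:
    """Group records under the Uberon class their annotation bridges to."""
    index = defaultdict(list)
    for record, ontology, label in annotations:
        uberon = BRIDGE.get((ontology, label))
        if uberon is not None:
            index[uberon].append(record)
    return dict(index)


index = by_uberon(ANNOTATIONS)
print(index["pituitary gland"])  # ['zebrafish-rec-1', 'mouse-rec-7']
print(index["tibia"])            # ['human-rec-3']
```

Zebrafish and mouse records now meet at one species-neutral class even though their labels share no words. The fly tibia, a different structure that merely shares a name, stays out of the vertebrate results.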
42. Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011). vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5.
http://bgee.unil.ch
43. Evo-devo applications
Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5):e10708.
doi:10.1371/journal.pone.0010708
44. The Monarch Initiative
The model systems research network (under construction)
Goals are to:
– Aggregate model systems genotype and phenotype information
– Integrate with network, genomic, and functional data
– Leverage ontologies for phenotype similarity matching
– Build knowledge exploration tools for end users
– Build services for other applications
Funded by NIH # 1R24OD011883-01
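One simple measure illustrates phenotype similarity matching: the Jaccard similarity of two phenotype profiles after each is expanded with all inferred superclasses. This Python sketch uses an invented is_a hierarchy and hypothetical profiles. `sim_j` is a name chosen here, and the scoring in Monarch's OWLSIM may combine this with other measures.

```python
from typing import Dict, List, Set

# Illustrative is_a hierarchy: term -> direct parents (not from any real ontology).
PARENTS: Dict[str, List[str]] = {
    "abnormality of the skeleton": [],
    "abnormal bone morphology": ["abnormality of the skeleton"],
    "increased bone mass": ["abnormal bone morphology"],
    "bone hyperplasia": ["increased bone mass"],
    "short limb": ["abnormality of the skeleton"],
    "abnormality of the eye": [],
    "small eye": ["abnormality of the eye"],
}


def closure(terms: Set[str]) -> Set[str]:
    """The given terms plus all of their inferred superclasses."""
    result: Set[str] = set()
    stack = list(terms)
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(PARENTS[t])
    return result


def sim_j(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two phenotype profiles after ancestor expansion."""
    ca, cb = closure(a), closure(b)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


disease = {"bone hyperplasia", "short limb"}
models = {
    "model A": {"increased bone mass", "short limb"},
    "model B": {"small eye"},
}
for name, profile in sorted(models.items(), key=lambda kv: -sim_j(disease, kv[1])):
    print(name, round(sim_j(disease, profile), 2))
```

Comparing raw term sets would give the disease and model A a score of only 1/3 (one shared term out of three). Ancestor expansion raises it to 0.8 because "increased bone mass" is recognized as a close relative of "bone hyperplasia".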
45. Can we search by phenotype alone?
Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247
http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247
47. Organism | Genotype
zebrafish | fgf8a<ti282a/+>; shha<tq252/tq252> (AB)
mouse | B6.Cg-Shh<tm1(EGFP/cre)Cjt>/J
worm | daf-2(e1370) III; fog-2(oz40) V
human* | ATP1A3(NM_152296.3)[c.946G>A, p.Gly316Ser]+[=]
But different organisms record genotypes differently
Phenotypes can be attached to full or partial genotypes, alleles, or variants
48. Pulling it together
Figure: model systems phenotype and genotype data; data ingest and ontology annotation; NIF DISCO (extensible Web resource DISCOvery, registration and interoperation framework); ONTOQUEST; OWLSIM (enabling phenotype-based knowledge discovery tools); MONARCH tools and services
51. Conclusions
Ontologies have provided us with the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains
Describing biology works best with multiple connected ontologies
We need smart data, not just big data
We need better tools to integrate multiple ontologies
We need better tools to make use of smarter data structures (e.g., to manage reasoning costs)
52. Monarch Initiative
CTSAconnect
Biospecimen Ontology
OHSU
Melissa Haendel
Carlo Torniai
Nicole Vasilevsky
Chris Kelleher
Shahim Essaid
Cornell University
Dean Krafft
Jon Corson-Rikert
Brian Lowe
University of Florida
Mike Conlon
Chris Barnes
Nicholas Rejack
OHSU
Melissa Haendel
Shahim Essaid
Carlo Torniai
OHSU
Melissa Haendel
Carlo Torniai
Shahim Essaid
Nicole Vasilevsky
Scott Hoffman
Matt Brush
LBNL
Chris Mungall
Suzi Lewis
Nicole Washington
UCSD/NIF
Maryann Martone
Anita Bandrowski
Jeff Grethe
Amarnath Gupta
Stony Brook University
Moises Eisenberg
Erich Bremer
Janos Hajagos
Harvard University
Daniela Bourges
Sophia Cheng
University at Buffalo
Barry Smith
Dagobert Soergel
Zaloni
Will Corbett
Ranjit Das
Ben Sharma
University of Pittsburgh
Harry Hochheiser
Chuck Borromeo
Speaker notes
How can we:
– Help science be more reproducible?
– Provide access to resources and expertise?
– Give credit where credit is due?
– Make data more interoperable and visible?
– Make science more efficient?
The projects that I am going to talk about all involve supporting and strengthening these connections.
Need different figure
Chris Mungall
A genome is a genome, whether it’s an amoeba or a human. Tweets. Mention models here. Environment too.
Chris Mungall
What tends to happen is that multiple non-interoperable
Chris Mungall
Chris Mungall
Humans must manually integrate. Machines can’t make sense of this alone.
Images: Seth Ruffins
ICBO paper.
Multiple coordinated: need to describe cell and tissue context