Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and on how very many people, unbeknownst to the patient or even to the researchers creating the data, are involved in a diagnosis.
Envisioning a world where everyone helps solve disease
1. @monarchinit @ontowonka
“Not everyone can become a great artist, but a great artist can come from anywhere”
Anton Ego, Ratatouille, 2007, Disney/Pixar
Envisioning a world where everyone helps solve disease
Melissa Haendel
SWAT4LS 2015
Cambridge, England
2. Faith-based research
“I believe that my work on some obscure cell type in some obscure organism will matter to mankind one day”
Well, it can, and it does.
3.
4. Four things it takes to solve an undiagnosed disease
   1. Deep phenotyping of the human organism
   2. Crossing the language barrier
   3. A lot of data from a lot of places
   4. Very many people (who have faith)
11. Can we help machines understand phenotypes?
“Palmoplantar hyperkeratosis” (human phenotype)
“I have absolutely no idea what that means”
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons –
https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
Marcin Wichary [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
12. A disease is a collection of phenotypes
Figure: a patient's phenotypes (Flat back of head; Hypotonia) compared with those of Disease X (Abnormal skull morphology; Decreased muscle mass)
Differential diagnosis with similar but non-matching phenotypes is difficult
13. Do we *really* need yet another clinical vocabulary?
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
Vocabularies evaluated: UMLS, SNOMED CT, CHV, MedDRA, MeSH, NCIT, ICD10-C, ICD9-CM, ICD-10, OMIM, MedlinePlus
Existing clinical vocabularies don’t adequately cover phenotype descriptions
14. Disease-phenotype associations using an ontology
Figure labels: Hyposmia; Deeply set eyes; Motor neuron atrophy; Abnormality of globe location; Abnormal eye morphology; eyeball of camera-type eye; sensory perception of smell; motor neuron (CL)
Annotation counts shown: 34,571 annotations in 22 species; 157,534 phenotype annotations; 2,150 phenotype annotations
15. Once OMIM is rendered computable, are we done yet?
Free text -> HPO enables phenotype semantic similarity matching
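To illustrate why HPO-coded phenotypes allow matching that free text cannot, here is a minimal sketch of ontology-aware similarity: each phenotype profile is expanded to include every ancestor term, and two profiles are compared with the Jaccard index of the expanded sets. The tiny hierarchy below (labels standing in for HPO identifiers) and the choice of plain Jaccard, rather than the information-content-weighted measures typically used in practice, are assumptions for illustration only.

```python
from typing import Dict, Iterable, Set

# Hypothetical mini-hierarchy (child -> is_a parents); labels stand in for HPO IDs.
IS_A: Dict[str, Set[str]] = {
    "Flat back of head": {"Abnormal skull morphology"},
    "Abnormal skull morphology": {"Abnormality of the head"},
    "Abnormality of the head": {"Phenotypic abnormality"},
    "Hypotonia": {"Abnormal muscle physiology"},
    "Decreased muscle mass": {"Abnormal muscle morphology"},
    "Abnormal muscle physiology": {"Abnormality of the musculature"},
    "Abnormal muscle morphology": {"Abnormality of the musculature"},
    "Abnormality of the musculature": {"Phenotypic abnormality"},
    "Phenotypic abnormality": set(),
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its is_a ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, ()))
    return seen


def closure(terms: Iterable[str]) -> Set[str]:
    """Union of the ancestor sets of every term in a profile."""
    result: Set[str] = set()
    for t in terms:
        result |= ancestors(t)
    return result


def profile_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard index (0.0 to 1.0) of the ancestor-expanded profiles."""
    ca, cb = closure(a), closure(b)
    if not ca and not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)


patient = ["Flat back of head", "Hypotonia"]
disease_x = ["Abnormal skull morphology", "Decreased muscle mass"]
print(set(patient) & set(disease_x))  # exact string matching: no overlap
print(round(profile_similarity(patient, disease_x), 2))  # 0.44
```

The exact-match intersection is empty, yet the ontology-aware score is non-zero because both profiles share skull and musculature ancestors.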
16. Mendelian disease integration
Merges sources together using:
equivalence and subclass axioms derived from xrefs
string matching
manual efforts to fill gaps based on phenotypes and anatomical axioms
Figure: Parkinson’s disease subtypes (different colors = different disease sources)
https://github.com/monarch-initiative/monarch-disease-ontology
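The first merge strategy listed (equivalence axioms derived from xrefs) amounts to grouping identifiers that are transitively cross-referenced. A minimal sketch using a union-find structure, assuming the xrefs are already available as identifier pairs; the identifiers are placeholders, and this is not the code of the monarch-disease-ontology repository, which also relies on subclass axioms, string matching, and manual curation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_by_xrefs(xrefs: List[Tuple[str, str]]) -> List[Set[str]]:
    """Treat every xref pair as an equivalence; return the merged clusters."""
    uf = UnionFind()
    for a, b in xrefs:
        uf.union(a, b)
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for node in list(uf.parent):
        clusters[uf.find(node)].add(node)
    return sorted(clusters.values(), key=sorted)


# Hypothetical identifiers, not real disease records.
xrefs = [("OMIM:X1", "ORPHA:Y1"), ("ORPHA:Y1", "DOID:Z1"), ("OMIM:X2", "DOID:Z2")]
for cluster in merge_by_xrefs(xrefs):
    print(sorted(cluster))
```

One caution with this approach: xrefs do not always denote strict equivalence, so merging on them alone can collapse distinct diseases, which is why the other strategies on the slide are needed.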
17. Why we need all the organisms
Model data can provide up to 80% phenotypic coverage of the human coding genome
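The coverage figure is a union over organisms: a human gene counts as covered if it has phenotype annotations itself or if any of its model-organism orthologs does. A minimal sketch of that arithmetic, using made-up gene identifiers and orthology pairs; it is not the analysis behind the 80% figure.

```python
from typing import Dict, Set


def phenotypic_coverage(human_genes: Set[str],
                        human_annotated: Set[str],
                        orthologs: Dict[str, Set[str]],
                        model_annotated: Set[str]) -> float:
    """Fraction of human genes with phenotype data, directly or via any ortholog."""
    covered = human_genes & human_annotated
    for gene in human_genes:
        # A gene is also covered if any model-organism ortholog has phenotypes.
        if orthologs.get(gene, set()) & model_annotated:
            covered.add(gene)
    return len(covered) / len(human_genes)


# Hypothetical genes and orthology, for illustration only.
human_genes = {"H1", "H2", "H3", "H4", "H5"}
human_annotated = {"H1"}
orthologs = {"H2": {"mouse:M2"}, "H3": {"zebrafish:Z3"}, "H4": {"mouse:M4"}}
model_annotated = {"mouse:M2", "zebrafish:Z3"}

print(phenotypic_coverage(human_genes, human_annotated, {}, set()))  # human data only
print(phenotypic_coverage(human_genes, human_annotated, orthologs, model_annotated))
```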
20. Ulcerated paws
Palmoplantar hyperkeratosis
Thick hand skin
Image credits:
"HandsEBS" by James Heilman, MD - Own work. Licensed under CC BY-SA 3.0 via Commons –
https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG
http://www.guinealynx.info/pododermatitis.html
21. Challenge: Each database uses its own vocabulary/ontology
Vocabularies/ontologies: MP, HP
Data sources: MGI, HPOA
22. Challenge: Each database uses its own vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …
Data sources: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, IMPC, OMIM, ICD, QTLdb, EHR
23. Decomposition of complex concepts allows interoperability
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
Figure: Human phenotype “Palmoplantar hyperkeratosis” = increased (PATO) + stratum corneum layer of skin (Uberon) + autopod (Uberon) + keratinization (GO)
Species-neutral ontologies, homologous concepts
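The decomposition can be sketched as entity-quality (EQ) pairs: each species-specific phenotype is mapped to a quality (PATO-style) and an anatomical entity from a shared, species-neutral anatomy (Uberon-style), so phenotypes from different species become comparable. The labels, the "part of" links, the mouse term, and the omission of the GO process component are all simplifications assumed for illustration; these are not the actual HP or MP logical definitions.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class EQ:
    quality: str  # PATO-style quality
    entity: str   # species-neutral (Uberon-style) anatomical class


# Hypothetical part_of links between species-neutral anatomy classes.
PART_OF: Dict[str, str] = {
    "stratum corneum of autopod skin": "autopod skin",
    "autopod skin": "autopod",
}

# Hypothetical decompositions; real HP/MP logical definitions are richer.
PHENOTYPES: Dict[str, EQ] = {
    "human: palmoplantar hyperkeratosis":
        EQ("increased thickness", "stratum corneum of autopod skin"),
    "mouse: paw hyperkeratosis":
        EQ("increased thickness", "stratum corneum of autopod skin"),
    "guinea pig: ulcerated paws":
        EQ("ulcerated", "autopod skin"),
}


def anatomy_closure(entity: str) -> Set[str]:
    """The entity plus every structure it is (transitively) part of."""
    result = {entity}
    while entity in PART_OF:
        entity = PART_OF[entity]
        result.add(entity)
    return result


def compare(a: str, b: str) -> str:
    """Classify how two species-specific phenotypes relate via their EQ parts."""
    eq_a, eq_b = PHENOTYPES[a], PHENOTYPES[b]
    if eq_a == eq_b:
        return "equivalent"
    shared = anatomy_closure(eq_a.entity) & anatomy_closure(eq_b.entity)
    if not shared:
        return "unrelated"
    # The most specific shared structure has the longest part_of chain.
    deepest = max(shared, key=lambda e: len(anatomy_closure(e)))
    return f"related via {deepest}"


print(compare("human: palmoplantar hyperkeratosis", "mouse: paw hyperkeratosis"))
print(compare("human: palmoplantar hyperkeratosis", "guinea pig: ulcerated paws"))
```

Because the anatomy is species-neutral, the human and guinea pig phenotypes meet at a shared structure even though their labels share no words.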
26. Putting it Together: Data + Ontologies
Figure (Input → Pipeline → Output): diverse G2P/D source data (.ttl); source ontologies; OWL Loader; graph; graph views; Monarch App (faceted browsing, phenotype matching)
https://github.com/SciGraph/SciGraph
27. Data Integrated in SciGraph
>25 sources
>100 species
51M triples
4M curated associations
2.2M G-P / G-D associations
28. Genotype-phenotype integration
Chart: one source 9%; two or more sources (two sources, 3 or more) 91%
91% of our 2.2 million G2P associations required integrating 2 or more data sources (this number does not even include orthology from PANTHER)
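The 91% figure is a tally over association records: group each genotype-phenotype pair by the set of sources that contributed to it, then count how many pairs needed more than one. A minimal sketch with made-up records; treating every record as one "contributing source" is an assumption here, and, as the slide notes, orthology sources were not counted in the real figure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def source_histogram(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Share of associations backed by one, two, or three or more sources.

    Each record is (genotype, phenotype, contributing_source); repeated
    records from the same source are counted once.
    """
    sources: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for genotype, phenotype, source in records:
        sources[(genotype, phenotype)].add(source)
    bins = {"one source": 0, "two sources": 0, "three or more": 0}
    for srcs in sources.values():
        if len(srcs) == 1:
            bins["one source"] += 1
        elif len(srcs) == 2:
            bins["two sources"] += 1
        else:
            bins["three or more"] += 1
    total = len(sources)
    return {k: (v / total if total else 0.0) for k, v in bins.items()}


# Made-up records; "A", "B", "C" stand in for source databases.
records = [
    ("g1", "p1", "A"),
    ("g2", "p2", "A"), ("g2", "p2", "B"),
    ("g3", "p3", "A"), ("g3", "p3", "B"), ("g3", "p3", "C"),
    ("g4", "p4", "B"), ("g4", "p4", "C"), ("g4", "p4", "C"),
]
print(source_histogram(records))
```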
30. Combining genotype and phenotype data for variant prioritization
Whole exome
Remove off-target and common variants
Variant score from allele frequency and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
Mendelian filters
https://www.sanger.ac.uk/resources/software/exomiser/
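The workflow above can be sketched as filter-then-score. This is a toy version under loudly stated assumptions: the 1% frequency cutoff, the formula combining frequency and pathogenicity into a variant score, and the simple average used for the final score are illustrative choices, not Exomiser's actual PHIVE scoring, and the Mendelian (inheritance-mode) filter is omitted. The STIM1 values loosely echo the speaker notes (never seen in SNP databases, predicted maximally pathogenic); all other values and gene names are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_FREQ = 0.01  # assumed cutoff above which a variant counts as "common"


@dataclass
class Variant:
    gene: str
    on_target: bool
    allele_freq: float    # population allele frequency, 0-1
    pathogenicity: float  # predicted pathogenicity, 0-1


def variant_score(v: Variant) -> float:
    """Assumed scoring: rarer and more pathogenic variants score higher."""
    rarity = 1.0 - min(v.allele_freq / MAX_FREQ, 1.0)
    return rarity * v.pathogenicity


def prioritize(variants: List[Variant],
               phenotype_score: Dict[str, float]) -> List[Tuple[str, float]]:
    """Drop off-target/common variants, then rank by an (assumed) averaged score."""
    kept = [v for v in variants if v.on_target and v.allele_freq < MAX_FREQ]
    scored = [
        (v.gene, 0.5 * variant_score(v) + 0.5 * phenotype_score.get(v.gene, 0.0))
        for v in kept
    ]
    return sorted(scored, key=lambda gs: gs[1], reverse=True)


# Illustrative values only; GENE_B, GENE_C, GENE_D are placeholders.
variants = [
    Variant("STIM1", True, 0.0, 1.0),
    Variant("GENE_B", True, 0.005, 0.9),
    Variant("GENE_C", False, 0.0, 0.95),  # off-target: removed
    Variant("GENE_D", True, 0.2, 0.8),    # common: removed
]
phenotype_score = {"STIM1": 0.9, "GENE_B": 0.2}  # hypothetical similarity scores
for gene, score in prioritize(variants, phenotype_score):
    print(gene, round(score, 3))
```

Averaging is only one way to combine the two scores; the point of the sketch is that a candidate needs both a damaging, rare variant and a phenotype match to rise to the top.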
31. York platelet syndrome and STIM1
Markello T et al. Molecular Genetics and Metabolism 2015, 114: 474; Grosse J, J Clin Invest 2007 117: 3540-50
Patient UDP_2542: Impaired platelet aggregation (HP:0003540); Thrombocytopenia (HP:0001873)
Mouse Stim1Sax/Sax: Abnormal platelet activation (MP:0006298); Thrombocytopenia (MP:0003179)
http://www.nature.com/gim/journal/vaop/ncurrent/full/gim2015137a.html
34. Credit extends beyond the publication
Johannes creates the Stim1 mouse
Melissa annotates patient UDP_2542 with HPO
Will performs an analysis of UDP_2542 that includes the Stim1 mouse to generate a dataset of prioritized variants
Tom writes publication pmid:25577287 about the STIM1 diagnosis
Tom explicitly credits Will as an author but not Melissa.
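One way to make this chain of credit machine-readable is to record each contribution as a subject-predicate-object statement and then walk the graph upstream from the publication. A minimal sketch, assuming a hypothetical set of predicates ("created", "annotated", "performed", "wrote", "used", "generated") and hypothetical node names such as "UDP_2542 HPO profile"; a real system would more likely use an established provenance model such as W3C PROV.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical predicates linking a person to something they produced.
AGENT_PREDICATES = {"created", "annotated", "performed", "wrote"}

# The contributions on this slide, as (subject, predicate, object) statements.
STATEMENTS: List[Tuple[str, str, str]] = [
    ("Johannes", "created", "Stim1 mouse"),
    ("Melissa", "annotated", "UDP_2542 HPO profile"),
    ("Will", "performed", "UDP_2542 analysis"),
    ("UDP_2542 analysis", "used", "Stim1 mouse"),
    ("UDP_2542 analysis", "used", "UDP_2542 HPO profile"),
    ("UDP_2542 analysis", "generated", "prioritized variants"),
    ("Tom", "wrote", "pmid:25577287"),
    ("pmid:25577287", "used", "prioritized variants"),
]


def contributors(target: str, statements: List[Tuple[str, str, str]]) -> Set[str]:
    """Every agent reachable upstream of `target` in the provenance graph."""
    upstream: Dict[str, Set[str]] = defaultdict(set)
    agents: Set[str] = set()
    for s, p, o in statements:
        if p in AGENT_PREDICATES:
            upstream[o].add(s)  # the thing depends on the person who made it
            agents.add(s)
        elif p == "used":
            upstream[s].add(o)  # an activity/output depends on its inputs
        elif p == "generated":
            upstream[o].add(s)  # an output depends on the activity behind it
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for nxt in upstream[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen & agents


AUTHORS = {"Tom", "Will"}
everyone = contributors("pmid:25577287", STATEMENTS)
print(sorted(everyone))
print(sorted(everyone - AUTHORS))  # contributors who are not authors (they may still be cited)
```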
40. Who is in the graph?
Melissa Haendel
Peter Robinson
Chris Mungall
Sebastian Kohler
Cindy Smith
Nicole Vasilevsky
Sandra Dolken
Johannes Grosse
Attila Braun
David Varga-Szabo
Niklas Beyersdorf
Boris Schneider
Lutz Zeitlmann
Petra Hanke
Patricia Schropp
Silke Mühlstedt
Carolin Zorn
Michael Huber
Carolin Schmittwolf
Wolfgang Jagla
Philipp Yu
Thomas Kerkau
Harald Schulze
Michael Nehls
Bernhard Nieswandt
Thomas Markello
Dong Chen
Justin Y. Kwan
Iren Horkayne-Szakaly
Alan Morrison
Olga Simakova
Irina Maric
Jay Lozier
Andrew R. Cullinane
Tatjana Kilo
Lynn Meister
Kourosh Pakzad
Sanjay Chainani
Roxanne Fischer
Camilo Toro
James G. White
David Adams
Cornelius Boerkoel
William A. Gahl
Cynthia J. Tifft
Meral Gunay-Aygun
Melissa Haendel
David Adams
David Draper
Bailey Gallinger
Joie Davis
Nicole Vasilevsky
Heather Trang
Rena Godfrey
Gretchen Golas
Catherine Groden
Michele Nehrebecky
Ariane Soldatos
Elise Valkanas
Colleen Wahl
Lynne Wolfe
Elizabeth Lee
Amanda Links
Will Bone
Murat Sincan
Damian Smedley
Jules Jacobson
Nicole Washington
Elise Flynn
Sebastian Kohler
Orion Buske
Marta Girdea
Michael Brudno
Jeremy Band
Hans Goeble
Karen Balbach
Nadine Pfeifer
Sandra Werner
Christian Linden
Legend (roles): Clinical/care, Pathology, Ontologist, CS/informatics, Curator, Basic research
41. Tracking Evidence and Provenance of G2P Associations
Evidence is a collection of information that is used to support a scientific claim or association
Provenance is a history of what processes led to the claim being made and what entities participated in those processes
Value of Evidence and Provenance Metadata:
context to evaluate credibility/confidence
support for filtering and analysis of data
detailed history for attribution
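A minimal sketch of attaching evidence and provenance to each association so that downstream users can filter on it. The field names, evidence-type labels, provenance strings, and the GENE_X record are hypothetical placeholders (ontologies such as the Evidence and Conclusion Ontology define real evidence types); the HP and MP identifiers and the Grosse et al. reference are taken from the York platelet syndrome slide.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Evidence:
    kind: str       # hypothetical evidence-type label
    reference: str  # where the supporting information comes from


@dataclass
class Association:
    subject: str    # genotype, patient, or gene
    phenotype: str  # ontology term ID
    evidence: List[Evidence] = field(default_factory=list)
    provenance: List[str] = field(default_factory=list)  # processes, oldest first


def filter_by_evidence(assocs: Iterable[Association],
                       allowed_kinds: Set[str]) -> List[Association]:
    """Keep associations supported by at least one allowed evidence type."""
    return [a for a in assocs if any(e.kind in allowed_kinds for e in a.evidence)]


assocs = [
    Association("Stim1Sax/Sax", "MP:0003179",
                [Evidence("curated from literature", "Grosse J, J Clin Invest 2007")],
                ["MGI curation"]),
    Association("UDP_2542", "HP:0001873",
                [Evidence("clinical observation", "UDP clinical evaluation")],
                ["HPO annotation of patient"]),
    Association("GENE_X", "HP:0003540",  # hypothetical record
                [Evidence("computational prediction", "hypothetical pipeline")],
                ["automated inference"]),
]
trusted = filter_by_evidence(assocs, {"curated from literature", "clinical observation"})
print([a.subject for a in trusted])
```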
44. What about patients? Can they help too?
HP:0000252
Pref Label: Microcephaly
Synonyms: Decreased Head Circumference; Reduced Head Circumference; Small head circumference
Suggested Synonyms: Small Head; Little Head; Small Skull; Little Skull; Small Cranium…
Figure labels: Small head; Microcephaly
https://commons.wikimedia.org/wiki/File:Microcephaly.png#/media/File:Microcephaly.png
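Lay synonyms let a patient's own words resolve to the same ontology class that clinicians use. A minimal sketch of a case-insensitive synonym index; the term ID, label, and synonyms are copied from the slide, while the normalization (lowercasing and collapsing whitespace) and the dictionary layout are assumptions.

```python
from typing import Dict, List, Optional

# Term data from the slide; the dictionary layout itself is an assumption.
TERMS: Dict[str, Dict[str, List[str]]] = {
    "HP:0000252": {
        "label": ["Microcephaly"],
        "synonyms": ["Decreased Head Circumference", "Reduced Head Circumference",
                     "Small head circumference"],
        "suggested": ["Small Head", "Little Head", "Small Skull", "Little Skull",
                      "Small Cranium"],
    },
}


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_index(terms: Dict[str, Dict[str, List[str]]]) -> Dict[str, str]:
    """Map every normalized label or synonym to its term ID."""
    index: Dict[str, str] = {}
    for term_id, entry in terms.items():
        for names in entry.values():
            for name in names:
                index[normalize(name)] = term_id
    return index


INDEX = build_index(TERMS)


def lookup(phrase: str) -> Optional[str]:
    """Return the term ID for a phrase, or None if it is unknown."""
    return INDEX.get(normalize(phrase))


print(lookup("small  HEAD"))
print(lookup("big head"))
```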
45. Job opening
https://goo.gl/MlcnR5
Focusing on building ontologies and semantic web technologies to represent research, attribution, provenance, and scholarly communication
@ontowonka haendel@ohsu.edu
46. Funding: NIH Office of the Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; NCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)
PIs: Chris Mungall, Peter Robinson, Damian Smedley, Tudor Groza, Harry Hochheiser
www.monarchinitiative.org/page/team
Speaker notes
Not sure about origin of this image
We understand the central dogma (DNA → RNA → protein) and its building blocks.
We’ve found reliable methods to describe and move genetic information around with computers.
We can see and assess phenotype, but how do you computationally describe it?
Massive amounts of genetic data must also be aligned with phenotypes in a way that lets a machine reason and infer.
An undiagnosed patient with a suspected genetic disease has several phenotypes (facial asymmetry, temporal bulging, café au lait spot on the neck, asymmetric smile/facial animation, uneven eyes).
The standard genomic paradigm is based on statistical properties, such as the distribution of variation across human genomes.
There is a lot we don’t know about the genome
Adding phenotype
Sorry Star Trek, you had to go for posting.
Our approach is to try and get the machine to understand the terms so that it can assist us intelligently.
Can we get rid of this slide?
1. Represent the organism as a biological subject
2. Represent diseases/genotypes as collections of nodes in the graph
3. Be interoperable with other bioinformatics resources and leverage modern semantic standards
Highlighting how we get different phenotypic information from different sources and species
Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype ontology
https://code.google.com/p/phenotype-ontologies/
The distribution of phenotype information per model genotype is different compared to human disease annotations.
For mouse, there’s a much higher representation of metabolic, cardiovascular, blood, and endocrine phenotypes available to compare;
For fish, there’s increased representation of nervous, skeletal, head and neck, cardiovascular, and connective tissue phenotypes.
(Note that these do not include “normal” phenotypes for either diseases or genotypes.)
What does it mean to replicate a phenotypic profile in a model organism? For many patients or diseases, we may need different models to fully recapitulate the disease. Further, some phenotypes are common in a given species and if present in the patient, would be a less significant result.
Multiple databases, each with its own vocabulary; these images are of questionable licensing and origin
We make things digestible by breaking complex concepts into simpler parts. We use ontologies that are comparative by design.
We can match in “fuzzy” ways by making semantic associations, and leveraging underlying logic, such as anatomy
These images are not licensed and I don’t even know where they came from
This was the novel case we solved. The UDP patient had a number of signs and symptoms, including various platelet abnormalities. The same heterozygous missense mutation was seen in 2 patients and was ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted to be maximally pathogenic. Finally, a mouse curated by MGI, carrying a heterozygous missense point mutation introduced by chemical mutagenesis, exhibited strikingly similar platelet abnormalities.
This image is public domain https://pixabay.com/en/detective-male-man-profile-156465/
Not sure what the license is on this thing.
Have requested the rights for originally presented picture.
Here is a similar one with the following attribution:
"Microcephaly" by Unknown - (2004) Evolutionary History of a Gene Controlling Brain Size. PLoS Biol 2(5): e134. doi:10.1371/journal.pbio.0020134. Licensed under CC BY 2.5 via Commons - https://commons.wikimedia.org/wiki/File:Microcephaly.png#/media/File:Microcephaly.png