Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
5. Challenge: Each database uses its own phenotype vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …
Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, …, QTLdb
6. monarchinitiative.org
Can we help machines understand phenotype terms?
Human phenotype: “Palmoplantar hyperkeratosis”
I have absolutely no idea what that means
7. The Human Phenotype Ontology
Hyposmia
Abnormality of globe location
eyeball of camera-type eye
sensory perception of smell
Abnormal eye morphology
Motor neuron atrophy
Deeply set eyes
motor neuron (CL)
34571 annotations in 22 species
157534 phenotype annotations
2150 phenotype annotations
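As a rough illustration of why tying phenotype terms to shared entities helps a machine, the sketch below pairs terms from different vocabularies when they refer to the same entity. The human labels come from the slide; the pairing of each phenotype with its entity is an assumption read off the figure labels, and the model-organism terms, the `PhenotypeTerm` class, and `match_by_entity` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeTerm:
    label: str       # human-readable term name
    vocabulary: str  # source vocabulary, e.g. "HP"
    entity: str      # entity the phenotype is about, from a shared ontology


# Labels from the slide; the phenotype-to-entity pairing is an assumption.
HUMAN_TERMS = [
    PhenotypeTerm("Hyposmia", "HP", "sensory perception of smell"),
    PhenotypeTerm("Deeply set eyes", "HP", "eyeball of camera-type eye"),
    PhenotypeTerm("Motor neuron atrophy", "HP", "motor neuron"),
]

# Hypothetical model-organism terms, invented for illustration only.
MODEL_TERMS = [
    PhenotypeTerm("example olfaction phenotype", "MP", "sensory perception of smell"),
    PhenotypeTerm("example limb phenotype", "MP", "example limb entity"),
]


def match_by_entity(terms_a, terms_b):
    """Pair terms from two vocabularies that refer to the same entity."""
    by_entity = {}
    for term in terms_b:
        by_entity.setdefault(term.entity, []).append(term)
    return [(a, b) for a in terms_a for b in by_entity.get(a.entity, [])]


matches = match_by_entity(HUMAN_TERMS, MODEL_TERMS)
for human, model in matches:
    print(f"{human.label} ({human.vocabulary}) ~ {model.label} ({model.vocabulary})")
```

Exact entity matching is the simplest possible rule; a real integration would presumably also use the ontology hierarchy, which this sketch leaves out.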
12. What’s in a Phenopacket?
Ontology-based phenotypic descriptions for:
Human patients, model organisms, or any organism
Groupings of human patients or organisms
What does it include?
age of patient or organism
sex of patient or organism
disease (if named)
age of onset of disease
Positive and negative phenotype associations
Reference to genes, variants, or collections of variants
Reference to environmental factors
Multiple formats: TSV, JSON, YAML
Validation tools
Uses standardized publication citation mechanism for data sharing
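The contents listed above could be captured in a record along these lines. This is a minimal sketch, not the actual Phenopacket schema: the class names, field names, and the checks in `validate` are assumptions made for illustration, and the example values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PhenotypeAssociation:
    term: str              # ontology term label or identifier
    negated: bool = False  # True records that the phenotype was looked for and absent


@dataclass
class PhenopacketSketch:
    subject_id: str
    age: Optional[str] = None
    sex: Optional[str] = None
    disease: Optional[str] = None       # only if named
    age_of_onset: Optional[str] = None
    phenotypes: List[PhenotypeAssociation] = field(default_factory=list)
    variants: List[str] = field(default_factory=list)
    environmental_factors: List[str] = field(default_factory=list)


def validate(packet: PhenopacketSketch) -> List[str]:
    """Return a list of problems; an empty list means the sketch checks pass."""
    problems = []
    if not packet.subject_id:
        problems.append("subject_id is required")
    if not packet.phenotypes:
        problems.append("at least one phenotype association is expected")
    seen = {}
    for assoc in packet.phenotypes:
        if assoc.term in seen and seen[assoc.term] != assoc.negated:
            problems.append(f"'{assoc.term}' is asserted both present and absent")
        seen[assoc.term] = assoc.negated
    return problems


def to_json(packet: PhenopacketSketch) -> str:
    """Serialize the record to JSON, one of the formats named on the slide."""
    return json.dumps(asdict(packet), indent=2)


# Placeholder patient record using phenotype labels that appear in the slides.
example = PhenopacketSketch(
    subject_id="patient-1",
    sex="female",
    phenotypes=[
        PhenotypeAssociation("Palmoplantar hyperkeratosis"),
        PhenotypeAssociation("Hyposmia", negated=True),
    ],
)
print(validate(example))
print(to_json(example))
```

Recording negative associations explicitly, rather than omitting them, is what lets a consumer tell "absent" apart from "not assessed".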
14. Problems with evidence and provenance of G2P Associations
PROBLEMS:
Variants have different pathogenicity calls due to annotation inconsistency AND different experimental evidence
Evidence and provenance are incomplete, not computable, and frequently conflated
Annotations are to different aspects of the genotype: allele, variant, gene, transcript, etc.
A computable model would enable:
context to evaluate credibility/confidence
filtering and analysis of data
detailed history for attribution
15. Building a computable model for ACMG guidelines
http://brcaexchange.org/
Provenance:
- Materials & methods
- Agent(s) of evidence
- Agent(s) of claim
- Time and place
Evidence:
- Data (e.g. images, sequences)
- Evidence codes
- Publications
- Confidence (p-val, z-score)
- Summary figures
- Conclusions from previous studies
- Domain expert’s knowledge
Claim:
- Causal relationships, hypothesized relationships, correlations, etc.
https://github.com/monarch-initiative/SEPIO-ontology
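One way to picture a computable claim/evidence/provenance structure is sketched below. It is not SEPIO's actual class model: the class names, the assignment of fields to each part, the relation string, and the example values are all assumptions for illustration. It shows the two uses named on the earlier slide, filtering by confidence and surfacing conflicting pathogenicity calls.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    agent: str                  # who produced the evidence
    method: str                 # materials & methods, summarized
    date: Optional[str] = None  # time of the work, if known


@dataclass
class Evidence:
    evidence_code: str
    publication: Optional[str] = None
    p_value: Optional[float] = None
    provenance: Optional[Provenance] = None


@dataclass
class Claim:
    subject: str      # e.g. a variant identifier
    relation: str     # e.g. "has pathogenicity"
    obj: str          # e.g. "pathogenic"
    asserted_by: str  # agent of the claim
    evidence: List[Evidence] = field(default_factory=list)


def filter_by_p_value(claims, threshold):
    """Keep claims backed by at least one evidence item at or below the threshold."""
    return [
        c for c in claims
        if any(e.p_value is not None and e.p_value <= threshold for e in c.evidence)
    ]


def conflicting_subjects(claims):
    """Return subjects that carry more than one distinct value for the same relation."""
    values = {}
    for c in claims:
        values.setdefault((c.subject, c.relation), set()).add(c.obj)
    return sorted({subject for (subject, _), objs in values.items() if len(objs) > 1})


# Hypothetical claims; identifiers, agents, and codes are placeholders.
claims = [
    Claim("variant-1", "has pathogenicity", "pathogenic", "lab-A",
          [Evidence("example-code", p_value=0.01,
                    provenance=Provenance("lab-A", "functional assay"))]),
    Claim("variant-1", "has pathogenicity", "benign", "lab-B",
          [Evidence("example-code", p_value=0.2)]),
    Claim("variant-2", "has pathogenicity", "benign", "lab-A"),
]

print([c.asserted_by for c in filter_by_p_value(claims, 0.05)])
print(conflicting_subjects(claims))
```

Keeping the agent of the claim separate from the agent of the evidence is what allows attribution to be traced when two groups reach different calls from overlapping data.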
16. Summary
Ontologies can be used to perform deep phenotyping integration across species
An exchange standard is needed to facilitate distributed phenotype data sharing
A computable G2P evidence model can aid variant interpretation
17. Acknowledgements
Lawrence Berkeley
Chris Mungall
Nicole Washington
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
Charité
Peter Robinson
Sebastian Köhler
U of Pittsburgh
Harry Hochheiser
Mike Davis
Joe Zhou
OHSU
Nicole Vasilevsky
Matt Brush
Kent Shefchek
Julie McMurry
Tom Conlin
Genomics England
Damian Smedley
Jules Jacobsen
UCSC
David Haussler
Benedict Paten
Mark Diekhans
Melissa Cline
Garvan
Tudor Groza
Craig McNamara
Edwin Zhang
FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; NCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)
Our approach is to get the machine to understand the terms so that it can assist us intelligently.
1. Represent organism as a biological subject
2. Represent diseases/genotypes as collections of nodes in the graph
3. Interoperate with other bioinformatics resources and leverage modern semantic standards
If we include bridging ontologies, we can unify diseases across sources AND phenotypes across sources and organisms.
To support downstream hypothesis testing and evaluation, and to establish “trust”, we need a computable model for evidence.
There are a lot of people who have contributed to this work over many years.