4. …the relationships and their evidence must also be captured
G-P or D (disease)
• causes
• contributes to
• is risk factor for
• protects against
• correlates with
• is marker for
• modulates
• involved in
• increases susceptibility to
G-G (kind of)
• regulates
• negatively regulates (inhibits)
• positively regulates (activates)
• directly regulates
• interacts with
• co-localizes with
• co-expressed with
P/D - P/D
• part of
• results in
• co-occurs with
• correlates with
• hallmark of (P->D)
E-P
• contributes to (E->P)
• influences (E->P)
• exacerbates (E->P)
• manifest in (P->E)
G-E (kind of)
• expressed in
• expressed during
• contains
• inactivated by
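A minimal sketch of how a typed relationship and its evidence might be held together in one record. The class names, the `ALLOWED` table, and the example labels ("gene X", "phenotype Y") are hypothetical illustrations, not part of any existing data model; only a few of the predicates listed above are included.

```python
from dataclasses import dataclass, field

# Predicates allowed per (subject kind, object kind), copied from the lists above.
# Only a subset is shown; the structure matters more than the vocabulary here.
ALLOWED = {
    ("G", "P"): {"causes", "contributes to", "is risk factor for", "protects against"},
    ("G", "D"): {"causes", "contributes to", "is risk factor for", "protects against"},
    ("G", "G"): {"regulates", "interacts with", "co-expressed with"},
    ("E", "P"): {"contributes to", "influences", "exacerbates"},
    ("G", "E"): {"expressed in", "expressed during", "inactivated by"},
}


@dataclass(frozen=True)
class Entity:
    kind: str   # "G" (gene), "P" (phenotype), "D" (disease), or "E" (environment)
    label: str


@dataclass
class Association:
    subject: Entity
    predicate: str
    obj: Entity
    evidence: list = field(default_factory=list)  # free-text evidence notes

    def __post_init__(self):
        # Reject predicates that are not listed for this pair of entity kinds.
        allowed = ALLOWED.get((self.subject.kind, self.obj.kind), set())
        if self.predicate not in allowed:
            raise ValueError(
                f"'{self.predicate}' is not listed for "
                f"{self.subject.kind}-{self.obj.kind} relationships"
            )


# Hypothetical example: the relationship and its evidence travel together.
assoc = Association(
    Entity("G", "gene X"),
    "contributes to",
    Entity("P", "phenotype Y"),
    evidence=["hypothetical evidence note"],
)
print(assoc.predicate, len(assoc.evidence))
```

Checking the predicate against the entity kinds at construction time is one way to keep G-G predicates such as "regulates" from being attached to a G-P pair.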
5. The Human Phenotype Ontology
• 11,813 phenotype terms
• 127,125 rare disease-phenotype annotations
• 136,268 common disease-phenotype annotations
bit.ly/hpo-paper
Peter Robinson, Sebastian Koehler, Chris Mungall
6. Other clinical vocabularies don’t adequately cover phenotypic descriptions
Winnenburg and Bodenreider, 2014
[Figure: bar chart; y-axis: percent coverage (0 to 100); labels: HPO, UMLS, SNOMEDCT, CHV, MedDRA, MeSH, NCIT, ICD10, OMIM]
=> HPO is now in the UMLS
8. How much phenotyping is enough?
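[Figure: phenotype profiles; the number in parentheses is the count of individuals with each phenotype]
• Female (4)
• Male (4)
• Hair present on head (7)
• Hair absent on head (1)
• Dark hair (6)
• Increased skin pigmentation (3)
• Enlarged ears (2)
• Enlarged lip (2)
• Blue skin (1)
• Pointy ears (1)
• Horns present (1)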
bit.ly/annotationsufficiency
9. Matchmaker Exchange for patients, diseases, and model organisms
Computational matching of rare disease patients across clinical & public sources
Find models and experts for functional validation
bit.ly/mme-matchbox
patientarchive.org
bit.ly/exomiser-2017
14. [Figure labels: ontologies and resources]
• Sequence Ontology
• Uberon Anatomy Ontology
• Genotype Ontology
• MONDO Disease Ontology
• Human Phenotype Ontology
• NCBIGene
• Reactome
• NCBITaxon
• Protein Ontology
• ChEBI chemical entities ontology
• UNII chemical substance registry
• Cell Ontology
• Ontology of Biomedical Investigations
• Gene Ontology (GO-BP)
• Gene Ontology (GO-CC)
• UniProt
Lobular Breast Carcinoma =
'Breast Adenocarcinoma'
and (Disease_Has_Normal_Tissue_Origin some 'Terminal Ductal Lobular Unit')
and (Disease_Has_Normal_Cell_Origin some 'Terminal Ductal Lobular Unit Cell')
and (Disease_Has_Abnormal_Cell some 'Lobular Carcinoma Cell')
and (Disease_May_Have_Cytogenetic_Abnormality some 'Loss of Chromosome 16q')
and (Disease_Excludes_Abnormal_Cell some 'Ductal Carcinoma Cell')
and (Disease_Excludes_Finding some 'Mixed Cellular Population')
and (Disease_Mapped_To_Gene some 'CDH1 Gene')
and (Disease_May_Have_Molecular_Abnormality some 'Loss of E-cadherin Expression')
and (Disease_May_Have_Molecular_Abnormality some 'CDH1 Gene Inactivation')
Tailoring the NCIt for computational
interoperability
Jim Balhoff, Sherri DeCorronado, Gilberto Fragoso, Nicole Vasilevsky,
Paula Carrio Caro, Matt Brush, Chris Mungall
15. Variant Pathogenicity Interpretations
Variant: DSC2:c.631-2A>G
Disease: Right Ventricular Cardiomyopathy
Pathogenic? Benign?
Complications to variant interpretation:
• Pathogenicity evidence is complex, diverse, indirect, conflicting
• Siloed curation guidelines
• High stakes (applied directly to care)
16. Improving Rigor and Consistency of Variant Interpretation
2015 ACMG-AMP Variant Interpretation Guidelines
• 28 ‘criteria’ re: evidence types, strength
• Framework for combining criteria outcomes
ClinGen Variant Curation Interface (VCI) and DMWG
• Data model and curation for variant evidence and provenance
SEPIO: Scientific Evidence and Provenance Information Ontology
• Computable model for representation and analysis of evidence and provenance
Merged Disease Classification
• Harmonized disease classification for algorithmic use and pathogenicity assignment
Matt Brush, Selina Dwight, Larry Babb, Chris Bizon, Bradford Powell, Tristan Nelson, Bob Freimuth, Chris Mungall
17. Evidence-Based Computational Evaluation of Claims
[Figure: evidence lines :e1 to :e5 linked to :claim1 “pathogenic” and :claim2 “benign”; evidence types shown: co-localization evidence, functional complementation evidence, microscopy evidence, imaging evidence, co-immunoprecipitation evidence]
Algorithms can leverage semantics of SEPIO models to compute quantitative metrics of evidence quality, quantity, diversity, and concordance, supporting automated evaluation of claims.
https://github.com/monarch-initiative/SEPIO-ontology/wiki
18. How do all these ontologies fit into our notion of disease?
[Figure: Disease 1 and Disease 2, with three columns (Genes, Environment, Phenotypes), each labeled Data Standards and Ontologies; FHIR]
19. [Figure: Disease 1 and Disease 2, with three columns (Genes, Environment, Phenotypes), each labeled Data Standards and Ontologies; FHIR; METADATA, EVIDENCE]
20. Defining disease and clinical pathogenicity: a lumping and splitting problem
MONDO Unified Disease Ontology
SEPIO: Scientific Evidence and Provenance Information
source IDs; split/merge; manage resolution & provenance
One disease or two? What does the evidence favor?
One disease or two? How do we manage identifiers, hierarchy?
23. Standard exchange formats exist for genes … but for phenotypes? Environment?
Genes, Environment, Phenotypes
Formats shown: VCF, GFF, BED, PXF
http://phenopackets.org
New Funding: Forums for Phenomics!
26. What’s next? Challenges for this workstream
• Figure out how ontologies, metadata, eHealth and exchange standards all fit together in this workstream
• Further harmonize existing disease and phenotype ontologies and standards
• Define exchange of structured phenotype data in different contexts: clinical, basic research, patients, databases, journals
• Get structured G2P data (data about the biology of the patient) into and out of the EHR
• Demonstrate standardization success across the driver projects
Discuss!
The classic G + E = P. But the “=” stands for a lot of information that can be applied to aid the linking.
Not the same variant, but the same disease and the same gene, KMT2A.
http://stm.sciencemag.org/content/scitransmed/suppl/2014/08/29/6.252.252ra123.DC1/6-252ra123_SM.pdf (paywalled) DOI: 10.1126/scitranslmed.3009262
Knowing the normal distribution and clustering of phenotypes tells us that blue skin is rare, so it can reliably distinguish between phenotype profiles. It also tells us that if the first phenotype entered is enlarged lip, the next one to ask for would be enlarged ears. The combination of 3 non-unique phenotypes offers a perfect match.
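The rarity argument can be made concrete with a small sketch that scores each phenotype from slide 8 by its information content, -log2(frequency). The counts are the ones shown on the slide; the cohort size of 8 is an assumption inferred from Female (4) + Male (4), and this scoring is an illustration only, not necessarily how the annotation sufficiency metric is actually computed.

```python
import math

# Assumption: 8 individuals in total, inferred from Female (4) + Male (4).
N_INDIVIDUALS = 8

# Counts as shown on slide 8.
counts = {
    "Hair present on head": 7,
    "Dark hair": 6,
    "Increased skin pigmentation": 3,
    "Enlarged ears": 2,
    "Enlarged lip": 2,
    "Blue skin": 1,
    "Pointy ears": 1,
    "Horns present": 1,
    "Hair absent on head": 1,
}


def information_content(count, n=N_INDIVIDUALS):
    """Bits of information gained by observing a phenotype seen in `count` of `n` individuals."""
    return -math.log2(count / n)


ic = {name: information_content(c) for name, c in counts.items()}

# Rarest (most informative) phenotypes first.
for name, bits in sorted(ic.items(), key=lambda kv: -kv[1]):
    print(f"{name:30s} {bits:.2f} bits")
```

Under these assumptions a phenotype seen in one individual carries 3 bits, while one seen in seven of eight carries well under 1 bit, which is why asking about rare phenotypes narrows the match faster.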
The FDA and the PMDA (Japan) require the use of CDISC standards for all clinical trial submissions (human and animal toxicology). The SDTM standard (for human clinical trials) includes over 30,000 controlled terms coded in NCI Thesaurus.
- variant pathogenicity classifications rely on nuanced interpretation of complex and diverse evidence.
- this is a domain where capturing and computing on evidence and provenance (E/P) metadata is essential for important applications in research and healthcare.
1. ACMG Guidelines: consistent interpretation and application of evidence
- a set of 28 criteria defining relevant types of evidence and how to evaluate their strength
- a particular variant is evaluated against all criteria relevant to what is known about the variant
- guidelines then provide a framework for combining outcomes of these 'criterion assessments' to derive a final classification into one of five categories
- goal is more principled and consistent interpretation -> more reliable with fewer conflicts
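As a rough illustration of what "combining outcomes of criterion assessments" can look like in code, here is a simplified sketch. The combining rules below are written from memory of the combining table in the 2015 ACMG-AMP guidelines and should be checked against the published guideline before any real use; the function name `classify` and the convention of reading evidence strength from the criterion code prefix (PVS, PS, PM, PP, BA, BS, BP) are assumptions of this sketch.

```python
from collections import Counter


def classify(criteria):
    """Combine met criterion codes (e.g. ["PVS1", "PM2"]) into one of five categories.

    Simplified sketch: strength is read from the code prefix, and criteria
    whose strength was adjusted by a curator are not handled.
    """
    n = Counter(code.rstrip("0123456789") for code in criteria)
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm == 1 and pp == 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs == 1 and pm == 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = ("Pathogenic" if pathogenic
                 else "Likely pathogenic" if likely_pathogenic else None)
    benign_call = ("Benign" if benign
                   else "Likely benign" if likely_benign else None)

    # Conflicting pathogenic and benign evidence, or too little evidence,
    # both fall back to uncertain significance.
    if path_call and benign_call:
        return "Uncertain significance"
    return path_call or benign_call or "Uncertain significance"


print(classify(["PVS1", "PS3"]))        # one very strong plus one strong criterion
print(classify(["PM1", "PM2", "PM5"]))  # three moderate criteria
print(classify(["PP3"]))                # a single supporting criterion
```

The point of the sketch is that once each criterion assessment is recorded as structured data, the final classification becomes a reproducible function of those assessments rather than a judgment made afresh by each curator.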
2. ClinGen VCI: curation and exchange of evidence and provenance information collected in ACMG-guided workflows
ClinGen is developing the VCI, which implements the ACMG workflow and captures structured representations of rich, granular provenance and evidence metadata for each step in the workflow
3. SEPIO: computable model for representing evidence and provenance information
ClinGen is using the SEPIO ontology model to create extensible, integrated, interoperable, and computable data structures for the exchange and analysis of E/P metadata
We can apply semantic similarity algorithms that use the graph-distance between classes, to estimate the similarity of the evidence lines these classes annotate. The idea here is that more diverse lines of evidence provide stronger support for a claim than closely related ones.
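A minimal sketch of the graph-distance idea. The evidence type names come from slide 17, but the is-a hierarchy below (including the intermediate class "physical interaction evidence" and the root "evidence") is a hypothetical toy, not the actual SEPIO or ECO hierarchy, and mean pairwise distance is just one possible diversity score.

```python
from itertools import combinations

# Hypothetical is-a hierarchy (child -> parent), for illustration only.
PARENT = {
    "imaging evidence": "evidence",
    "microscopy evidence": "imaging evidence",
    "co-localization evidence": "microscopy evidence",
    "physical interaction evidence": "evidence",
    "co-immunoprecipitation evidence": "physical interaction evidence",
    "functional complementation evidence": "evidence",
}


def ancestors(cls):
    """Return [cls, parent, grandparent, ..., root]."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def graph_distance(a, b):
    """Number of is-a edges between a and b via their nearest common ancestor."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {c: i for i, c in enumerate(path_b)}
    for steps_from_a, c in enumerate(path_a):
        if c in depth_in_b:
            return steps_from_a + depth_in_b[c]
    raise ValueError(f"no common ancestor for {a!r} and {b!r}")


def diversity(evidence_types):
    """Mean pairwise graph distance; higher means more diverse evidence lines."""
    pairs = list(combinations(evidence_types, 2))
    if not pairs:
        return 0.0
    return sum(graph_distance(a, b) for a, b in pairs) / len(pairs)


closely_related = ["microscopy evidence", "co-localization evidence"]
more_diverse = [
    "co-localization evidence",
    "co-immunoprecipitation evidence",
    "functional complementation evidence",
]
print(diversity(closely_related), diversity(more_diverse))
```

With this toy hierarchy, two evidence lines that are parent and child score lower than three lines drawn from separate branches, matching the intuition that independent kinds of evidence give stronger support for a claim.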
https://github.com/monarch-initiative/monarch-disease-ontology/issues/90
Note the two subgraphs; little overlap in the upper areas