What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data
1. WHAT’S IN A GENOTYPE? An Ontological Characterization for the Integration of Genetic Variation Data
Matthew H. Brush, Chris Mungall, Nicole Washington, and Melissa Haendel
Oregon Health and Science University, Lawrence Berkeley National Laboratory
International Conference on Biomedical Ontology
July 8, 2013
2. Genotype-to-Phenotype Research
GENOTYPE → PHENOTYPE
• B6.Cg-Alms1foz/fox/J → increased weight, adipose tissue volume, glucose homeostasis altered
• ALMS1(NM_015120.4)[c.10775delC] + [-] → obesity, diabetes mellitus, insulin resistance
• kcnj11c14/c14; insrt143/+(AB) → increased food intake, hyperglycemia, insulin resistance
G2P research seeks a mechanistic understanding of how genetic variation is linked to organismal biology and disease.
4. Integrating G2P Data
The Monarch Initiative
The Monarch Initiative aims to bring G2P and related data together under a common semantic framework to support integrated exploration and analysis.
5. Integration Challenges
I. Reconciling G2P data annotated to different ‘levels’ of a genotype
II. Integrating ‘non-genomic’ forms of variation
III. Creating semantic links to biological data
Technical Challenges: terminological, syntactic, and organizational variation in the data is common.
Knowledge-Based Challenges: these reflect the inherent complexity of how G2P data is generated and of what it represents.
6. Decomposition of a Genotype
[Figure: a genotype and its parts, drawn as genome sequences (e.g. CGTAGC vs. CGTACC) with variant positions marked against a reference]
genotype = genomic variation complement + genomic background
apchu745/+; fgf8ati282/ti282(AB) = apchu745/+; fgf8ati282/ti282 + (AB)
Each level has_part the level below it:
• genotype: apchu745/+; fgf8ati282/ti282(AB)
• genomic variation complement: apchu745/+; fgf8ati282/ti282
• variant single locus complement: apchu745/+
• variant locus (allele): apchu745
• sequence alteration: hu745
Genotype – an information entity that specifies an entire genome sequence in terms of its variation from some reference genome
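The has_part levels above map naturally onto nested records. The sketch below is a hypothetical Python model, not GENO's own classes or identifiers: it assumes diploid loci with the variant allele written first, and it rebuilds the slide's example by hand so the decomposition can be walked from the top down.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple


@dataclass
class SequenceAlteration:
    label: str  # e.g. "hu745"


@dataclass
class Allele:
    """Variant locus: a gene plus an optional sequence alteration (None means wild type, '+')."""
    gene: str
    alteration: Optional[SequenceAlteration] = None

    @property
    def label(self) -> str:
        return self.gene + self.alteration.label if self.alteration is not None else "+"


@dataclass
class VariantSingleLocusComplement:
    """Alleles present at one locus; assumes a diploid locus with the variant allele first."""
    first: Allele
    second: Allele

    @property
    def label(self) -> str:
        # Zebrafish-style shorthand: the gene symbol is written once, e.g. "apchu745/+".
        second = self.second.alteration.label if self.second.alteration is not None else "+"
        return f"{self.first.label}/{second}"


@dataclass
class GenomicVariationComplement:
    loci: List[VariantSingleLocusComplement]

    @property
    def label(self) -> str:
        return "; ".join(locus.label for locus in self.loci)


@dataclass
class Genotype:
    gvc: GenomicVariationComplement
    background: str  # genomic background, e.g. "AB"

    @property
    def label(self) -> str:
        return f"{self.gvc.label}({self.background})"

    def parts(self) -> Iterator[Tuple[str, str]]:
        """Walk the has_part hierarchy top-down, yielding (level, label) pairs."""
        yield "genotype", self.label
        yield "genomic variation complement", self.gvc.label
        for locus in self.gvc.loci:
            yield "variant single locus complement", locus.label
            for allele in (locus.first, locus.second):
                if allele.alteration is not None:  # wild-type alleles carry no variation
                    yield "variant locus (allele)", allele.label
                    yield "sequence alteration", allele.alteration.label


# The example genotype from the slide, built by hand.
genotype = Genotype(
    GenomicVariationComplement([
        VariantSingleLocusComplement(Allele("apc", SequenceAlteration("hu745")), Allele("apc")),
        VariantSingleLocusComplement(Allele("fgf8a", SequenceAlteration("ti282")),
                                     Allele("fgf8a", SequenceAlteration("ti282"))),
    ]),
    background="AB",
)

for level, label in genotype.parts():
    print(f"{level}: {label}")
```

The homozygous fgf8a locus yields its allele and sequence alteration twice, one per chromosomal copy; a model that treats the two copies as a single instance would collapse them.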
7. I. Reconciling Levels of G2P Association
[Figure: phenotypes annotated at different levels, shown against genome sequences with variant positions marked]
• Genome level (zebrafish genotype): apchu745/+; fgf8ati282/ti282(AB) → increased cell proliferation, disrupted digestive tract development, gut deformation
• Allele level (human allele): APC (NM_000038.5) c.937_938delGA → intestinal polyps, abnormal retinal pigmentation, sebaceous cysts
8. I. Reconciling Levels of G2P Association (Phenotype Propagation)
[Figure: the annotations from slide 7, with associations inferred at additional levels]
• Genotype apchu745/+; fgf8ati282/ti282(AB) → increased cell proliferation, disrupted digestive tract development, gut deformation
  Inferred by phenotype propagation for: allele apchu745; genes apc, fgf8a
• Allele c.937_938delGA (APC, NM_000038.5); gene: apc → intestinal polyps, abnormal retinal pigmentation, sebaceous cysts
Propagation brings both data sets to a shared level, the apc gene, where they can be compared.
9. Implementation of Phenotype Propagation
Property chains exploit the transitive genotype partonomy to infer phenotype associations.
Atomic relations:
• [variant] is_variant_part_of genotype
• genotype has_phenotype phenotype
Composed relation:
• is_variant_part_of o has_phenotype --> is_variant_with_phenotype
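Read as first-order rules (an informal gloss, not GENO's actual axioms), the transitive partonomy and the property chain state the following. In OWL 2 the first rule corresponds to declaring is_variant_part_of a TransitiveObjectProperty, and the second to a SubObjectPropertyOf axiom whose subproperty is an ObjectPropertyChain.

```latex
\begin{align*}
&\forall x\,\forall y\,\forall z:\;
  \mathrm{is\_variant\_part\_of}(x, y) \land \mathrm{is\_variant\_part\_of}(y, z)
  \rightarrow \mathrm{is\_variant\_part\_of}(x, z) \\
&\forall x\,\forall g\,\forall p:\;
  \mathrm{is\_variant\_part\_of}(x, g) \land \mathrm{has\_phenotype}(g, p)
  \rightarrow \mathrm{is\_variant\_with\_phenotype}(x, p)
\end{align*}
```

Because the first rule closes the partonomy, $x$ in the second rule can be any level below $g$, so a sequence alteration such as hu745 inherits the genotype's phenotypes directly.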
10. Example of Phenotype Propagation
1. Monarch ingests phenotypes annotated to a genotype:
   apchu745/+;fgf8ati282/ti282(AB) has_phenotype cell proliferation, digestive tract development, gut deformation
11. Example of Phenotype Propagation
1. Monarch ingests phenotypes annotated to a genotype (cell proliferation, digestive tract development, gut deformation).
2. The genotype is parsed to create instances down the partonomy, each level linked to the next by has_variant_part:
• Genotype: apchu745/+;fgf8ati282/ti282(AB)
• GVC (genomic variation complement): apchu745/+;fgf8ati282/ti282
• VSLCs (variant single locus complements): apchu745/+, fgf8ati282/ti282
• Alleles: apchu745, fgf8ati282
• Seq. Alts (sequence alterations): hu745, ti282
Genes of the alleles: apc, fgf8a
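Step 2 can be sketched as a small parser that emits has_variant_part triples. This is a hypothetical illustration, not Monarch's parser: plain-text genotype strings lose the allele superscripts, so the sketch assumes an input notation in which `^` marks where the superscript begins, the variant allele is written first, and `+` denotes wild type. The function name `parse_genotype` is made up for this example.

```python
import re
from typing import Dict, List, Tuple

HAS_VARIANT_PART = "has_variant_part"
Triple = Tuple[str, str, str]


def parse_genotype(text: str) -> Tuple[List[Triple], Dict[str, str]]:
    """Parse 'gene^allele/allele;...(background)' into has_variant_part triples.

    Hypothetical notation: '^' stands in for the superscript, the variant allele
    comes first, and '+' is wild type. Also returns a map from allele to gene.
    """
    match = re.fullmatch(r"(.+?)\s*\(([^()]+)\)", text.strip())
    if match is None:
        raise ValueError(f"expected '<loci>(<background>)', got {text!r}")
    loci_text, background = match.groups()
    loci = [locus.strip() for locus in loci_text.split(";")]

    gvc = ";".join(locus.replace("^", "") for locus in loci)
    genotype = f"{gvc}({background})"
    triples: List[Triple] = [(genotype, HAS_VARIANT_PART, gvc)]
    genes: Dict[str, str] = {}

    for locus in loci:
        vslc = locus.replace("^", "")
        triples.append((gvc, HAS_VARIANT_PART, vslc))
        first, second = locus.split("/")
        gene, first_alt = first.split("^")
        # dict.fromkeys keeps order and collapses a homozygous pair into one allele instance
        for alt in dict.fromkeys((first_alt, second.strip())):
            if alt == "+":
                continue  # wild type: no variant allele or sequence alteration to create
            allele = gene + alt
            triples.append((vslc, HAS_VARIANT_PART, allele))
            triples.append((allele, HAS_VARIANT_PART, alt))
            genes[allele] = gene
    return triples, genes


triples, genes = parse_genotype("apc^hu745/+;fgf8a^ti282/ti282(AB)")
for triple in triples:
    print(*triple)
```

Collapsing the homozygous fgf8a pair into one allele instance matches the slide, which lists fgf8ati282 once; a parser that tracks chromosomal copies would keep both.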
12. Example of Phenotype Propagation
1. Monarch ingests phenotypes annotated to a genotype.
2. The genotype is parsed to create instances down the partonomy (GVC, VSLCs, alleles, sequence alterations, and genes, as in slide 11).
3. Phenotype propagation infers is_variant_with_phenotype associations between the phenotypes (cell proliferation, digestive tract development, gut deformation) and each level in the partonomy.
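Step 3 can be sketched in plain Python as a forward-chaining pass over the has_variant_part edges from step 2. This is an illustration of the property chain, not the reasoner Monarch uses. It hard-codes the example's triples, and it leaves out genes because the slides do not name the relation that links alleles to genes.

```python
from typing import Dict, Iterable, Set, Tuple

# has_variant_part edges (whole, part) for the example, as created in step 2
HAS_VARIANT_PART = [
    ("apchu745/+;fgf8ati282/ti282(AB)", "apchu745/+;fgf8ati282/ti282"),
    ("apchu745/+;fgf8ati282/ti282", "apchu745/+"),
    ("apchu745/+;fgf8ati282/ti282", "fgf8ati282/ti282"),
    ("apchu745/+", "apchu745"),
    ("fgf8ati282/ti282", "fgf8ati282"),
    ("apchu745", "hu745"),
    ("fgf8ati282", "ti282"),
]

# Step 1: phenotypes ingested as annotations on the whole genotype
HAS_PHENOTYPE = {
    "apchu745/+;fgf8ati282/ti282(AB)": {
        "cell proliferation", "digestive tract development", "gut deformation",
    },
}


def propagate(has_variant_part: Iterable[Tuple[str, str]],
              has_phenotype: Dict[str, Set[str]]) -> Set[Tuple[str, str, str]]:
    """Apply is_variant_part_of o has_phenotype --> is_variant_with_phenotype."""
    wholes_of: Dict[str, Set[str]] = {}  # is_variant_part_of, the inverse of has_variant_part
    for whole, part in has_variant_part:
        wholes_of.setdefault(part, set()).add(whole)

    def transitive_wholes(node: str) -> Set[str]:
        """Everything `node` is transitively a variant part of."""
        seen: Set[str] = set()
        stack = list(wholes_of.get(node, ()))
        while stack:
            whole = stack.pop()
            if whole not in seen:
                seen.add(whole)
                stack.extend(wholes_of.get(whole, ()))
        return seen

    inferred = set()
    for part in wholes_of:
        for whole in transitive_wholes(part):
            for phenotype in has_phenotype.get(whole, ()):
                inferred.add((part, "is_variant_with_phenotype", phenotype))
    return inferred


inferred = propagate(HAS_VARIANT_PART, HAS_PHENOTYPE)
for triple in sorted(inferred):
    print(*triple)
```

Each of the seven instances below the genotype picks up all three phenotypes, giving 21 inferred associations. The genotype itself keeps only its asserted has_phenotype links.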
13. II. Integrating Non-Genomic Variation
‘Extrinsic genotypes’ describe sequences subject to transient variations in expression at the time of an experiment (example: morpholino-mediated gene knockdown).
Representing extrinsic variation data in terms of the targeted genes facilitates integration with ‘intrinsic’ G2P data.
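Expressing extrinsic variation in terms of targeted genes can be sketched as follows. Everything here is hypothetical: the class names are illustrative rather than GENO terms, and "MO-example" is an invented reagent name, not a real morpholino record. The only idea carried over from the slide is that a knockdown reagent is recorded by the gene it targets, which puts it on the same gene-level footing as intrinsic alleles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class TargetingReagent:
    """Hypothetical extrinsic reagent (e.g. a morpholino) and the gene it knocks down."""
    name: str
    target_gene: str


@dataclass
class ExtendedGenotype:
    """Hypothetical genotype carrying both intrinsic alleles and extrinsic reagents."""
    intrinsic_alleles: Dict[str, str]  # allele label -> gene
    extrinsic_reagents: List[TargetingReagent] = field(default_factory=list)

    def varied_genes(self) -> Set[str]:
        """Genes altered in sequence (intrinsic) or knocked down in expression (extrinsic)."""
        genes = set(self.intrinsic_alleles.values())
        genes.update(reagent.target_gene for reagent in self.extrinsic_reagents)
        return genes


# Illustrative only: "MO-example" is a made-up reagent name.
mutant = ExtendedGenotype({"apchu745": "apc", "fgf8ati282": "fgf8a"})
morphant = ExtendedGenotype({"apchu745": "apc"}, [TargetingReagent("MO-example", "fgf8a")])

# At the gene level, the mutant and the morphant experiments can be compared directly.
shared = mutant.varied_genes() & morphant.varied_genes()
print(sorted(shared))
```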
15. GENO in the OBO Foundry
• GENO is modeled according to OBO Foundry principles, within the conceptual frameworks of BFO, IAO, and SO
• We are collaborating on SO refactoring to enhance the representation of genetic variation and to ensure integration of Monarch data with SO-annotated genomes
16. Summary and Future Directions
GENO in the Monarch Data Integration Pipeline
1. Raw data ingested into Monarch RDB
2. Views generated that contain “GENO-enhanced” data (standardized syntax, unpacked genotypes, links to external data)
3. D2RQ maps relational data to GENO and generates RDF
4. GENO-supported reasoning adds inferred G2P associations (e.g. phenotype propagation)
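The four pipeline steps can be sketched end to end with Python's built-in sqlite3 module. This is a toy stand-in under loud assumptions: the table and view names are hypothetical, the view only splits off the genomic background rather than fully unpacking the genotype, and the row-to-triple loop replaces D2RQ (it does not use D2RQ's mapping language). The reasoning step applies the property chain for a single partonomy hop only.

```python
import sqlite3

# Step 1: raw data ingested into a relational store (an in-memory SQLite database here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_g2p (genotype TEXT, phenotype TEXT)")
db.executemany(
    "INSERT INTO raw_g2p VALUES (?, ?)",
    [("apchu745/+;fgf8ati282/ti282(AB)", "gut deformation"),
     ("apchu745/+;fgf8ati282/ti282(AB)", "cell proliferation")],
)

# Step 2: a view that partially "unpacks" each genotype by splitting off the background.
db.execute("""
    CREATE VIEW g2p_enhanced AS
    SELECT genotype,
           substr(genotype, 1, instr(genotype, '(') - 1) AS gvc,
           phenotype
    FROM raw_g2p
""")

# Step 3: map view rows to triples (a hand-written stand-in for D2RQ).
triples = set()
for genotype, gvc, phenotype in db.execute("SELECT genotype, gvc, phenotype FROM g2p_enhanced"):
    triples.add((genotype, "has_variant_part", gvc))
    triples.add((genotype, "has_phenotype", phenotype))

# Step 4: is_variant_part_of o has_phenotype --> is_variant_with_phenotype (one hop only).
inferred = {
    (part, "is_variant_with_phenotype", pheno)
    for whole, rel, part in triples if rel == "has_variant_part"
    for subject, rel2, pheno in triples if rel2 == "has_phenotype" and subject == whole
}
for triple in sorted(inferred):
    print(*triple)
```

A fuller version would produce every partonomy level in step 2 and close the partonomy transitively before step 4.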
Future Directions
1. Modeling of transgenes, human variation, and related data types
2. Develop property chains and algorithms to improve the specificity and weighting of inferred G2P associations
3. Separate application features to provide a community model for public release and integration with SO
17. Acknowledgements
OHSU
Melissa Haendel
Carlo Torniai
Shahim Essaid
Nicole Vasilevsky
Scott Hoffman
LBNL
Chris Mungall
Suzi Lewis
Nicole Washington
UCSD/NIF
Maryann Martone
Anita Bandrowski
Jeff Grethe
Amarnath Gupta
Trish Whetzel
University of Pittsburgh
Harry Hochheiser
Chuck Borromeo
Monarch Initiative / NIF
Sequence Ontology
University of Utah
Karen Eilbeck
University of Colorado
Mike Bada
Funding
NIH # 1R24OD011883-01
We are under construction
OHSU Ontology
Development Group
www.ohsu.edu/library/ontology
GENO ontology
purl.obolibrary.org/obo/geno.owl