On the frontier of genotype-2-phenotype data integration
Melissa Haendel, PhD
March 22, 2016
AMIA TBI
@monarchinit @ontowonka
haendel@ohsu.edu
Filling the G2P knowledge gap from other organisms
Other = rat, fly, worm, mouse, zebrafish
monarchinitiative.org
Ulcerated paws
Palmoplantar hyperkeratosis
Thick hand skin
Challenge: Each database uses its own vocabulary/ontology
Ontologies: MP, HP
Databases: MGI, HPOA
Challenge: Each database uses its own phenotype vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNO, MED, …
Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, …, QTLdb
Can we help machines understand phenotype terms?
“Palmoplantar hyperkeratosis” (human phenotype)
“I have absolutely no idea what that means”
The Human Phenotype Ontology
Hyposmia
Abnormality of globe location
eyeball of camera-type eye
sensory perception of smell
Abnormal eye morphology
Motor neuron atrophy
Deeply set eyes
motor neuron
CL
34571 annotations in 22 species
157534 phenotype annotations
2150 phenotype annotations
Genotype-phenotype integration
Chart: G2P associations by number of contributing sources (one source, two sources, 3 or more)
One source: 9%
Two or more sources: 91%
91% of our 2.2 million G2P associations required integrating 2 or more data sources
(this number does not even include orthology (Panther) or any ontologies!)
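The tally behind a chart like this can be sketched in a few lines. This is a minimal illustration under one reading of the slide: each integrated association is assembled from pieces contributed by one or more sources. The rows, the association IDs, and the source names (`source_A` etc.) are invented toy data, not Monarch data, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented toy rows: (association_id, contributing_source).
ROWS = [
    ("assoc-1", "source_A"),
    ("assoc-1", "source_B"),
    ("assoc-2", "source_A"),
    ("assoc-3", "source_A"),
    ("assoc-3", "source_B"),
    ("assoc-3", "source_C"),
    ("assoc-4", "source_B"),
    ("assoc-4", "source_C"),
    ("assoc-4", "source_B"),  # a repeated source is counted once
]


def bucket(n_sources):
    """Map a source count onto the three chart categories."""
    if n_sources == 1:
        return "one source"
    if n_sources == 2:
        return "two sources"
    return "3 or more"


def tally_sources(rows):
    """Count associations per chart category."""
    sources = defaultdict(set)
    for association_id, source in rows:
        sources[association_id].add(source)
    return Counter(bucket(len(s)) for s in sources.values())


def fraction_integrated(rows):
    """Share of associations that needed two or more sources."""
    counts = tally_sources(rows)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts["one source"]) / total


if __name__ == "__main__":
    print(dict(tally_sources(ROWS)))
    print(fraction_integrated(ROWS))
```

Collecting sources into a set per association keeps a source that contributes several pieces from inflating the count.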
Diagnosing an undiagnosed disease
www.owlsim.org
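One way to see how an ontology lets a machine compare a patient profile with a model-organism profile is a set-overlap score over ancestor terms. The sketch below is a simple stand-in, not the OWLSim algorithm: it uses Jaccard similarity, and the grouping terms in the toy hierarchy ("abnormal autopod skin", "abnormal sense of smell") are invented for illustration rather than taken from HP, MP, or any real ontology.

```python
# Toy is-a hierarchy (child -> parents). The grouping terms are invented
# for illustration and do not come from a real ontology.
PARENTS = {
    "Palmoplantar hyperkeratosis": {"abnormal autopod skin"},
    "Thick hand skin": {"abnormal autopod skin"},
    "Ulcerated paws": {"abnormal autopod skin"},
    "Hyposmia": {"abnormal sense of smell"},
    "abnormal autopod skin": {"phenotype"},
    "abnormal sense of smell": {"phenotype"},
    "phenotype": set(),
}


def ancestors(term):
    """Return the term plus everything reachable by following is-a links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def profile_closure(profile):
    """Union of the ancestor sets of every term in a phenotype profile."""
    closure = set()
    for term in profile:
        closure |= ancestors(term)
    return closure


def jaccard(profile_a, profile_b):
    """Shared ancestor terms divided by all ancestor terms of both profiles."""
    closure_a = profile_closure(profile_a)
    closure_b = profile_closure(profile_b)
    union = closure_a | closure_b
    return len(closure_a & closure_b) / len(union) if union else 0.0


if __name__ == "__main__":
    patient = ["Palmoplantar hyperkeratosis"]
    mouse = ["Ulcerated paws"]
    print(jaccard(patient, mouse))
    print(jaccard(patient, ["Hyposmia"]))
```

The human and mouse terms share no label, yet they score higher against each other than against an unrelated term because they meet at a common parent. Production tools typically weight shared terms by information content instead of counting them equally.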
Phenotype Exchange Standard
Mechanistic discovery
Improved searchability
Integrated Data Landscape
Tool/algorithm creation
Cohort identification
Patient registries
Databases, Web tools, Algorithms
Phenopacket Registry
Journals
Diagnostic screening programs
Clinical trials
Phenopacket flow
Primary benefits to stakeholders
Patients/Families
Physicians
Patient matchmaking
Diagnosis speed/accuracy
Organismal biologist
www.phenopackets.org
What’s in a Phenopacket?
Ontology-based phenotypic descriptions for:
- Human patients, model organisms, or any organism
- Groupings of human patients or organisms
What does it include?
- Age of patient or organism
- Sex of patient or organism
- Disease (if named)
- Age of onset of disease
- Positive and negative phenotype associations
- Reference to genes, variants, or collections of variants
- Reference to environmental factors
Multiple formats: TSV, JSON, YAML
Validation tools
Uses a standardized publication citation mechanism for data sharing
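The contents listed above can be pictured as a small record that serializes to JSON. This is a hedged sketch only: the class names `PacketSketch` and `PhenotypeAssociation` and every field name are hypothetical and do not follow the official Phenopackets schema. A real packet would also carry ontology term IDs; they are left out here so that none are guessed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PhenotypeAssociation:
    label: str
    # True records a phenotype that was checked for and found absent.
    negated: bool = False


@dataclass
class PacketSketch:
    """Hypothetical record mirroring the items listed on the slide."""
    subject_id: str
    age: Optional[str] = None
    sex: Optional[str] = None
    disease: Optional[str] = None
    age_of_onset: Optional[str] = None
    phenotypes: List[PhenotypeAssociation] = field(default_factory=list)
    genotype_refs: List[str] = field(default_factory=list)
    environment_refs: List[str] = field(default_factory=list)

    def observed(self):
        """Labels of positive phenotype associations."""
        return [p.label for p in self.phenotypes if not p.negated]

    def excluded(self):
        """Labels of negative phenotype associations."""
        return [p.label for p in self.phenotypes if p.negated]

    def to_json(self):
        """Serialize the whole record, nested associations included."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    packet = PacketSketch(
        subject_id="patient-1",
        sex="female",
        phenotypes=[
            PhenotypeAssociation("Palmoplantar hyperkeratosis"),
            PhenotypeAssociation("Hyposmia", negated=True),
        ],
    )
    print(packet.to_json())
```

Keeping negated phenotypes in the same list as observed ones, with a flag, means an absent finding travels with the record instead of being silently dropped.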
brca-website.cloudapp.net
- 13501 variants from ENIGMA, ClinVar, LOVD, exLOVD, BIC
- Merged by genomic coordinate and alternate allele string
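Merging by genomic coordinate and alternate allele string amounts to grouping records under a shared key. The sketch below assumes a key of (chromosome, position, alternate allele) and a light normalization step (dropping a "chr" prefix, upper-casing the allele); both are assumptions for illustration, not the BRCA Exchange pipeline. The records, positions, and calls are invented.

```python
def variant_key(record):
    """Build the merge key: (chromosome, position, alternate allele)."""
    chrom = str(record["chrom"])
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]  # treat "chr17" and "17" as the same chromosome
    return (chrom, int(record["pos"]), str(record["alt"]).upper())


def merge_variants(records):
    """Group records by key, keeping each source's pathogenicity call."""
    merged = {}
    for record in records:
        entry = merged.setdefault(variant_key(record), {"calls": {}})
        entry["calls"][record["source"]] = record["call"]
    return merged


def conflicting(merged):
    """Keys whose sources disagree on the pathogenicity call."""
    return [
        key
        for key, entry in merged.items()
        if len(set(entry["calls"].values())) > 1
    ]


if __name__ == "__main__":
    # Invented example records; positions and calls are not real data.
    records = [
        {"chrom": "chr17", "pos": 1000, "alt": "a",
         "source": "ClinVar", "call": "pathogenic"},
        {"chrom": "17", "pos": 1000, "alt": "A",
         "source": "LOVD", "call": "uncertain"},
        {"chrom": "13", "pos": 2000, "alt": "T",
         "source": "BIC", "call": "benign"},
    ]
    merged = merge_variants(records)
    print(len(merged))
    print(conflicting(merged))
```

Once records share a key, disagreement between sources becomes something a program can list, rather than something a reader has to spot.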
Problems with evidence and provenance of G2P associations
PROBLEMS:
- Variants have different pathogenicity calls due to annotation inconsistency AND different experimental evidence
- Evidence and provenance are incomplete, not computable, and frequently conflated
- Annotations are to different aspects of the genotype: allele, variant, gene, transcript, etc.
A computable model would:
- provide context to evaluate credibility/confidence
- support filtering and analysis of data
- provide a detailed history for attribution
Building a computable model for ACMG guidelines
http://brcaexchange.org/
Provenance
- Materials & methods
- Agent(s) of evidence
- Agent(s) of claim
- Time and place
Evidence
- Data (e.g., images, sequences)
- Evidence codes
- Publications
- Confidence (p-value, z-score)
- Summary figures
- Conclusions from previous studies
- Domain expert’s knowledge
Claim
- Causal relationships, hypothesized relationships, correlations, etc.
https://github.com/monarch-initiative/SEPIO-ontology
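To make the provenance / evidence / claim split concrete, here is a minimal sketch of how the three parts might be kept separate in code so that claims can be filtered by confidence and traced to the people behind them. The class and field names are hypothetical and are not SEPIO classes; the evidence code string and agent names in the usage example are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    methods: str
    evidence_agents: List[str]  # who produced the evidence
    claim_agents: List[str]     # who asserted the claim
    time_and_place: Optional[str] = None


@dataclass
class Evidence:
    evidence_code: str  # placeholder string, not a real ECO identifier
    publications: List[str] = field(default_factory=list)
    p_value: Optional[float] = None


@dataclass
class Claim:
    subject: str        # e.g. an allele, variant, gene, or transcript
    subject_level: str  # which aspect of the genotype is annotated
    relation: str       # "causal", "hypothesized", "correlation", ...
    phenotype: str
    evidence: List[Evidence]
    provenance: Provenance


def supported(claims, max_p=0.05):
    """Claims with at least one evidence item at or below max_p."""
    return [
        c for c in claims
        if any(e.p_value is not None and e.p_value <= max_p
               for e in c.evidence)
    ]


def attribution(claim):
    """Everyone to credit for a claim, evidence producers included."""
    agents = set(claim.provenance.evidence_agents)
    agents |= set(claim.provenance.claim_agents)
    return sorted(agents)


if __name__ == "__main__":
    prov = Provenance("example assay", ["lab-1"], ["curator-1"])
    strong = Claim("variant-1", "variant", "causal", "phenotype-1",
                   [Evidence("example-code", p_value=0.01)], prov)
    weak = Claim("gene-1", "gene", "correlation", "phenotype-1",
                 [Evidence("example-code", p_value=0.2)], prov)
    print([c.subject for c in supported([strong, weak])])
    print(attribution(strong))
```

Recording `subject_level` explicitly addresses the problem that annotations attach to different aspects of the genotype, and keeping agents of evidence apart from agents of claim preserves the history needed for attribution.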
Summary
- Ontologies can be used to perform deep phenotyping integration across species
- An exchange standard is needed to facilitate distributed phenotype data sharing
- A computable G2P evidence model can aid variant interpretation
Acknowledgements
Lawrence Berkeley: Chris Mungall, Nicole Washington, Suzanna Lewis, Jeremy Nguyen, Seth Carbon
Charité: Peter Robinson, Sebastian Köhler
U of Pittsburgh: Harry Hochheiser, Mike Davis, Joe Zhou
OHSU: Nicole Vasilevsky, Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin
Genomics England: Damian Smedley, Jules Jacobsen
UCSC: David Haussler, Benedict Paten, Mark Diekhans, Melissa Cline
Garvan: Tudor Groza, Craig McNamara, Edwin Zhang
FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; NCI/Leidos #15X143; BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)

 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomyDrAnita Sharma
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 

Último (20)

GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 

On the frontier of genotype-2-phenotype data integration

  • 1. On the frontier of genotype-2-phenotype data integration. Melissa Haendel, PhD. March 22, 2016, AMIA TBI. @monarchinit @ontowonka, haendel@ohsu.edu
  • 2. Filling the G2P knowledge gap from other organisms. Other = rat, fly, worm, mouse, zebrafish
  • 4. Challenge: each database uses its own vocabulary/ontology. Diagram labels: MP, HP, MGI, HPOA
  • 5. Challenge: each database uses its own phenotype vocabulary/ontology. Diagram labels: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNO MED, …, WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, …, QTLdb
  • 6. Can we help machines understand phenotype terms? Human phenotype: “Palmoplantar hyperkeratosis”. “I have absolutely no idea what that means.” (monarchinitiative.org)
  • 7. The Human Phenotype Ontology. Diagram labels: Hyposmia; Abnormality of globe location; eyeball of camera-type eye; sensory perception of smell; Abnormal eye morphology; Motor neuron atrophy; Deeply set eyes; motor neuron; CL. Counts shown: 34571 annotations in 22 species; 157534 phenotype annotations; 2150 phenotype annotations
  • 8.
  • 9. Genotype-phenotype integration (monarchinitiative.org). Chart categories: one source, two sources, 3 or more. Chart values: 9%, 91%. 91% of our 2.2 million G2P associations required integrating 2 or more data sources (this number does not even include orthology (Panther) or any ontologies!)
  • 10. Diagnosing an undiagnosed disease (www.owlsim.org)
  • 11. Phenotype Exchange Standard (www.phenopackets.org). Diagram: Phenopacket flow across an integrated data landscape.
       Diagram labels: Patient registries; Databases, Web tools, Algorithms; Phenopacket Registry; Journals; Diagnostic screening programs; Clinical trials
       Stakeholders: Patients/Families; Physicians; Organismal biologist
       Primary benefits to stakeholders: Mechanistic discovery; Improved searchability; Tool/algorithm creation; Cohort identification; Patient matchmaking; Diagnosis speed/accuracy
  • 12. What’s in a Phenopacket?
       Ontology-based phenotypic descriptions for:
         - Human patients, model organisms, or any organism
         - Groupings of human patients or organisms
       What does it include?
         - age of patient or organism
         - sex of patient or organism
         - disease (if named)
         - age of onset of disease
         - Positive and negative phenotype associations
         - Reference to genes, variants, or collections of variants
         - Reference to environmental factors
       Multiple formats: TSV, JSON, YAML, JSON Validation tools
       Uses standardized publication citation mechanism for data sharing
  • 13. brca-website.cloudapp.net
         - 13501 variants from ENIGMA, ClinVar, LOVD, exLOVD, BIC
         - Merged by genomic coordinate and alternate allele string
  • 14. Problems with evidence and provenance of G2P associations
       PROBLEMS:
         - Variants have different pathogenicity calls due to annotation inconsistency AND different experimental evidence
         - Incomplete, not computable, and frequently conflated
         - Annotations are to different aspects of the genotype: allele, variant, gene, transcript, etc.
       A computable model would enable:
         - context to evaluate credibility/confidence
         - support filtering and analysis of data
         - detailed history for attribution
  • 15. Building a computable model for ACMG guidelines (http://brcaexchange.org/)
       Diagram columns: Provenance, Evidence, Claim
       Listed items:
         - Materials & methods
         - Agent(s) of evidence
         - Agent(s) of claim
         - Time and place
         - Data (e.g., images, sequences)
         - Evidence codes
         - Publications
         - Confidence (p-val, z-score)
         - Summary figures
         - Conclusions from previous studies
         - Domain expert’s knowledge
       Causal relationships, hypothesized relationships, correlations, etc.
       https://github.com/monarch-initiative/SEPIO-ontology
  • 16. Summary
         - Ontologies can be used to perform deep phenotyping integration across species
         - An exchange standard is needed to facilitate distributed phenotype data sharing
         - A computable G2P evidence model can aid variant interpretation
  • 17. Acknowledgements
         - Lawrence Berkeley: Chris Mungall, Nicole Washington, Suzanna Lewis, Jeremy Nguyen, Seth Carbon
         - Charité: Peter Robinson, Sebastian Kohler
         - U of Pittsburgh: Harry Hochheiser, Mike Davis, Joe Zhou
         - OHSU: Nicole Vasilesky, Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin
         - Genomics England: Damian Smedley, Jules Jacobson
         - UCSC: David Haussler, Benedict Paten, Mark Diekhans, Melissa Cline
         - Garvan: Tudor Groza, Craig McNamara, Edwin Zhang
       FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; NCINCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)
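
Slide 13 says the BRCA variants from several sources were merged by genomic coordinate and alternate allele string. The following is a minimal sketch of that kind of merge, not the project's actual pipeline: the record fields (`chrom`, `pos`, `alt`, `source`, `call`), the function name, and the sample values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_variants(records):
    """Group variant records that share chromosome, position, and alternate allele.

    The key layout and field names are assumptions made for this sketch.
    """
    merged = defaultdict(lambda: {"sources": [], "calls": set()})
    for rec in records:
        key = (rec["chrom"], rec["pos"], rec["alt"])
        merged[key]["sources"].append(rec["source"])
        merged[key]["calls"].add(rec["call"])
    return dict(merged)


# Made-up records: coordinates and calls are placeholders, not real data.
records = [
    {"chrom": "17", "pos": 1000, "alt": "T", "source": "ClinVar", "call": "pathogenic"},
    {"chrom": "17", "pos": 1000, "alt": "T", "source": "LOVD", "call": "uncertain"},
    {"chrom": "13", "pos": 2000, "alt": "G", "source": "BIC", "call": "benign"},
]

merged = merge_variants(records)

# Keys where the contributing sources disagree on the pathogenicity call.
conflicts = [key for key, entry in merged.items() if len(entry["calls"]) > 1]
```

Grouping on a shared key also surfaces the problem raised on slide 14: once records from different sources land in the same bucket, conflicting pathogenicity calls become easy to list.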

Editor's notes

  1. Two issues: database integration and vocabulary integration.
  2. Multiple databases
  3. Our approach is to get the machine to understand the terms so that it can assist us intelligently.
  4. Represent the organism as a biological subject; represent diseases/genotypes as collections of nodes in the graph; be interoperable with other bioinformatics resources and leverage modern semantic standards.
  5. If we include bridging ontologies, we can unify diseases across sources AND phenotypes across sources and organisms.
  6. To support downstream hypothesis testing and evaluation (that is, “trust”), we need a computable model for evidence.
  7. Many people have contributed to this work over many years.
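
Note 6 calls for a computable model for evidence, and slide 15 names three parts: provenance, evidence, and claim. The sketch below shows one way such a model could be made filterable and attributable. It is an illustration only and does not reproduce the SEPIO ontology: the class names, field names, the assignment of fields to each class, and the sample values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    """Who produced a piece of evidence, how, and when (hypothetical fields)."""
    agent: str
    method: str
    date: str


@dataclass
class Evidence:
    """One line of support for a claim (hypothetical fields)."""
    evidence_code: str
    provenance: Provenance
    publication: Optional[str] = None
    p_value: Optional[float] = None


@dataclass
class Claim:
    """An asserted relationship plus the evidence behind it."""
    subject: str
    relation: str
    obj: str
    asserted_by: str
    evidence: List[Evidence] = field(default_factory=list)

    def evidence_below(self, max_p: float) -> List[Evidence]:
        """Keep only evidence that reports a p-value at or below max_p."""
        return [e for e in self.evidence
                if e.p_value is not None and e.p_value <= max_p]

    def contributors(self) -> List[str]:
        """Everyone to attribute: the claim's agent and each evidence agent."""
        agents = {self.asserted_by} | {e.provenance.agent for e in self.evidence}
        return sorted(agents)


# Placeholder identifiers and values, not real data.
claim = Claim(
    subject="variant:example-1",
    relation="hypothesized_to_cause",
    obj="disease:example",
    asserted_by="curator-A",
    evidence=[
        Evidence("functional-assay",
                 Provenance("lab-B", "cell assay", "2015-06-01"),
                 p_value=0.01),
        Evidence("case-report",
                 Provenance("clinic-C", "chart review", "2014-03-15")),
    ],
)

strong = claim.evidence_below(0.05)
```

Keeping the agent of the claim separate from the agents of the evidence mirrors the distinction listed on slide 15, and it is what makes attribution queries such as `contributors()` possible.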