Removing roadblocks: leveraging
ontologies for data aggregation and
computation
NCBO Seminar series
March 6th, 2013
Melissa Haendel
On behalf of very many team members
Topics for today
• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile: integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
The Research Symbiosis
[Figure labels: Get funding; Do Experiments; Consult Databases; Share Resources/Data; Publish papers; Contribute to Databases; The Web]
We’ve all been here before:
Ontologies can help us do better.
OMIM Query | # of records
"large bone" | 1032
"enlarged bone" | 207
"big bones" | 22
"huge bones" | 4
"massive bones" | 39
"hyperplastic bones" | 12
"hyperplastic bone" | 44
"bone hyperplasia" | 173
"increased bone growth" | 836
Why not just map to ontology terms?
Class A | Class B | Mapped? | Useful?
FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
Vivo: legal decision | Cognitive Atlas: decision | Yes | No
PlantOntology: Pith | MouseAnatomy: medulla | Yes | No
TaxRank: domain | NCI: protein domain | Yes | No
ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
FMA: tibia | FlyAnatomy: tibia | Yes | No
FMA: colon | GAZ: Colón, Panama | Yes | No
Quality: male | Chebi: maleate 2(-) | Yes | No
Mapping requires manual work to perform and maintain; string
matching for mapping can lead to spurious results; semantics of
mappings and provenance are not always clear
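To make the spurious-match problem concrete, here is a minimal sketch of the kind of lexical shortcut that produces such mappings. The `naive_match` function is a hypothetical stand-in for string-based mapping, not the method used by any particular mapping tool; the label pairs come from the table above.

```python
def naive_match(label_a: str, label_b: str) -> bool:
    """Substring match on lowercased labels (a deliberately naive rule)."""
    a, b = label_a.lower(), label_b.lower()
    return a in b or b in a


# Label pairs taken from the mapping table; ontology prefixes dropped.
PAIRS = [
    ("extensor retinaculum of wrist", "retina"),
    ("legal decision", "decision"),
    ("domain", "protein domain"),
    ("tibia", "tibia"),
    ("male", "maleate 2(-)"),
    ("hypophysis", "pituitary"),
]

for a, b in PAIRS:
    print(f"{a!r} vs {b!r}: mapped={naive_match(a, b)}")
```

Under this rule every pair except hypophysis/pituitary is reported as mapped, which is the reverse of what a curator would consider useful.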
Topics for today
• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile: integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
CTSAconnect:
A Linked Open Data
approach to represent
clinical and research
expertise, activities, and
resources
CTSA 10-001: 100928SB23
PROJECT #: 00921-0001
About eagle-i:
inventories “invisible” resources
Ontology-system for
collecting and querying
research resources
eagle-i.net network
About VIVO
• Primarily focused on people, activities, and outcomes typically associated with research networking
• Eager to represent more diverse components of expertise, across domains
  – e.g., exhibits, performances, specifics about research
• Had worked with core facilities at Cornell to represent labs, equipment, and services
• Started collaborating with eagle-i to go further with research resources
At the intersection of VIVO and eagle-i
www.ctsaconnect.org CTSAconnect
Reveal Connections. Realize Potential.
And then was born the
“CTSAconnect” project
Ok, so it is perhaps not a very informative name for an effort to
consolidate researcher, research activities, and research
resource representation, but what else are we going to call it?
ARG! The Agents, Resources, and Grants ontology
ISF Content and modularization
eagle-i
Research resources
VIVO
Person profiling
ShareCenter
Discussions, requests,
share documents
ISF
Contact Organizations
Affiliations
Services Events
Clinical
Expertise
Reagents
Organisms
Credentials
ISF Modularization
Constraints
• Different ontology modeling principles
• Active ongoing development of eagle-i and VIVO applications
• Investments in existing RDF datasets and the need for stable
targets
Benefits
• Flexibility in what modules to populate at a given site
• Extensibility as needs and feedback influence future evolution
Protégé refactoring plugin
• Annotation view with approved or pending approval
• Module view shows pending axiom changes per module and has the ability to save the changes with a log comment and generate the spreadsheet summary
ISF Merging
Relating ICD9 to MeSH in support of
clinical expertise
Clinical expertise data visualization
Building translational teams
• We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing
• Hard to identify and connect complementary basic and clinical expertise across disciplines
Bringing together clinical expertise
and basic science expertise
Representation of a clinician's expertise extracted
from ICD-9 codes for
Basic researcher with similar
expertise based on MeSH terms
Resources
a resource related to Autoimmune disease
Relating researchers across disciplines
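One way to picture how ICD-9 codes and MeSH terms could connect a clinician to a basic researcher is the small sketch below. The mapping table, the function names, and the sample profiles are all illustrative assumptions; the slides do not show the actual CTSAconnect mapping or its data model.

```python
from collections import Counter

# Illustrative ICD-9 to MeSH lookup; a real mapping would be far larger
# and would come from a curated resource, not a hand-written dict.
ICD9_TO_MESH = {
    "714.0": "Arthritis, Rheumatoid",
    "710.0": "Lupus Erythematosus, Systemic",
    "250.00": "Diabetes Mellitus, Type 2",
}


def clinician_profile(icd9_codes):
    """Translate a clinician's ICD-9 codes into MeSH terms with counts."""
    return Counter(ICD9_TO_MESH[c] for c in icd9_codes if c in ICD9_TO_MESH)


def shared_expertise(clinician_codes, researcher_mesh_terms):
    """MeSH terms found in both profiles, weighted by the clinician's code counts."""
    wanted = set(researcher_mesh_terms)
    profile = clinician_profile(clinician_codes)
    return {term: n for term, n in profile.items() if term in wanted}


# Hypothetical clinician codes and researcher publication terms.
overlap = shared_expertise(
    ["714.0", "714.0", "250.00", "999.9"],
    ["Arthritis, Rheumatoid", "Autoimmune Diseases"],
)
print(overlap)
```

Unmapped codes are silently skipped here; a production pipeline would more likely log them for curation.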
Topics for today
• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile: integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
OHSU’s Biolibrary and Search Engine
• Data aggregated from two repositories:
  – Department of Pathology repository (600K)
  – Knight Cancer Institute repository (16K)
• A web-based search engine over de-identified data
• Our group is applying semantic informatics to improve
  – Data format and quality
  – Data integration across the two repositories
  – Search capabilities
Funded by Medical Research
Foundation of Oregon
Opportunities for improving the
Biolibrary data
Limited anatomical data
– Cancer registry table has 300+
anatomical entities
– Pathology table only 86
– 99% of pathology reports (600K)
have no anatomical codes
– No anatomical relationships
– Coded sites are not as specific as
descriptions in the pathology
reports
Current Search Interface
Two separate search
interfaces
Multiple forms
Biolibrary Text Search
Syntactic free text
search
Coded Syntactic Search
Search through
anatomy and
histology lists
Extracting ontology concepts
• Pathology reports were the main focus
  – Main source of data in the current system
  – Contain richer information
• NLP tools were used to identify concepts
• Existing ontology resources were used to add semantics
Developing a Biospecimen ontology
Phenotypes
(PATO)
Information
Ontology (IAO)
•HPO
•SNOMED
•NCI Thesaurus
•ICDO/ICD9
•GO
•CHEBI
•Cell
Anatomy
(FMA, Uberon)
Medicine
(OGMS)
Classes, Types,
Vocabulary
Data, Instances
Pathology Catalog
Pathology Inventory
Pathology
Report
Instance #123 Instance #456
Instantiates | Classifies as | Uses
Structured data vs. pathology report
(about 7K cases)
However, pathology report also includes:
• Low grade pancreatic intraepithelial neoplasia
• Extensive perineural invasion
• Acute and chronic cholecystitis
• Bile duct tissue with chronic inflammation
• Chronic pancreatitis
• Acute gastric serositis
Available structured data from one case:
Adding Logical Relationships
• About 400 anatomical entities were mapped to the Foundational Model of Anatomy
• An additional 300 to SNOMED
• Used the is_a and part_of relations
• Re-represented this in a semantic and computable format
• Allows for semantic queries
Considerations
• Concept mapping helps with document retrieval
• A mapped concept does not necessarily imply a fact
  – Negation
  – Differential diagnosis
  – Past case history
• Researchers will likely need aggregated facts from multiple sources to support real research queries
• Information extraction options are being explored as part of this work
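The sketch below illustrates what a semantic query over is_a and part_of relations might look like for the biospecimen data, and how the caveats above could enter the picture. The anatomy edges are a toy fragment written for this example (not copied from FMA or SNOMED), and the `asserted` flag is a hypothetical stand-in for whatever negation or differential-diagnosis handling an extraction step provides.

```python
from collections import defaultdict

# (child, relation, parent) edges; a toy fragment for illustration only.
EDGES = [
    ("head of pancreas", "part_of", "pancreas"),
    ("pancreas", "part_of", "digestive system"),
    ("gallbladder", "part_of", "biliary system"),
    ("biliary system", "part_of", "digestive system"),
    ("common bile duct", "is_a", "bile duct"),
    ("bile duct", "part_of", "biliary system"),
]


def ancestors(term, edges=EDGES):
    """All terms reachable from term by following is_a or part_of edges upward."""
    parents = defaultdict(set)
    for child, _relation, parent in edges:
        parents[child].add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def reports_for(region, mentions, edges=EDGES):
    """Report IDs whose asserted site is the region or falls under it."""
    return [
        report_id
        for report_id, site, asserted in mentions
        if asserted and (site == region or region in ancestors(site, edges))
    ]


# Hypothetical extracted mentions: (report ID, anatomical site, asserted?)
MENTIONS = [
    ("R1", "head of pancreas", True),
    ("R2", "common bile duct", True),
    ("R3", "gallbladder", False),  # e.g. a negated or historical mention
    ("R4", "lung", True),
]
print(reports_for("digestive system", MENTIONS))
```

A query for "digestive system" retrieves reports coded only to more specific sites, which a flat list of anatomy codes cannot do; the negated gallbladder mention is excluded.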
Topics for today
• The Research Symbiosis
• Some Integration Projects Leveraging Ontologies
  – A more complete research profile: integrating research resources and person information
  – Improving query across multiple biospecimen repositories
  – Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
Vertebrata
Ascidians
Arthropoda
Annelida
Mollusca
Echinodermata
tetrapod limbs
ampullae
tube feet
parapodia
We want to understand gene
function across taxa
Databasing phenotypes
is hard
• Free text descriptions
• Clinical note
• Models
• Atlases
• Images
• Controlled terms
• Multiple file formats
• Measurements
• …
ATTCGGATTACCGTATTA…
genes, regulatory elements, …
sequence
Sequence data
Databases proliferate
ATTCGGATTACCGTATTA…
genes, regulatory elements, …
sequence
Sequence data
Ontologies as a tool for unification
Disease-
Phenotype
databases
Disease
phenotype
ontology
Expression
data
Gene function
data
Cell and tissue
ontology
GO
annotations
ontologies
Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene ontology: tool
for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556
Yet problems remain
Incomplete
data
Not connected
ontology
Missing & incorrect
annotations
Multiple
Overlapping
Ontologies
Annotations
miss the important
biology
Ontologies built for one species will
not work for others
http://fme.biostr.washington.edu:8080/FME/index.html
http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html
Uberon: a multi-species anatomy
ontology
• Contents:
– Over 8,000 classes (terms)
– Multiple relationships, including subclass, part-of and
develops-from
• Scope: metazoa (animals)
– Current focus is chordates
– Federated approach for other taxa
• Uberon classes are generic / species neutral
– ‘mammary gland’: you can use this class for any mammal!
– ‘lung’: you can use this class for any vertebrate (that has
lungs)
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative
multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
http://genomebiology.com/2012/13/1/R5
Bridging anatomy ontologies
ZFA
MA FMA
EHDAA2 EMAPA
Uberon
CJ Mungall, C Torniai, GV Gkoutos, SE Lewis, MA Haendel.
Uberon, an integrative multi-species anatomy ontology. Genome biology 13 (1), R5
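As a rough illustration of bridging, the sketch below expands a species-specific anatomy term into its counterparts in other species by going through a shared species-neutral class. The identifiers are placeholders invented for this example, and the `BRIDGE` table is a hypothetical simplification; actual Uberon cross-references and bridging axioms are richer than a flat lookup.

```python
# Placeholder identifiers for illustration; real Uberon, MA, FMA and ZFA
# IDs are numeric and should be taken from the ontologies themselves.
BRIDGE = {
    "UBERON:cerebellum": {"MA:cerebellum", "FMA:cerebellum", "ZFA:cerebellum"},
    "UBERON:lung": {"MA:lung", "FMA:lung"},
}

# Invert the table: species-specific term -> species-neutral class.
SPECIES_TO_UBERON = {
    species_term: uberon_class
    for uberon_class, species_terms in BRIDGE.items()
    for species_term in species_terms
}


def expand_query(species_term):
    """All species-specific terms bridged to the same species-neutral class."""
    uberon_class = SPECIES_TO_UBERON.get(species_term)
    return set() if uberon_class is None else set(BRIDGE[uberon_class])


print(sorted(expand_query("MA:cerebellum")))
```

A query written against the mouse term can then be run against human and zebrafish annotations without string matching between the species ontologies.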
SNOMED
NCIt
GO
CL
UBERON
[Figure: UBERON bridging MA (mouse) and FMA (human) terms for cerebellum, cerebellar vermis (vermis of cerebellum), and posterior lobe of cerebellum, with CL:Purkinje cell and its GO/NIF subcellular parts (axon, dendrite); edges labeled i and p]
Uberon enables
queries across
granularity
Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011).
vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5.
http://bgee.unil.ch
Evo-devo applications
Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5):e10708.
doi:10.1371/journal.pone.0010708
The Monarch Initiative
The model systems research network
We are under construction
Goals are to:
• Aggregate model systems genotype and phenotype information
• Integrate with network, genomic, and functional data
• Leverage ontologies for phenotype similarity matching
• Build knowledge exploration tools for end users
• Build services for other applications
Funded by NIH # 1R24OD011883-01
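To give a feel for ontology-based phenotype similarity matching, here is a minimal sketch that compares two phenotype profiles by the overlap of their ancestor sets (a Jaccard score). The hierarchy is a toy example with made-up parent links, not taken from HP, MP, or ZP, and this is a simplified measure rather than the algorithm any specific tool implements.

```python
# Toy subclass hierarchy: term -> set of direct parents (illustrative only).
PARENTS = {
    "abnormal ear pigmentation": {"abnormality of pigmentation"},
    "abnormally disrupted pigmentation": {"abnormality of pigmentation"},
    "abnormality of pigmentation": {"phenotypic abnormality"},
    "cerebral cortical atrophy": {"abnormality of the brain"},
    "abnormality of the brain": {"phenotypic abnormality"},
    "phenotypic abnormality": set(),
}


def closure(terms):
    """The given terms plus all of their ancestors in PARENTS."""
    seen, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, ()))
    return seen


def jaccard(profile_a, profile_b):
    """Shared ancestors divided by all ancestors of the two profiles."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


mouse = {"abnormal ear pigmentation"}
fish = {"abnormally disrupted pigmentation"}
print(jaccard(mouse, fish))
```

Two profiles with no identical terms still score above zero when they share a parent class, which is what lets phenotypes recorded in different species vocabularies be compared once the vocabularies are connected.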
Can we search by phenotype alone?
Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based
Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247
http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247
Integrating phenotypes using ontologies
Organism | Genotype
zebrafish | fgf8a<ti282a/+>; shha<tq252/tq252> (AB)
mouse | B6.Cg-Shh<tm1(EGFP/cre)Cjt>/J
worm | daf-2(e1370) III; fog-2(oz40) V
human* | ATP1A3(NM_152296.3)[c.946G>A, p.Gly316Ser]+[=]
But different organisms record
genotypes differently
Phenotypes can be attached to full or partial
genotypes, alleles, or variants
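One possible way to hold such heterogeneous records in a single structure is sketched below. The class and field names are assumptions made for this example, not the Monarch data model; the genotype string is the worm example from the table, and the phenotype IDs are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class SubjectKind(Enum):
    """What the phenotype is attached to."""
    GENOTYPE = "genotype"
    ALLELE = "allele"
    VARIANT = "variant"


@dataclass(frozen=True)
class PhenotypeAnnotation:
    organism: str
    subject: str        # kept exactly as the source database records it
    kind: SubjectKind
    phenotype: str      # ontology term ID or label (placeholder here)


def by_organism(annotations):
    """Group annotations by organism, preserving input order."""
    grouped = {}
    for annotation in annotations:
        grouped.setdefault(annotation.organism, []).append(annotation)
    return grouped


worm = PhenotypeAnnotation(
    "worm", "daf-2(e1370) III; fog-2(oz40) V", SubjectKind.GENOTYPE, "EX:0000001"
)
human = PhenotypeAnnotation(
    "human", "c.946G>A", SubjectKind.VARIANT, "EX:0000002"
)
print(by_organism([worm, human]).keys())
```

Keeping the source string untouched and recording the level of attachment separately avoids forcing every organism's nomenclature into one syntax.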
Model systems
phenotype and
genotype data
Pulling it together
NIF DISCO
Data ingest Ontology annotation
OWLSIM
Enabling phenotype-based knowledge discovery tools
ONTOQUEST
Extensible Web resource
DISCOvery, registration and
interoperation framework
MONARCH tools and services
Phenotypic
qualities
Cells
Phenotypic
abnormalities
(Human Mouse
Zebrafish)
Molecular
function
Biological
process
Cellular
component
Anatomy
(Human Mouse Zebrafish)
Molecules
Chemicals Proteins
ZEBRAFISH-Term: "abnormally disrupted pigmentation"
MOUSE-Term: "abnormal ear pigmentation"
HUMAN-Term: Abnormality of pigmentation
Uberpheno
Ontologies
Semantic Integration
HP1 HP2 HP3 HP4 HP5
Human
Mouse
Zebrafish
ZP MP ZP
Phenome systems analysis
Phenogram
Genome systems analysis
TP73
tumor protein p73
GNB1
guanine nucleotide
binding protein
Cerebral cortical atrophy
GNB2L1
guanine nucleotide
binding protein (G protein),
Interactome
Orthology Annotation
CNV syndrome
Gene function
Pheno-cluster
MP
Phenotype Gene
These integration projects…well,
integrate
CTSAconnect
Reveal Connections. Realize Potential.
OHSU Biolibrary
people
Research resources
Clinical encounters
Phenotypes
biospecimens
genes
variations
Conclusions
• Ontologies have provided us the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains
• Describing biology works best with multiple connected ontologies
• We need smart data, not just big data
• We need better tools to integrate multiple ontologies
• We need better tools to make use of smarter data structures (e.g. reasoning costs)
Monarch Initiative
CTSAconnect
Biospecimen Ontology
OHSU
Melissa Haendel
Carlo Torniai
Nicole Vasilevsky
Chris Kelleher
Shahim Essaid
Cornell University
Dean Krafft
Jon Corson-Rikert
Brian Lowe
University of Florida
Mike Conlon
Chris Barnes
Nicholas Rejack
OHSU
Melissa Haendel
Shahim Essaid
Carlo Torniai
OHSU
Melissa Haendel
Carlo Torniai
Shahim Essaid
Nicole Vasilevsky
Scott Hoffman
Matt Brush
LBNL
Chris Mungall
Suzi Lewis
Nicole Washington
UCSD/NIF
Maryann Martone
Anita Bandrowski
Jeff Grethe
Amarnath Gupta
Stony Brook University
Moises Eisenberg
Erich Bremer
Janos Hajagos
Harvard University
Daniela Bourges
Sophia Cheng
University at Buffalo
Barry Smith
Dagobert Soergel
Zaloni
Will Corbett
Ranjit Das
Ben Sharma
University of Pittsburgh
Harry Hochheiser
Chuck Borromeo

Más contenido relacionado

La actualidad más candente

dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Science
drnigam
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...
Alejandra Gonzalez-Beltran
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
CEDAR: Center for Expanded Data Annotation and Retrieval
 

La actualidad más candente (20)

Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012Drug Discovery- ELRIG -2012
Drug Discovery- ELRIG -2012
 
A Semantic Web based Framework for Linking Healthcare Information with Comput...
A Semantic Web based Framework for Linking Healthcare Information with Comput...A Semantic Web based Framework for Linking Healthcare Information with Comput...
A Semantic Web based Framework for Linking Healthcare Information with Comput...
 
Chemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleansChemspider Presentation at the ACS Meeting in New orleans
Chemspider Presentation at the ACS Meeting in New orleans
 
Research resources: curating the new eagle-i discovery system
Research resources: curating the new eagle-i discovery systemResearch resources: curating the new eagle-i discovery system
Research resources: curating the new eagle-i discovery system
 
Use of data
Use of dataUse of data
Use of data
 
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For...
 
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific ExperimentsAn Open Repository Model for Acquiring Knowledge About Scientific Experiments
An Open Repository Model for Acquiring Knowledge About Scientific Experiments
 
Visualizing Primary Data form Taxonomic Literature
Visualizing Primary Data form Taxonomic LiteratureVisualizing Primary Data form Taxonomic Literature
Visualizing Primary Data form Taxonomic Literature
 
ICBO2017 - Supporting Ontology-Based Standardization of Biomedical Metadata i...
ICBO2017 - Supporting Ontology-Based Standardization of Biomedical Metadata i...ICBO2017 - Supporting Ontology-Based Standardization of Biomedical Metadata i...
ICBO2017 - Supporting Ontology-Based Standardization of Biomedical Metadata i...
 
Final Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.KeyFinal Acb All Hands 26 11 07.Key
Final Acb All Hands 26 11 07.Key
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Science
 
Facilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-juppFacilitating semantic alignment.-biohackathon-jupp
Facilitating semantic alignment.-biohackathon-jupp
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
BioNLPSADI
BioNLPSADIBioNLPSADI
BioNLPSADI
 
is there life between standards? Data interoperability for AI.
is there life between standards? Data interoperability for AI.is there life between standards? Data interoperability for AI.
is there life between standards? Data interoperability for AI.
 
BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...BioSharing.org - mapping the landscape of community standards, databases, dat...
BioSharing.org - mapping the landscape of community standards, databases, dat...
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017Being Reproducible: SSBSS Summer School 2017
Being Reproducible: SSBSS Summer School 2017
 

Similar a NCBO haendel talk 2013

EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
ChemAxon
 
Semantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical InformaticsSemantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical Informatics
Chimezie Ogbuji
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
Susanna-Assunta Sansone
 

Similar a NCBO haendel talk 2013 (20)

eScience Institute presentation on eagle-i
eScience Institute presentation on eagle-ieScience Institute presentation on eagle-i
eScience Institute presentation on eagle-i
 
Martone grethe
Martone gretheMartone grethe
Martone grethe
 
schema.org and biomedical ontologies
schema.org and biomedical ontologies schema.org and biomedical ontologies
schema.org and biomedical ontologies
 
Bioschemas at bio hackathon 2017
Bioschemas at bio hackathon 2017Bioschemas at bio hackathon 2017
Bioschemas at bio hackathon 2017
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontology
 
Bioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future PerspectivesBioinformatics databases: Current Trends and Future Perspectives
Bioinformatics databases: Current Trends and Future Perspectives
 
Computing on the shoulders of giants
Computing on the shoulders of giantsComputing on the shoulders of giants
Computing on the shoulders of giants
 
The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...
The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...
The Diversity of Biomedical Data, Databases and Standards (Research Data Alli...
 
Ibn Sina
Ibn SinaIbn Sina
Ibn Sina
 
Workshop finding and accessing data - fiona nadia charlotte - cambridge apr...
Workshop   finding and accessing data - fiona nadia charlotte - cambridge apr...Workshop   finding and accessing data - fiona nadia charlotte - cambridge apr...
Workshop finding and accessing data - fiona nadia charlotte - cambridge apr...
 
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
EUGM 2013 - Andrea de Souza (Broad Institute): Setting the stage for the “SD”...
 
Sabina Leonelli
Sabina LeonelliSabina Leonelli
Sabina Leonelli
 
The Electronic Notebook Ontology
The Electronic Notebook OntologyThe Electronic Notebook Ontology
The Electronic Notebook Ontology
 
Navigating the Neuroscience Data Landscape
Navigating the Neuroscience Data LandscapeNavigating the Neuroscience Data Landscape
Navigating the Neuroscience Data Landscape
 
Semantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical InformaticsSemantic Web Technologies as a Framework for Clinical Informatics
Semantic Web Technologies as a Framework for Clinical Informatics
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Idcc kansa-kansa-arbuckle
Idcc kansa-kansa-arbuckleIdcc kansa-kansa-arbuckle
Idcc kansa-kansa-arbuckle
 
Workshop finding and accessing data - fiona - lunteren april 18 2016
Workshop   finding and accessing data - fiona - lunteren april 18 2016Workshop   finding and accessing data - fiona - lunteren april 18 2016
Workshop finding and accessing data - fiona - lunteren april 18 2016
 
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
NPG Scientific Data; SSP, Boston, May 2014: http://www.sspnet.org/events/annu...
 
Data-knowledge transition zones within the biomedical research ecosystem
Data-knowledge transition zones within the biomedical research ecosystemData-knowledge transition zones within the biomedical research ecosystem
Data-knowledge transition zones within the biomedical research ecosystem
 

Más de mhaendel

Equivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholderEquivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholder
mhaendel
 
Building (and traveling) the data-brick road: A report from the front lines ...
Building (and traveling) the data-brick road:  A report from the front lines ...Building (and traveling) the data-brick road:  A report from the front lines ...
Building (and traveling) the data-brick road: A report from the front lines ...
mhaendel
 
Reusable data for biomedicine: A data licensing odyssey
Reusable data for biomedicine:  A data licensing odysseyReusable data for biomedicine:  A data licensing odyssey
Reusable data for biomedicine: A data licensing odyssey
mhaendel
 
How open is open? An evaluation rubric for public knowledgebases
How open is open?  An evaluation rubric for public knowledgebasesHow open is open?  An evaluation rubric for public knowledgebases
How open is open? An evaluation rubric for public knowledgebases
mhaendel
 
Why the world needs phenopacketeers, and how to be one
Why the world needs phenopacketeers, and how to be oneWhy the world needs phenopacketeers, and how to be one
Why the world needs phenopacketeers, and how to be one
mhaendel
 

Más de mhaendel (20)

Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot...
 
Semantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discoverySemantics for rare disease phenotyping, diagnostics, and discovery
Semantics for rare disease phenotyping, diagnostics, and discovery
 
The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA The Software and Data Licensing Solution: Not Your Dad’s UBMTA
The Software and Data Licensing Solution: Not Your Dad’s UBMTA
 
Equivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholderEquivalence is in the (ID) of the beholder
Equivalence is in the (ID) of the beholder
 
Building (and traveling) the data-brick road: A report from the front lines ...
Building (and traveling) the data-brick road:  A report from the front lines ...Building (and traveling) the data-brick road:  A report from the front lines ...
Building (and traveling) the data-brick road: A report from the front lines ...
 
GA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project IntroductionGA4GH Monarch Driver Project Introduction
GA4GH Monarch Driver Project Introduction
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team update
 
Reusable data for biomedicine: A data licensing odyssey
Reusable data for biomedicine:  A data licensing odysseyReusable data for biomedicine:  A data licensing odyssey
Reusable data for biomedicine: A data licensing odyssey
 
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease DiscoveryData Translator: an Open Science Data Platform for Mechanistic Disease Discovery
Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery
 
Global phenotypic data sharing standards to maximize diagnostic discovery
Global phenotypic data sharing standards to maximize diagnostic discoveryGlobal phenotypic data sharing standards to maximize diagnostic discovery
Global phenotypic data sharing standards to maximize diagnostic discovery
 
How open is open? An evaluation rubric for public knowledgebases
How open is open?  An evaluation rubric for public knowledgebasesHow open is open?  An evaluation rubric for public knowledgebases
How open is open? An evaluation rubric for public knowledgebases
 
Deep phenotyping to aid identification of coding & non-coding rare disease v...
Deep phenotyping to aid identification  of coding & non-coding rare disease v...Deep phenotyping to aid identification  of coding & non-coding rare disease v...
Deep phenotyping to aid identification of coding & non-coding rare disease v...
 
Science in the open, what does it take?
Science in the open, what does it take?Science in the open, what does it take?
Science in the open, what does it take?
 
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis...
 
Phenopackets as applied to variant interpretation
Phenopackets as applied to variant interpretation Phenopackets as applied to variant interpretation
Phenopackets as applied to variant interpretation
 
Credit where credit is due: acknowledging all types of contributions
Credit where credit is due: acknowledging all types of contributionsCredit where credit is due: acknowledging all types of contributions
Credit where credit is due: acknowledging all types of contributions
 
Deep phenotyping for everyone
Deep phenotyping for everyoneDeep phenotyping for everyone
Deep phenotyping for everyone
 
Why the world needs phenopacketeers, and how to be one
Why the world needs phenopacketeers, and how to be oneWhy the world needs phenopacketeers, and how to be one
Why the world needs phenopacketeers, and how to be one
 
On the frontier of genotype-2-phenotype data integration
On the frontier of genotype-2-phenotype data integrationOn the frontier of genotype-2-phenotype data integration
On the frontier of genotype-2-phenotype data integration
 
The Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discoveryThe Monarch Initiative: A semantic phenomics approach to disease discovery
The Monarch Initiative: A semantic phenomics approach to disease discovery
 

NCBO haendel talk 2013

  • 1. Removing roadblocks: leveraging ontologies for data aggregation and computation NCBO Seminar series March 6th, 2013 Melissa Haendel On behalf of very many team members
  • 2. Topics for today: The Research Symbiosis; Some Integration Projects Leveraging Ontologies; A more complete research profile: integrating research resources and person information; Improving query across multiple biospecimen repositories; Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 4. We've all been here before: Ontologies can help us do better. OMIM query (# of records): "large bone" (1032); "enlarged bone" (207); "big bones" (22); "huge bones" (4); "massive bones" (39); "hyperplastic bones" (12); "hyperplastic bone" (44); "bone hyperplasia" (173); "increased bone growth" (836)
  • 5. Why not just map to ontology terms?
      Class A | Class B | Mapped? | Useful?
      FMA: extensor retinaculum of wrist | MouseAnatomy: retina | Yes | No
      Vivo: legal decision | Cognitive Atlas: decision | Yes | No
      PlantOntology: Pith | MouseAnatomy: medulla | Yes | No
      TaxRank: domain | NCI: protein domain | Yes | No
      ZfishAnat: hypophysis | MouseAnatomy: pituitary | No | Yes
      FMA: tibia | FlyAnatomy: tibia | Yes | No
      FMA: colon | GAZ: Colón, Panama | Yes | No
      Quality: male | ChEBI: maleate 2(-) | Yes | No
      Mapping requires manual work to perform and maintain; string matching for mapping can lead to spurious results; semantics of mappings and provenance are not always clear
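To make the string-matching problem on this slide concrete, here is a minimal sketch of a naive lexical matcher. It is not the mapping tool that produced the slide's table; the function names, the prefix rule, and the minimum token length are all assumptions chosen for illustration. It only shows how accent stripping and prefix matching could plausibly pair unrelated classes while missing a true synonym pair.

```python
import re
import unicodedata

MIN_TOKEN_LENGTH = 4  # assumption: ignore short tokens such as "of"


def tokens(label):
    """Lowercase a label, strip accents, and return its alphabetic tokens."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [t for t in re.findall(r"[a-z]+", stripped.lower())
            if len(t) >= MIN_TOKEN_LENGTH]


def naive_match(label_a, label_b):
    """Report a 'mapping' when any token of one label is a prefix of a token of the other."""
    return any(
        a.startswith(b) or b.startswith(a)
        for a in tokens(label_a)
        for b in tokens(label_b)
    )


# Label pairs taken from the slide's table (ontology prefixes omitted).
PAIRS = [
    ("extensor retinaculum of wrist", "retina"),
    ("legal decision", "decision"),
    ("domain", "protein domain"),
    ("hypophysis", "pituitary"),
    ("tibia", "tibia"),
    ("colon", "Colón, Panama"),
    ("male", "maleate 2(-)"),
]

for left, right in PAIRS:
    print(f"{left!r} vs {right!r}: matched={naive_match(left, right)}")
```

Under these assumed rules, the only pair the matcher rejects is hypophysis / pituitary, which is the one mapping the slide marks as useful.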
  • 6. Topics for today: The Research Symbiosis; Some Integration Projects Leveraging Ontologies; A more complete research profile: integrating research resources and person information; Improving query across multiple biospecimen repositories; Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 7. CTSAconnect: A Linked Open Data approach to represent clinical and research expertise, activities, and resources CTSA 10-001: 100928SB23 PROJECT #: 00921-0001
  • 8. About eagle-i: inventories "invisible" resources. Ontology-system for collecting and querying research resources. eagle-i.net
  • 9. About VIVO: Primarily focused on people, activities, and outcomes typically associated with research networking. Eager to represent more diverse components of expertise across domains (e.g., exhibits, performances, specifics about research). Had worked with core facilities at Cornell to represent labs, equipment, and services. Started collaborating with eagle-i to go further with research resources.
  • 10. At the intersection of VIVO and eagle-i
  • 11. www.ctsaconnect.org CTSAconnect: Reveal Connections. Realize Potential. And then was born the "CTSAconnect" project. OK, so it is perhaps not a very informative name for an effort to consolidate researcher, research activities, and research resource representation, but what else are we going to call it? ARG! The Agents, Resources, and Grants ontology
  • 12. ISF content and modularization. eagle-i: research resources; VIVO: person profiling; ShareCenter: discussions, requests, share documents. ISF: Contact, Organizations, Affiliations, Services, Events, Clinical Expertise, Reagents, Organisms, Credentials
  • 13. ISF Modularization. Constraints: different ontology modeling principles; active ongoing development of eagle-i and VIVO applications; investments in existing RDF datasets and the need for stable targets. Benefits: flexibility in what modules to populate at a given site; extensibility as needs and feedback influence future evolution
  • 14. Protégé refactoring plugin: Annotation view with approved or pending approval. Module view shows pending axiom changes per module, can save the changes with a log comment, and can generate the spreadsheet summary
  • 16. Relating ICD9 to MeSH in support of clinical expertise
  • 17. Clinical expertise data visualization
  • 18. Building translational teams: We want to assemble teams of scientists to examine, for example, specific drugs released for repurposing. It is hard to identify and connect complementary basic and clinical expertise across disciplines.
  • 19. Bringing together clinical expertise and basic science expertise. Figure labels: representation of a clinician's expertise extracted from ICD-9 codes; basic researcher with similar expertise based on MeSH terms; resources: a resource related to autoimmune disease
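The matching idea on this slide can be sketched in a few lines: count a clinician's coded encounters, translate the codes to MeSH headings, and intersect the result with a researcher's MeSH terms. This is an illustrative sketch only, not the CTSAconnect implementation. The small `ICD9_TO_MESH` table, the function names, and the sample encounter list are all hypothetical.

```python
from collections import Counter

# Hypothetical, hand-written ICD-9 to MeSH lookup for illustration only.
ICD9_TO_MESH = {
    "714.0": "Arthritis, Rheumatoid",
    "710.0": "Lupus Erythematosus, Systemic",
    "250.00": "Diabetes Mellitus, Type 2",
}


def clinician_profile(encounter_codes):
    """Count how often each mapped MeSH heading occurs in coded encounters.

    Codes without an entry in the lookup table are skipped.
    """
    return Counter(
        ICD9_TO_MESH[code] for code in encounter_codes if code in ICD9_TO_MESH
    )


def shared_expertise(profile, researcher_mesh_terms):
    """Return the MeSH headings common to a clinician profile and a researcher."""
    return sorted(set(profile) & set(researcher_mesh_terms))


# Invented sample data.
profile = clinician_profile(["714.0", "714.0", "710.0", "999.9"])
overlap = shared_expertise(profile, ["Lupus Erythematosus, Systemic", "Apoptosis"])
print(profile.most_common())
print(overlap)
```

An exact-heading intersection is the simplest possible comparison. Matching through the MeSH hierarchy (for example, treating both headings as kinds of autoimmune disease) would need the tree structure, which this sketch leaves out.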
  • 21. Topics for today: The Research Symbiosis; Some Integration Projects Leveraging Ontologies; A more complete research profile: integrating research resources and person information; Improving query across multiple biospecimen repositories; Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 22. OHSU's Biolibrary and Search Engine. Data aggregated from two repositories: Department of Pathology repository (600K) and Knight Cancer Institute repository (16K). A web-based search engine over de-identified data. Our group is applying semantic informatics to improve data format and quality, data integration across the two repositories, and search capabilities. Funded by Medical Research Foundation of Oregon
  • 23. Opportunities for improving the Biolibrary data. Limited anatomical data: cancer registry table has 300+ anatomical entities; pathology table has only 86; 99% of pathology reports (600K) have no anatomical codes; no anatomical relationships; coded sites are not as specific as descriptions in the pathology reports
  • 24. Current Search Interface: two separate search interfaces; multiple forms
  • 26. Coded Syntactic Search: search through anatomy and histology lists
  • 27. Extracting ontology concepts: Pathology reports were the main focus (main source of data in the current system; contain richer information). NLP tools were used to identify concepts. Existing ontology resources were used to add semantics.
  • 28. Developing a Biospecimen ontology. Diagram labels: Phenotypes (PATO); Information Ontology (IAO); HPO; SNOMED; NCI Thesaurus; ICDO/ICD9; GO; CHEBI; Cell; Anatomy (FMA, Uberon); Medicine (OGMS). Classes, Types, Vocabulary: Pathology Catalog, Pathology Inventory, Pathology Report. Data, Instances: Instance #123, Instance #456. Relations: Instantiates; Classifies as; Uses
  • 29. Structured data vs. pathology report (about 7K cases). (Figure: available structured data from one case.) However, the pathology report also includes: low grade pancreatic intraepithelial neoplasia; extensive perineural invasion; acute and chronic cholecystitis; bile duct tissue with chronic inflammation; chronic pancreatitis; acute gastric serositis
  • 30. Adding Logical Relationships: About 400 anatomical entities were mapped to the Foundational Model of Anatomy, and an additional 300 to SNOMED. Used the is_a and part_of relations. Re-represented this in a semantic and computable format, which allows for semantic queries.
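As a rough illustration of what "semantic queries" over is_a and part_of could look like, the sketch below walks a small hand-written graph and returns specimens whose coded site falls anywhere under the queried structure. The edges, specimen identifiers, and function names are invented for the example. They are not taken from FMA, SNOMED, or the Biolibrary data.

```python
from collections import deque

# Toy (child, relation, parent) edges, written by hand for illustration.
EDGES = [
    ("head of pancreas", "part_of", "pancreas"),
    ("pancreas", "part_of", "digestive system"),
    ("common bile duct", "is_a", "bile duct"),
    ("bile duct", "part_of", "biliary system"),
    ("gallbladder", "part_of", "biliary system"),
    ("biliary system", "part_of", "digestive system"),
    ("stomach", "part_of", "digestive system"),
]

# Hypothetical specimens, each coded with one anatomical site.
SPECIMENS = {
    "S1": "head of pancreas",
    "S2": "gallbladder",
    "S3": "stomach",
    "S4": "common bile duct",
}


def ancestors(term):
    """Return every term reachable from `term` by following is_a or part_of edges."""
    parents = {}
    for child, _relation, parent in EDGES:
        parents.setdefault(child, []).append(parent)
    seen, queue = set(), deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def find_specimens(site):
    """Return IDs of specimens coded with `site` or with anything beneath it."""
    return sorted(
        specimen_id
        for specimen_id, coded_site in SPECIMENS.items()
        if coded_site == site or site in ancestors(coded_site)
    )


print(find_specimens("biliary system"))
print(find_specimens("digestive system"))
```

A plain string search for "biliary system" would return nothing here, because no specimen is coded with that exact term. The graph traversal is what lets the broader query reach the gallbladder and bile duct specimens.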
  • 31. Considerations: Concept mapping helps with document retrieval, but a mapped concept does not necessarily imply a fact (negation, differential diagnosis, past case history). Researchers will likely need aggregated facts from multiple sources to support real research queries. Information extraction options are being explored as part of this work.
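The negation caveat can be shown with a deliberately crude check: a concept that appears in a sentence after a negation cue should not be counted as a finding. This sketch is an assumption-laden toy, far simpler than real clinical NLP. The cue list, the sentence splitting rule, and the sample report text are all invented for illustration.

```python
import re

# Assumed, minimal list of negation cues; real systems use much richer rules.
NEGATION_CUE = re.compile(r"\b(no|without|negative for)\b")


def classify_mentions(report, concept):
    """Label each sentence that mentions `concept` as 'asserted' or 'negated'.

    A mention counts as negated when a cue appears earlier in the same sentence.
    """
    results = []
    for sentence in re.split(r"(?<=[.;])\s+", report):
        lowered = sentence.lower()
        position = lowered.find(concept.lower())
        if position == -1:
            continue
        negated = NEGATION_CUE.search(lowered[:position]) is not None
        results.append((sentence.strip(), "negated" if negated else "asserted"))
    return results


# Invented report text for illustration.
REPORT = "Chronic pancreatitis. No evidence of perineural invasion."

print(classify_mentions(REPORT, "chronic pancreatitis"))
print(classify_mentions(REPORT, "perineural invasion"))
```

Both concepts would be retrieved by concept mapping alone, yet only one of them is a stated finding. Differential diagnoses and past history would need further rules that this sketch does not attempt.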
  • 32. Topics for today: The Research Symbiosis; Some Integration Projects Leveraging Ontologies; A more complete research profile: integrating research resources and person information; Improving query across multiple biospecimen repositories; Identifying disease candidates by leveraging cross-species anatomy and phenotype queries
  • 34. Databasing phenotypes is hard: free text descriptions; clinical note; models; atlases; images; controlled terms; multiple file formats; measurements; … Sequence data: ATTCGGATTACCGTATTA… genes, regulatory elements, … sequence
  • 36. Ontologies as a tool for unification. Diagram labels: disease-phenotype databases; disease phenotype ontology; expression data; gene function data; cell and tissue ontology; GO annotations; ontologies. Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1). doi:10.1038/75556
  • 37. Yet problems remain: incomplete data; ontologies not connected; missing and incorrect annotations; multiple overlapping ontologies. Annotations miss the important biology
  • 38. Ontologies built for one species will not work for others http://fme.biostr.washington.edu:8080/FME/index.html http://ccm.ucdavis.edu/bcancercd/22/mouse_figure.html
  • 39. Uberon: a multi-species anatomy ontology. Contents: over 8,000 classes (terms); multiple relationships, including subclass, part-of and develops-from. Scope: metazoa (animals); current focus is chordates; federated approach for other taxa. Uberon classes are generic / species neutral: 'mammary gland' can be used for any mammal; 'lung' can be used for any vertebrate (that has lungs). Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5 http://genomebiology.com/2012/13/1/R5
  • 40. Bridging anatomy ontologies. Diagram labels: ZFA, MA, FMA, EHDAA2, EMAPA, Uberon, SNOMED, NCIt, GO, CL. Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5
  • 41. Uberon enables queries across granularity. Figure labels: Uberon, MA (mouse), and FMA (human) classes for cerebellum, cerebellar vermis (FMA: vermis of cerebellum), and posterior lobe of cerebellum; CL: Purkinje cell; GO/NIF subcellular: axon, dendrite; edges labeled p and i
  • 42. Niknejad, A., Comte, A., Parmentier, G., Roux, J., Frederic, B., & Robinson-Rechavi, M. (2011). vHOG, a multi-species vertebrate ontology of homologous organs groups. Ecology, 1-5. http://bgee.unil.ch
  • 43. Evo-devo applications Dahdul, et al. 2010. Evolutionary characters, phenotypes and ontologies: curating data from the systematic biology literature. PLoS ONE 5(5):e10708. doi:10.1371/journal.pone.0010708
  • 44. The Monarch Initiative: The model systems research network. We are under construction. Goals are to: aggregate model systems genotype and phenotype information; integrate with network, genomic, and functional data; leverage ontologies for phenotype similarity matching; build knowledge exploration tools for end users; build services for other applications. Funded by NIH # 1R24OD011883-01
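To give a feel for ontology-based phenotype similarity matching, the sketch below compares two phenotype profiles by the overlap of their ancestor sets (a Jaccard score). This is a simplified stand-in chosen for illustration, not the algorithm used by OWLSim or the Monarch tools. The tiny hierarchy, the term labels, and the function names are all hypothetical.

```python
# Toy phenotype hierarchy (child -> parents), invented for illustration.
PARENTS = {
    "abnormal ear pigmentation": ["abnormal pigmentation"],
    "disrupted pigmentation": ["abnormal pigmentation"],
    "abnormal pigmentation": ["phenotypic abnormality"],
    "small eye": ["abnormal eye morphology"],
    "abnormal eye morphology": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}


def closure(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS.get(term, []))
    return seen


def jaccard_similarity(profile_a, profile_b):
    """Score two phenotype profiles by shared ancestors over all ancestors."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


mouse_model = ["abnormal ear pigmentation"]
fish_model = ["disrupted pigmentation"]
unrelated_model = ["small eye"]

print(jaccard_similarity(mouse_model, fish_model))
print(jaccard_similarity(mouse_model, unrelated_model))
```

The two pigmentation profiles share no literal term, yet they score higher with each other than with the eye phenotype because they meet at a common parent. Production systems typically weight shared ancestors by how informative they are, which this sketch omits.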
  • 45. Can we search by phenotype alone? Washington NL, Haendel MA, Mungall CJ, Ashburner M, et al. (2009) Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol 7(11): e1000247. doi:10.1371/journal.pbio.1000247 http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000247
  • 47. But... different organisms record genotypes differently
      Organism | Genotype
      zebrafish | fgf8a^ti282a/+; shha^tq252/tq252 (AB)
      mouse | B6.Cg-Shh^tm1(EGFP/cre)Cjt/J
      worm | daf-2(e1370) III; fog-2(oz40) V
      human* | ATP1A3(NM_152296.3) [c.946G>A, p.Gly316Ser]+[=]
      Phenotypes can be attached to full or partial genotypes, alleles, or variants
  • 48. Pulling it together. Diagram labels: model systems phenotype and genotype data; NIF DISCO; data ingest; ontology annotation; OWLSIM; enabling phenotype-based knowledge discovery tools; ONTOQUEST; extensible Web resource DISCOvery, registration and interoperation framework; MONARCH tools and services
  • 49. Figure labels. Ontologies: phenotypic qualities; cells; phenotypic abnormalities (human, mouse, zebrafish); molecular function; biological process; cellular component; anatomy (human, mouse, zebrafish); molecules; chemicals; proteins. Semantic integration (Uberpheno): zebrafish term "abnormally disrupted pigmentation"; mouse term "abnormal ear pigmentation"; human term "Abnormality of pigmentation"; HP, MP, ZP. Phenome systems analysis: phenogram; pheno-cluster. Genome systems analysis: interactome; orthology; annotation; gene function; CNV syndrome; TP73 (tumor protein p73); GNB1 (guanine nucleotide binding protein); GNB2L1 (guanine nucleotide binding protein (G protein)); cerebral cortical atrophy. Phenotype; gene
  • 50. These integration projects… well, integrate. CTSAconnect (Reveal Connections. Realize Potential.); OHSU Biolibrary. Diagram labels: people; research resources; clinical encounters; phenotypes; biospecimens; genes; variations
  • 51. Conclusions: Ontologies have provided us the capability to integrate a variety of biomedical data, at different levels of granularity, from different applications, and across domains. Describing biology works best with multiple connected ontologies. We need smart data, not just big data. We need better tools to integrate multiple ontologies. We need better tools to make use of smarter data structures (e.g. reasoning costs).
  • 52. Monarch Initiative; CTSAconnect; Biospecimen Ontology. OHSU: Melissa Haendel, Carlo Torniai, Nicole Vasilevsky, Chris Kelleher, Shahim Essaid. Cornell University: Dean Krafft, Jon Corson-Rikert, Brian Lowe. University of Florida: Mike Conlon, Chris Barnes, Nicholas Rejack. OHSU: Melissa Haendel, Shahim Essaid, Carlo Torniai. OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman, Matt Brush. LBNL: Chris Mungall, Suzi Lewis, Nicole Washington. UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta. Stony Brook University: Moises Eisenberg, Erich Bremer, Janos Hajagos. Harvard University: Daniela Bourges, Sophia Cheng. University at Buffalo: Barry Smith, Dagobert Soergel. Zaloni: Will Corbett, Ranjit Das, Ben Sharma. University of Pittsburgh: Harry Hochheiser, Chuck Borromeo

Editor's notes

  1. How can we: Help science be more reproducible? Provide access to resources and expertise? Give credit where credit is due? Make data more interoperable and visible? Make science more efficient? The projects that I am going to talk about all involve supporting and strengthening these connections.
  2. Need different figure
  3. Chris Mungall: A genome is a genome, whether it's an amoeba or a human. Tweets. Mention models here. Environment too.
  4. Chris Mungall: What tends to happen is that multiple non-interoperable
  5. Chris Mungall
  6. Chris Mungall: Humans must manually integrate. Machines can't make sense of this alone.
  7. Images: Seth Ruffins
  8. ICBO paper.
  9. Multiple coordinated: need to describe cell and tissue context