Unifying ontology services for
functional genomic annotations
Tomasz Adamusiak MD PhD
   7omasz

Postdoc at LHC CgSB since 10/2011


                               EBI is an Outstation of the European Molecular Biology Laboratory.
The European Molecular Biology Laboratory, a
        “European NIH” for molecular biology

    •       Heidelberg: basic research in molecular biology, administration, EMBO
    •       Hamburg: structural biology
    •       Hinxton: bioinformatics
    •       Grenoble: structural biology
    •       Monterotondo: mouse biology

    •       1500 staff
    •       >60 nationalities
EMBL-EBI external funding

    • Sources of external funding as of December 2010




                                               And no taxes!




Focus on providing database services to bioinformatics
community

    • Genomes: Ensembl, Ensembl Genomes, EGA
    • Nucleotide sequence: ENA
    • Functional genomics: ArrayExpress, Expression Atlas, EFO
    • Literature and ontologies: CiteXplore, GO
    • Protein families, motifs and domains: InterPro
    • Macromolecular: PDBe
    • Protein activity: IntAct, PRIDE
    • Protein sequences: UniProt
    • Pathways: Reactome
    • Chemical entities: ChEBI
    • Chemogenomics: ChEMBL
    • Systems: BioModels, BioSamples
ArrayExpress is the 2nd largest resource for public
transcriptomics data (CIBEX < AE < GEO)


  [Diagram: the free-text terms ‘blood cancer’, ‘hematological neoplasm’,
  ‘haematological neoplasm’, ‘lymphoma/leukemia’, ‘leukaemia’ and
  ‘haematological cancer’ shown against the single term EFO: lymphoid neoplasm;
  EFO sits between the Archive (25k exps) and the Atlas (2.6k exps).]
Experimental Factor Ontology (EFO)

• Modelling experimental factors currently in Archive:
  species, diseases, cell lines, etc.
• Capture ~30% not in UMLS
• Determined by Atlas, Ensembl, external requests (UPenn)
  and EBI site-wide search




Developed a process to automatically import metadata
from reference ontologies and validate changes
[Figure: number of classes or synonyms (y-axis, 0 to 20000) over time
(x-axis, Aug-08 to Aug-11); two series, SYNONYMS and CLASSES.]
Step 1: xrefs are acquired by fuzzy lexical
matching to domain ontologies

EFO:               acute lymphoblastic leukemia
                   http://www.ebi.ac.uk/efo/EFO_0000220
Disease Ontology:  acute lymphocytic leukemia (DOID:9952)
NCI Thesaurus:     Acute Lymphoblastic Leukemia (C3167)

Potential synonymy is mapped to EFO as:
xref: DOID:9952
xref: NCIt:C3167

Perl mapping scripts: EBI::FGPT::FuzzyRecogniser, OWL::Simple::Parser
Did not evaluate Norm in this context

• Production requirements (Perl, OWL)
• Improvement (ngrams) over legacy code
• Primary use case mapping EFO against AE annotations:
   • 2'-deoxy-5-azacytidine to 5-aza-2'-deoxycytidine CHEBI:50131
   • Barrett&#39;s Esophagus to Barrett's esophagus


• Difficult to use MetaMap on non-UMLS ontologies
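
The n-gram matching mentioned above can be illustrated with a minimal Python sketch. This is not the EBI::FGPT::FuzzyRecogniser code (which is Perl); the function names, the trigram size, the Dice coefficient and the 0.5 threshold are all assumptions chosen for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased, space-padded label."""
    normalised = " ".join(text.lower().split())
    padded = f" {normalised} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient over character n-grams (0.0 = disjoint, 1.0 = same n-gram set)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_match(query, labels, threshold=0.5):
    """Return (label, score) for the closest label, or (None, score) below the threshold."""
    if not labels:
        return None, 0.0
    score, label = max((similarity(query, candidate), candidate) for candidate in labels)
    return (label, score) if score >= threshold else (None, score)


# Labels taken from the slides; the query spellings are illustrative variants.
labels = ["acute lymphoblastic leukemia", "Barrett's esophagus", "lymphoma"]
print(best_match("Acute Lymphoblastic Leukaemia", labels))
print(best_match("Barrett's Esophagus", labels))
```

Character n-grams tolerate spelling variants (leukaemia vs. leukemia) and case differences without any language-specific normalisation, which is one plausible reason to prefer them for ontologies outside UMLS.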




Step 2: definitions and synonyms are pulled in
    from reference ontologies via NCBO BioPortal

STEP 1 (translate IDs)
    acute lymphoblastic leukemia
    http://www.ebi.ac.uk/efo/EFO_0000220
    xref: SNOMEDCT:91857003
    xref: DOID:9952
    xref: NCIt:C3167

STEP 2 (fetch + provenance), for NCIt:C3167
    synonym:
    Acute lymphoid leukaemia, disease
    definition:
    Leukemia with an acute onset [...]
    bioportal_provenance:
    Acute Lymphocytic Leukaemia
    [accessedResource: NCIt:C3167]
    [accessDate: 05-04-2011]
    bioportal_provenance:
    Leukemia with an acute onset [...]
    [accessedResource: NCIt:C3167]
    [accessDate: 05-04-2011]
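
A minimal Python sketch of the fetch-and-record-provenance step, assuming a caller-supplied lookup function in place of a real BioPortal call. The class names, field names and `stub_fetch` are hypothetical, and the reading of the access date 05-04-2011 as 5 April 2011 is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ImportedAnnotation:
    """A synonym or definition copied from a reference ontology, with provenance."""
    kind: str                # "synonym" or "definition"
    value: str
    accessed_resource: str   # e.g. "NCIt:C3167"
    access_date: date


@dataclass
class EfoClass:
    uri: str
    label: str
    xrefs: list = field(default_factory=list)
    annotations: list = field(default_factory=list)


def import_metadata(efo_class, fetch, today):
    """Pull synonyms and definitions for every xref and record where they came from.

    `fetch` maps an xref to {"synonyms": [...], "definitions": [...]}.
    """
    for xref in efo_class.xrefs:
        record = fetch(xref)
        for kind, key in (("synonym", "synonyms"), ("definition", "definitions")):
            for value in record.get(key, []):
                efo_class.annotations.append(
                    ImportedAnnotation(kind, value, xref, today))
    return efo_class


def stub_fetch(xref):
    """Stand-in for a remote lookup; returns canned data for one term only."""
    canned = {"NCIt:C3167": {"synonyms": ["Acute Lymphocytic Leukaemia"],
                             "definitions": ["Leukemia with an acute onset [...]"]}}
    return canned.get(xref, {})


term = EfoClass("http://www.ebi.ac.uk/efo/EFO_0000220",
                "acute lymphoblastic leukemia",
                xrefs=["DOID:9952", "NCIt:C3167"])
import_metadata(term, stub_fetch, date(2011, 4, 5))  # date reading is an assumption
for annotation in term.annotations:
    print(annotation)
```

Keeping the source and access date on every imported value is what makes the later regression report possible: a value can be re-checked against the resource it came from.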
Step 3: regression testing package produces a
report for manual verification of the import
• 13 different tests
• Shared xrefs, e.g. NCIt:C17459 (Hispanic or Latino)
   • Hispanic (EFO_0003169)
   • Latino (EFO_0003166)
• Shared synonyms, e.g. head kidney (ZFA:0000669)
   • pronephros (EFO_0000927)
   • bone marrow (EFO_0000868)
• Changes in external sources (11/2010 vs. 5/2010):
   • synonym Spinocerebellar Ataxias (EFO_0002624) no longer in
     DOID:1441
   • definition Organ with organ cavity which connects the cavity of the
     urinary bladder to the exterior. […] (EFO_0000931) no longer in
     FMAID:1966
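
Two of the checks listed above (shared xrefs and shared synonyms) can be sketched in a few lines of Python. This is an illustration under assumed data structures, not the regression package itself; the dictionary layout and the function name `shared_values` are hypothetical.

```python
from collections import defaultdict


def shared_values(classes, key):
    """Map each value of `key` (e.g. "xrefs") to the classes that share it,
    keeping only values claimed by more than one class."""
    owners = defaultdict(set)
    for class_id, annotations in classes.items():
        for value in annotations.get(key, []):
            owners[value].add(class_id)
    return {value: sorted(ids) for value, ids in owners.items() if len(ids) > 1}


# Toy data built from the examples on the slide.
efo = {
    "EFO_0003169": {"label": "Hispanic", "xrefs": ["NCIt:C17459"]},
    "EFO_0003166": {"label": "Latino", "xrefs": ["NCIt:C17459"]},
    "EFO_0000927": {"label": "pronephros", "synonyms": ["head kidney"]},
    "EFO_0000868": {"label": "bone marrow", "synonyms": ["head kidney"]},
    "EFO_0000220": {"label": "acute lymphoblastic leukemia", "xrefs": ["DOID:9952"]},
}
print(shared_values(efo, "xrefs"))
print(shared_values(efo, "synonyms"))
```

The output is a report rather than an automatic fix, which matches the slide's point that the import is verified manually.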
EFO has a unique XSLT-based web presence
http://www.ebi.ac.uk/efo/overview




EFO URIs are readable by humans and computers




Content negotiation is an alternative approach
Tuckey’s server-side UrlRewriteFilter

<rule>
    <condition name="Accept" type="header">application/rdf+xml</condition>
    <from>^/$</from>
    <to type="redirect">/efo/efo.owl</to>
</rule>
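
The decision this rule encodes can be sketched as a small Python function. It is an illustration only, not how UrlRewriteFilter is implemented: the function name `negotiate` is hypothetical, and the simple substring test on the Accept header (ignoring q-values) is an assumption.

```python
import re

RDF_MIME = "application/rdf+xml"


def negotiate(path, accept_header):
    """Return a redirect target for RDF-aware clients, or None to serve the HTML page.

    Mirrors the rule above: the Accept header must mention RDF/XML and the
    requested path must match ^/$.
    """
    wants_rdf = RDF_MIME in (accept_header or "")
    if wants_rdf and re.match(r"^/$", path):
        return "/efo/efo.owl"
    return None


print(negotiate("/", "application/rdf+xml"))   # /efo/efo.owl
print(negotiate("/", "text/html"))             # None
```

The same URI therefore serves a browser and an RDF client differently, depending only on the Accept header the client sends.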




The Semantic Web provides a common framework that allows data to
be shared and reused across application, enterprise, and community
boundaries (W3C)

If you want to put something on the web there are three rules:
1. All kinds of conceptual things, they have names now that start with
     HTTP.
2. If I take one of these HTTP names and I look it up [...] I fetch the
     data using the HTTP protocol from the web, I will get back some
     data in a standard format
3. It's got relationships [..] the other thing that it's related to is given
     one of those names that starts HTTP. So, I can go ahead and look
     that thing up.
                        Sir Tim Berners-Lee on the next Web (TED2009)




RDF triple is the core concept underpinning the
semantic web

subject:    <http://www.example.com/index.html>
predicate:  <http://purl.org/dc/elements/1.1/creator>
object:     "John Smith"

As a graph: example:index.html --dc:creator--> John Smith

Entity Attribute Value (EAV) model with well defined semantics
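
The example triple can be written down as a plain Python tuple and serialised as one N-Triples line. This is a minimal sketch using only the standard library; the `Triple` class and `to_ntriples` function are hypothetical helpers, and only plain string literals are handled.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str     # URI
    predicate: str   # URI
    obj: str         # plain literal in this sketch


def to_ntriples(triple):
    """Serialise one triple with a literal object as an N-Triples line."""
    escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{triple.subject}> <{triple.predicate}> "{escaped}" .'


t = Triple("http://www.example.com/index.html",
           "http://purl.org/dc/elements/1.1/creator",
           "John Smith")
print(to_ntriples(t))
```

Read as an EAV row, the subject is the entity, the predicate the attribute and the object the value; the difference is that the attribute is itself a URI with published semantics.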
Open linked data lacks central URI reconciliation

• Responsibility for URIs:
http://bio2rdf.org/mesh:68009154
http://bio2rdf.org/pubmed:11992264
http://bio2rdf.org/go:0016458
http://purl.org/obo/owl/GO#GO_0016458
• Versioning:
http://sig.uw.edu/fma#Anatomical_entity (FMA 3.1)
http://sig.biostr.washington.edu/fma3.0#Anatomical_entity (FMA 3.0)
http://purl.obolibrary.org/obo/GO_0016458 (Foundry-compliant URI)
• Requires institutional support
• Would be great to have public UMLS in RDF


Common Ontology Application Tasks
(OntoCAT)


  [Diagram: the free-text terms ‘blood cancer’, ‘hematological neoplasm’,
  ‘haematological neoplasm’, ‘lymphoma/leukemia’, ‘leukaemia’ and
  ‘haematological cancer’ shown against the single term EFO: lymphoid neoplasm;
  EFO sits between the Archive (25k exps) and the Atlas (2.6k exps).]
There is no single ontology resource that covers
all the use cases

Local ontologies in OWL/OBO

NCBO BioPortal

EBI Ontology Lookup Service




...and no huffing and puffing
    will blow all of them down...
                                    Leonard Leslie Brooke (1904)
EBI Ontology Lookup Service

•   82 ontologies
•   OBO ontologies
•   SOAP web services/Java client
•   First out there




Cote RG, Jones P, Apweiler R, Hermjakob H. The ontology lookup service, a
lightweight cross-platform tool for controlled vocabulary queries.
BMC Bioinformatics. 2006 Feb 28;7(1):97
NCBO BioPortal

•   267 ontologies and growing
•   Both OWL and OBO
•   REST web services
•   Rich in functionality




Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C.,
Rubin, D.L., Storey, M.A., Chute, C.G., Musen, M.A. BioPortal: ontologies and
integrated data resources at the click of a mouse. Nucleic Acids Res. 2009 Jul
1;37(Web Server issue):W170-3.
OLS vs. BioPortal (July, 2010)




OWL API

• Reference implementation for manipulating and
  serialising OWL2
• Multiple parsers (incl. OBO)
• Reasoner interfaces

• Low level access



Sean Bechhofer, Phillip Lord, Raphael Volz. Cooking the Semantic Web with the
OWL API. 2nd International Semantic Web Conference, ISWC, Sanibel Island,
Florida, October 2003



We wanted to annotate data with ontology terms within the
MOLGENIS framework – ontology browser



                         OWL API
EFO Bioportal Import




 Ontology Browser




Integration is hard




A simple facade to ontology resources providing a set of
functions most common to ontology APIs (e.g. HL7 CTS2,
UMLS API) under a single interface
http://www.ontocat.org

Resources: BioPortal, EBI OLS, OWL, OBO, ?

Functions:
    searchAll()
    searchOntology()
    getChildren()
    getParents()
    getSynonyms()
    getDefinitions()
    getAllParents()
    getAllChildren()
    getRelations()
    ...
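
The facade idea can be sketched in Python as one interface with interchangeable backends. OntoCAT itself is a Java library, so nothing below is its API: the class names, the Python-style method names and the in-memory backend are hypothetical stand-ins for illustration.

```python
from abc import ABC, abstractmethod


class OntologyService(ABC):
    """Common interface that every backend implements."""

    @abstractmethod
    def search_all(self, query): ...

    @abstractmethod
    def get_children(self, term_id): ...


class InMemoryService(OntologyService):
    """Toy backend standing in for a local OWL/OBO file or a remote service."""

    def __init__(self, labels, children):
        self.labels = labels        # term id -> label
        self.children = children    # term id -> list of child ids

    def search_all(self, query):
        q = query.lower()
        return sorted(t for t, label in self.labels.items() if q in label.lower())

    def get_children(self, term_id):
        return list(self.children.get(term_id, []))


class CompositeService(OntologyService):
    """Fans each call out to several backends and concatenates the answers."""

    def __init__(self, *services):
        self.services = services

    def search_all(self, query):
        return [hit for s in self.services for hit in s.search_all(query)]

    def get_children(self, term_id):
        return [c for s in self.services for c in s.get_children(term_id)]


local = InMemoryService({"EFO_0000220": "acute lymphoblastic leukemia"}, {})
remote = InMemoryService({"DOID:9952": "acute lymphocytic leukemia"}, {})
print(CompositeService(local, remote).search_all("leukemia"))
```

Because the composite implements the same interface as its parts, client code does not need to know whether it is talking to one resource or several.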
There are many ways you could use
OntoCAT
• Store data and annotate with ontology terms
   • OntoCAT database and browser
• Work with ontologies in R
   • Bioconductor ontocat R package
• Integrate a number of ontologies in a local repository
   • OntoCAT REST server
• Add ontology support to your GWT web application
   • OntoCAT GoogleApp



http://www.ontocat.org/wiki/OntocatDownload

The curious case of OntoFox, OntoBee, and
OntoCAT




Developed for internal and external use cases
Example 11@ontocat.org

• Automatically obtain CUIs from UMLS sources for
  extracted terms via BioPortal
• Shamim Mollah, Bleeding History Phenotype Ontology,
  Rockefeller University Center for Clinical and
  Translational Science, New York, NY

1. Get all terms from BHP
2. Search for corresponding UMLS terms (also MetaMap)
3. Obtain CUIs for mapped terms through BioPortal




ontocat R is first on Google




Use case – explore beyond subsumption
Example 16@ontocat.org
• Requested by reviewer for partonomy in GO
• Easy in OBO, hard in OWL

• Computationally intensive:
   •   (starting from the root node)
   •   1. classify all children of inverse_relation some class
   •   2. repeat 1. on all new nodes
   •   3. finish if all nodes were seen


• OWL API is not thread safe
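
The three-step traversal above amounts to a breadth-first walk. A minimal Python sketch follows, assuming the direct parts of each class are already available in a dictionary; in the real setting each lookup would be a reasoner query for `inverse_relation some class`. The function name and the toy anatomy data are hypothetical.

```python
from collections import deque


def all_parts(root, has_part):
    """Breadth-first walk over the inverse relation (has_part) starting from `root`.

    `has_part` maps a class to its direct parts.
    """
    seen = {root}
    queue = deque([root])
    order = []
    while queue:                       # step 3: finish once every node has been seen
        current = queue.popleft()
        for part in has_part.get(current, []):   # step 1: direct parts of this class
            if part not in seen:       # step 2: repeat only on new nodes
                seen.add(part)
                order.append(part)
                queue.append(part)
    return order


# Toy partonomy for illustration only.
has_part = {"heart": ["left heart"], "left heart": ["mitral valve"],
            "mitral valve": []}
print(all_parts("heart", has_part))
```

The `seen` set is what keeps the walk from revisiting nodes; with a reasoner behind each lookup, avoiding repeated queries is where most of the cost is saved.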


Reasoning is fundamental to exploring the
hierarchies of more expressive ontologies

[Diagram: Heart, Heart Component, Left Heart and Mitral Valve,
connected by partOf and is_a relations.]
When ontologies classify as inconsistent it is often not
obvious why (Open World Assumption)
• Mary is_a CitizenOfFrance
Is Paul a citizen of France?
Closed World, e.g. SQL databases: NO
Ontologies: ?

• OWL is more expressive: classes, individuals, closure
  axioms, value partitions, cardinality restrictions, property
  chains; disjoint, reflexive, irreflexive, symmetric and anti-
  symmetric, inverse or transitive properties

• Explanation in OWL
  (http://owl.cs.manchester.ac.uk/explanation/)
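
The Mary and Paul example can be made concrete with a small Python sketch contrasting the two assumptions. It is a toy illustration, not an OWL reasoner; the function names and the use of `None` for "unknown" are assumptions.

```python
def closed_world(facts, statement):
    """Closed world: anything not asserted is false."""
    return statement in facts


def open_world(facts, negated_facts, statement):
    """Open world: True if asserted, False if explicitly denied, else None (unknown)."""
    if statement in facts:
        return True
    if statement in negated_facts:
        return False
    return None


facts = {("Mary", "is_a", "CitizenOfFrance")}
question = ("Paul", "is_a", "CitizenOfFrance")
print(closed_world(facts, question))          # False
print(open_world(facts, set(), question))     # None, i.e. unknown
```

Under the open world assumption, a missing statement never produces a contradiction by itself, which is part of why the cause of an inconsistency can be hard to spot.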
The extra information is used in QC of EFO, but
not in query expansion

[Diagram: cardiomyopathy, myocardium, ventricular myocardium, atrial myocardium,
cardiac ventricle, atrium, heart and atrial fibrillation, connected by
subClassOf, part_of and has_disease_location relations. Caption: Heart disease?]
Google analytics for ontocat.org




OntoCAT is enterprise-grade, low-maintenance,
headache-free, zero-configuration software
•   Java6, maven and ant support
•   Open source (LGPL v3)
•   137 unit tests
•   Hudson daily builds


• Flexibility through
  design patterns:
    • Decorators
    • Proxies
    • Composites

[Screenshots: tests passed; daily builds]
Semantic Web Atlas of Gene Expression



  [Diagram: the free-text terms ‘blood cancer’, ‘hematological neoplasm’,
  ‘haematological neoplasm’, ‘lymphoma/leukemia’, ‘leukaemia’ and
  ‘haematological cancer’ shown against the single term EFO: lymphoid neoplasm;
  EFO sits between the Archive (25k exps) and the Atlas (2.6k exps).]
EFO inferred is_a hierarchy defines how experiments are
aggregated in Atlas for re-analysis
http://www.ebi.ac.uk/gxa




It is possible to infer diseases of heart computationally
rather than asserting this information directly


[Diagram: cardiomyopathy, myocardium, ventricular myocardium, atrial myocardium,
cardiac ventricle, atrium, heart and atrial fibrillation, connected by
subClassOf, part_of and has_disease_location relations.]

heart disease ≡ has_disease_location ∃ (heart ∪ part_of ∃ heart)
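
The definition of heart disease above can be approximated without a reasoner by following part_of transitively. The Python sketch below does that; it ignores subClassOf, and the specific edges in the toy data (including the unrelated lymphoma example) are assumptions, since the arrows of the diagram are not recoverable from the text.

```python
def part_of_closure(term, part_of):
    """All wholes that `term` is transitively part of."""
    found, stack = set(), [term]
    while stack:
        for whole in part_of.get(stack.pop(), []):
            if whole not in found:
                found.add(whole)
                stack.append(whole)
    return found


def is_heart_disease(disease, disease_location, part_of, organ="heart"):
    """True if some location of the disease is the organ or a part of it."""
    return any(loc == organ or organ in part_of_closure(loc, part_of)
               for loc in disease_location.get(disease, []))


# Assumed edges for illustration only.
part_of = {"myocardium": ["heart"], "atrium": ["heart"],
           "cardiac ventricle": ["heart"]}
disease_location = {"cardiomyopathy": ["myocardium"],
                    "atrial fibrillation": ["atrium"],
                    "lymphoma": ["lymph node"]}

for disease in disease_location:
    print(disease, is_heart_disease(disease, disease_location, part_of))
```

Nothing in the data asserts that cardiomyopathy is a heart disease; that classification follows from where the disease is located and what that location is part of.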
One RDF graph per experiment accession
    Context-specific gene expression is grouped with blank nodes

[Diagram nodes: E-AFMX-1 (gxa:E-AFMX-1), typed as experiment accession
(efo:EFO_0004033); Homo sapiens; liver; NONDE; 1.0E30; PRDX2
(ensembl:ENSG00000167815), typed as gene (efo:EFO_0002606)]

Predicates
    rdf:type                                 rdf:type
    organism                                 OBI_0100026
    is_about                                 IAO_0000136
    gene                                     EFO_0002606
    experimental factor                      EFO_0000001
    discretized differential expression      EFO_0004034
    p value                                  OBI_0000175

W3C Note on RDF Approach to Gene Expression Data (in progress)
Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
One RDF graph per experiment accession
    Context-specific gene expression is grouped with blank nodes

[Diagram: same nodes and predicates as on the previous slide, plus a second
group of values: approximately 14 weeks, NONDE, 1.0E30]
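
Grouping one gene's expression in one context under a blank node can be sketched with plain Python tuples. The predicate identifiers come from the table on the slide, but which node each predicate connects (for example, the experiment pointing at the blank node via is_about) is an assumption, as are the function name and the `_:bN` labelling.

```python
def expression_triples(experiment, gene, factor_value, call, p_value, node_id):
    """Return triples for one gene-in-context observation, grouped by a blank node."""
    node = f"_:b{node_id}"
    return [
        (experiment, "IAO_0000136", node),     # is_about (direction assumed)
        (node, "EFO_0002606", gene),           # gene
        (node, "EFO_0000001", factor_value),   # experimental factor
        (node, "EFO_0004034", call),           # discretized differential expression
        (node, "OBI_0000175", p_value),        # p value
    ]


# Values as they appear on the slide.
triples = expression_triples("gxa:E-AFMX-1", "ensembl:ENSG00000167815",
                             "liver", "NONDE", "1.0E30", node_id=1)
for triple in triples:
    print(triple)
```

The blank node has no global identity; it exists only to tie the gene, the factor value, the expression call and the p value together as one observation within the experiment's graph.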
Sesame triplestore provided the shortest
time to market

REST + XSLT = RDF

[Diagram: WAR and RDF; servers tc-test-3, tomcat-7 and tomcat-8; Load Balancer;
wwwdev and www.ebi.ac.uk. Separately, Jun Zhao @ Oxford: Jena TDB, Milarq,
www.open-biomed.org.uk]
Semantic Web is unlikely to take over the web,
but has the potential to unify all of bioinformatics




[Diagram labels: OntoCAT, EFO, Semantic Atlas]
http://gigaom.com/broadband/the-storage-vs-bandwidth-debate/
Acknowledgments
•   Morris A. Swertz’s group at the Genomics Coordination Center (GCC),
    University of Groningen
     •   K Joeri van der Velde
     •   Despoina Antonakaki
     •   Dasha Zhernakova
•   James Malone
•   Helen Parkinson
•   FuzzyRecogniser: Emma Hastings
•   Niran Abeygunawardena
•   Ele Holloway
•   Tim Rayner
•   Zooma: Tony Burdett
•   Bioconductor/R package: Natalja Kurbatova, Pavel Kurnosov, Misha
    Kapushesky

This work was supported by the European Community's Seventh Framework
Programmes GEN2PHEN [grant number 200754], SLING [grant number 226073], and
SYBARIS [grant number 242220], the European Molecular Biology Laboratory, the
Netherlands Organisation for Scientific Research [NWO/Rubicon grant number
825.09.008], and the Netherlands Bioinformatics Centre [BioAssist/Biobanking
platform and BioRange grant SP1.2.3]

OntoCAT logo courtesy of Eamonn Maguire

Special thanks go to NCBO BioPortal and EBI OLS support teams for all the
comprehensive help they provide
slides @ www.slideshare.net/adamusiak




Thank you!
4
5

Más contenido relacionado

La actualidad más candente

Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...
Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...
Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...Robert (Rob) Salomon
 
ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...
ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...
ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...Barry Smith
 
Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020
Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020
Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020M. Luisetto Pharm.D.Spec. Pharmacology
 
Presentation on flow cytometry1
Presentation on flow cytometry1Presentation on flow cytometry1
Presentation on flow cytometry1Nagendra sharma
 

La actualidad más candente (6)

Opella l3
Opella l3Opella l3
Opella l3
 
Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...
Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...
Genomic Cytometry: Using Multi-Omic Approaches to Increase Dimensionality in ...
 
ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...
ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...
ENVO: The Environment Ontology (Presentation at the Genomics Standards Consor...
 
CVaddinfo
CVaddinfoCVaddinfo
CVaddinfo
 
Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020
Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020
Tractatus vision system evolution in metazoan-RESEARCHGATE luisetto m et al 2020
 
Presentation on flow cytometry1
Presentation on flow cytometry1Presentation on flow cytometry1
Presentation on flow cytometry1
 

Similar a Unifying ontology services for functional genomic annotations

1 introduction to_the_ebi_(katrina_pavelin)
1 introduction to_the_ebi_(katrina_pavelin)1 introduction to_the_ebi_(katrina_pavelin)
1 introduction to_the_ebi_(katrina_pavelin)phdcareers
 
World-wide data exchange in metabolomics, Wageningen, October 2016
World-wide data exchange in metabolomics, Wageningen, October 2016World-wide data exchange in metabolomics, Wageningen, October 2016
World-wide data exchange in metabolomics, Wageningen, October 2016Christoph Steinbeck
 
Ontology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesOntology Services for the Biomedical Sciences
Ontology Services for the Biomedical SciencesConnected Data World
 
Ontologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyOntologies for life sciences: examples from the gene ontology
Ontologies for life sciences: examples from the gene ontologyMelanie Courtot
 
Connecting life sciences data at the European Bioinformatics Institute
Connecting life sciences data at the European Bioinformatics InstituteConnecting life sciences data at the European Bioinformatics Institute
Connecting life sciences data at the European Bioinformatics InstituteConnected Data World
 
GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies wit...
GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies wit...GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies wit...
GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies wit...Neo4j
 
EMBL-EBI at Plant and Animal Genome conference
EMBL-EBI at Plant and Animal Genome conference EMBL-EBI at Plant and Animal Genome conference
EMBL-EBI at Plant and Animal Genome conference Denise Carvalho-Silva, PhD
 
Unifying ontology services for functional genomic annotations

  • 1. Unifying ontology services for functional genomic annotations. Tomasz Adamusiak MD PhD (7omasz), postdoc at LHC CgSB since 10/2011. EBI is an Outstation of the European Molecular Biology Laboratory.
  • 2. The European Molecular Biology Laboratory, a “European NIH” for molecular biology. Sites: Heidelberg (basic research in molecular biology, administration, EMBO); Hamburg (structural biology); Hinxton (bioinformatics); Grenoble (structural biology); Monterotondo (mouse biology). • 1500 staff • >60 nationalities
  • 3. EMBL-EBI external funding • Sources of external funding as of December 2010. And no taxes!
  • 4. Focus on providing database services to the bioinformatics community. Literature and ontologies: CiteXplore, GO • Genomes: Ensembl, Ensembl Genomes, EGA • Nucleotide sequence: ENA • Functional genomics: ArrayExpress, Expression Atlas, EFO • Protein families, motifs and domains: InterPro • Macromolecular: PDBe • Protein activity: IntAct, PRIDE • Pathways: Reactome • Protein sequences: UniProt • Chemical entities: ChEBI • Systems: BioModels • BioSamples • Chemogenomics: ChEMBL
  • 5. ArrayExpress is the 2nd largest resource for public transcriptomics data (CIBEX < AE < GEO). [Diagram: annotations ‘blood cancer’, ‘hematological neoplasm’, ‘haematological neoplasm’, ‘lymphoma/leukemia’, ‘leukaemia’ and ‘haematological cancer’ shown against EFO: lymphoid neoplasm; Archive (25k exps), Atlas (2.6k exps), EFO]
  • 6. Experimental Factor Ontology (EFO) • Modelling experimental factors currently in Archive: species, diseases, cell lines, etc. • Captures ~30% not in UMLS • Determined by Atlas, Ensembl, external requests (UPenn) and EBI site-wide search
  • 7. Developed a process to automatically import metadata from reference ontologies and validate changes. [Chart: number of classes and synonyms (y-axis, 0 to 20000) over time (x-axis, Aug-08 to Aug-11), one series for CLASSES and one for SYNONYMS]
  • 8. Step 1: xrefs are acquired by fuzzy lexical matching to domain ontologies. Example: EFO acute lymphoblastic leukemia (http://www.ebi.ac.uk/efo/EFO_0000220) against Disease Ontology acute lymphocytic leukemia (DOID:9952) and NCI Thesaurus Acute Lymphoblastic Leukemia (NCIt:C3167). Matches are mapped to EFO as xref: DOID:9952 and xref: NCIt:C3167, and flagged as potential synonymy. Perl mapping scripts: EBI::FGPT::FuzzyRecogniser, OWL::Simple::Parser
  • 9. Did not evaluate Norm in this context • Production requirements (Perl, OWL) • Improvement (ngrams) over legacy code • Primary use case: mapping EFO against AE annotations: • 2'-deoxy-5-azacytidine to 5-aza-2'-deoxycytidine (CHEBI:50131) • Barrett&#39;s Esophagus to Barrett's esophagus • Difficult to use MetaMap on non-UMLS ontologies
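The n-gram matching mentioned on slides 8 and 9 can be illustrated with a minimal Python sketch. This is not the EBI::FGPT::FuzzyRecogniser code (which is Perl); the function names, the trigram size, the Dice coefficient and the 0.5 threshold are all assumptions chosen for illustration.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased, whitespace-normalised string."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient over character n-grams, between 0 and 1 (an assumed scoring choice)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def best_match(term, labels, threshold=0.5):
    """Return (label, score) for the closest label, or (None, score) below the threshold."""
    if not labels:
        return None, 0.0
    score, label = max((similarity(term, label), label) for label in labels)
    return (label, score) if score >= threshold else (None, score)
```

Calling `best_match("2'-deoxy-5-azacytidine", ["5-aza-2'-deoxycytidine", "acute lymphoblastic leukemia"])` picks the reordered chemical name, because most of its trigrams survive the reordering even though the strings differ token by token.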
  • 10. Step 2: definitions and synonyms are pulled in from reference ontologies via NCBO BioPortal. Example: acute lymphoblastic leukemia (http://www.ebi.ac.uk/efo/EFO_0000220). • Step 1, translate IDs: xref: DOID:9952, xref: NCIt:C3167, xref: SNOMEDCT:91857003 • Step 2, fetch + provenance: synonym: Acute lymphoid leukaemia, disease; definition: Leukemia with an acute onset [...]; bioportal_provenance: Acute Lymphocytic Leukaemia [accessedResource: NCIt:C3167] [accessDate: 05-04-2011]; bioportal_provenance: Leukemia with an acute onset [...] [accessedResource: NCIt:C3167] [accessDate: 05-04-2011]
  • 11. Step 3: regression testing package produces a report for manual verification of the import • 13 different tests • Shared xrefs, e.g. NCIt:C17459 (Hispanic or Latino): Hispanic (EFO_0003169), Latino (EFO_0003166) • Shared synonyms, e.g. head kidney (ZFA:0000669): pronephros (EFO_0000927), bone marrow (EFO_0000868) • Changes in external sources (11/2010 vs. 5/2010): synonym Spinocerebellar Ataxias (EFO_0002624) no longer in DOID:1441; definition "Organ with organ cavity which connects the cavity of the urinary bladder to the exterior. […]" (EFO_0000931) no longer in FMAID:1966
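Two of the checks named on slide 11 (shared xrefs or synonyms, and values that disappeared from an external source between snapshots) can be sketched in Python as below. The function names and the dictionary layout are assumptions for illustration; the actual regression package and its 13 tests are not shown in the slides.

```python
from collections import defaultdict


def shared_values(annotations):
    """Map each xref or synonym to the classes using it, keeping values used by two or more classes."""
    users = defaultdict(set)
    for class_id, values in annotations.items():
        for value in values:
            users[value].add(class_id)
    return {value: sorted(ids) for value, ids in users.items() if len(ids) > 1}


def dropped_values(before, after):
    """Report values present in the earlier snapshot but missing from the later one."""
    dropped = {}
    for key, values in before.items():
        missing = set(values) - set(after.get(key, ()))
        if missing:
            dropped[key] = sorted(missing)
    return dropped
```

With the slide's example, `shared_values({"EFO_0003169": ["NCIt:C17459"], "EFO_0003166": ["NCIt:C17459"]})` flags NCIt:C17459 as shared by both classes, which is the kind of entry a curator would then review by hand.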
  • 12. EFO has a unique XSLT-based web presence: http://www.ebi.ac.uk/efo/overview
  • 13. EFO URIs are readable by humans and computers
  • 14. Content negotiation is an alternative approach, e.g. with Tuckey’s server-side urlrewritefilter: <rule> <condition name="Accept" type="header">application/rdf+xml</condition> <from>^/$</from> <to type="redirect">/efo/efo.owl</to> </rule>
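The effect of the rewrite rule on slide 14 can be mimicked in a few lines of Python. This is a simplified sketch and not how urlrewritefilter is implemented: it ignores q-values, assumes a 302 status for the redirect, and the function name `negotiate` is hypothetical.

```python
def negotiate(path, accept_header):
    """Return (status, location): redirect RDF/XML clients asking for the root to the OWL file."""
    # Strip parameters such as ";q=0.9" and compare media types case-insensitively.
    media_types = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    if path == "/" and "application/rdf+xml" in media_types:
        return 302, "/efo/efo.owl"
    # Everything else is served as requested (HTML for browsers).
    return 200, path
```

A browser sending `Accept: text/html` keeps getting the human-readable page, while a semantic web client sending `Accept: application/rdf+xml` is sent to the ontology file from the same URI.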
  • 15. "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries" (W3C). If you want to put something on the web there are three rules: 1. All kinds of conceptual things, they have names now that start with HTTP. 2. If I take one of these HTTP names and I look it up [...] I fetch the data using the HTTP protocol from the web, I will get back some data in a standard format. 3. It's got relationships [..] the other thing that it's related to is given one of those names that starts HTTP. So, I can go ahead and look that thing up. Sir Tim Berners-Lee on the next Web (TED2009)
  • 16. RDF triple is the core concept underpinning the semantic web: subject <http://www.example.com/index.html>, predicate <http://purl.org/dc/elements/1.1/creator>, object "John Smith" (abbreviated: example:index.html dc:creator John Smith). An Entity Attribute Value (EAV) model with well-defined semantics
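The triple on slide 16 can be written down as a small Python structure and serialised as one N-Triples line. This sketch assumes the object is always a plain string literal (real RDF objects can also be URIs or typed literals); the `Triple` class and `to_ntriples` function are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property
    obj: str        # plain literal value (assumption: no URIs or datatypes here)


def to_ntriples(triple):
    """Serialise a triple with a plain-literal object as one N-Triples line."""
    literal = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{triple.subject}> <{triple.predicate}> "{literal}" .'


creator = Triple(
    "http://www.example.com/index.html",
    "http://purl.org/dc/elements/1.1/creator",
    "John Smith",
)
```

`to_ntriples(creator)` yields the subject and predicate in angle brackets followed by the quoted literal and a terminating dot, which is the entity, attribute, value reading of the slide made explicit.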
  • 17. Open linked data lacks central URI reconciliation • Responsibility for URIs: http://bio2rdf.org/mesh:68009154, http://bio2rdf.org/pubmed:11992264, http://bio2rdf.org/go:0016458, http://purl.org/obo/owl/GO#GO_0016458 • Versioning: http://sig.uw.edu/fma#Anatomical_entity (FMA 3.1), http://sig.biostr.washington.edu/fma3.0#Anatomical_entity (FMA 3.0), http://purl.obolibrary.org/obo/GO_0016458 (Foundry-compliant URI) • Requires institutional support • Would be great to have public UMLS in RDF
  • 18. Common Ontology Application Tasks (OntoCAT). [Section slide repeating the diagram of slide 5: annotation variants, EFO: lymphoid neoplasm, Archive (25k exps), Atlas (2.6k exps), EFO]
  • 19. There is no single ontology resource that covers all the use cases: local ontologies in OWL/OBO, NCBO BioPortal, EBI Ontology Lookup Service. "...and no huffing and puffing will blow all of them down..." Leonard Leslie Brooke (1904)
  • 20. EBI Ontology Lookup Service • 82 ontologies • OBO ontologies • SOAP web services/Java client • First out there. Cote RG, Jones P, Apweiler R, Hermjakob H. The ontology lookup service, a lightweight cross-platform tool for controlled vocabulary queries. BMC Bioinformatics. 2006 Feb 28;7(1):97
  • 21. NCBO BioPortal • 267 ontologies and growing • Both OWL and OBO • REST web services • Rich in functionality. Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C., Rubin, D.L., Storey, M.A., Chute, C.G., Musen, M.A. BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res. 2009 Jul 1;37(Web Server issue):W170-3.
  • 22. OLS vs. BioPortal (July 2010)
  • 23. OWL API • Reference implementation for manipulating and serialising OWL2 • Multiple parsers (incl. OBO) • Reasoner interfaces • Low-level access. Sean Bechhofer, Phillip Lord, Raphael Volz. Cooking the Semantic Web with the OWL API. 2nd International Semantic Web Conference, ISWC, Sanibel Island, Florida, October 2003
  • 24. We wanted to annotate data with ontology terms within the MOLGENIS framework: ontology browser. [Diagram labels: OWL API, EFO, BioPortal, Import, Ontology Browser]
  • 26. A simple facade to ontology resources providing a set of functions most common to ontology APIs (e.g. HL7 CTS2, UMLS API) under a single interface. http://www.ontocat.org • Resources: BioPortal, EBI OLS, OWL, OBO, ... ? • Functions: searchAll(), searchOntology(), getChildren(), getParents(), getSynonyms(), getDefinitions(), getAllParents(), getAllChildren(), getRelations()
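The facade idea on slide 26 can be sketched in Python as one interface with interchangeable back ends. This is not the OntoCAT API, which is Java: the method names below are Pythonised from three of the functions listed on the slide, and `InMemoryService`, `CompositeService` and the toy term IDs are hypothetical.

```python
from abc import ABC, abstractmethod


class OntologyService(ABC):
    """Single interface that every back end (web service or local file) would implement."""

    @abstractmethod
    def search_all(self, query): ...

    @abstractmethod
    def get_parents(self, term_id): ...

    @abstractmethod
    def get_children(self, term_id): ...


class InMemoryService(OntologyService):
    """Toy back end holding labels and asserted parents in dictionaries."""

    def __init__(self, labels, parents):
        self.labels = labels    # term id -> label
        self.parents = parents  # term id -> list of parent ids

    def search_all(self, query):
        q = query.lower()
        return sorted(t for t, label in self.labels.items() if q in label.lower())

    def get_parents(self, term_id):
        return sorted(self.parents.get(term_id, ()))

    def get_children(self, term_id):
        return sorted(c for c, ps in self.parents.items() if term_id in ps)


class CompositeService(OntologyService):
    """Query several back ends through the same interface and merge the answers."""

    def __init__(self, services):
        self.services = list(services)

    def _merge(self, method_name, argument):
        merged = set()
        for service in self.services:
            merged.update(getattr(service, method_name)(argument))
        return sorted(merged)

    def search_all(self, query):
        return self._merge("search_all", query)

    def get_parents(self, term_id):
        return self._merge("get_parents", term_id)

    def get_children(self, term_id):
        return self._merge("get_children", term_id)
```

Client code written against `OntologyService` does not need to know whether an answer came from one resource or several, which is the design point of putting the resources behind a single interface.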
  • 27. There are many ways you could use OntoCAT • Store data and annotate with ontology terms: OntoCAT database and browser • Work with ontologies in R: Bioconductor ontocat R package • Integrate a number of ontologies in a local repository: OntoCAT REST server • Add ontology support to your GWT web application: OntoCAT GoogleApp. http://www.ontocat.org/wiki/OntocatDownload
  • 28. The curious case of OntoFox, OntoBee, and OntoCAT
  • 29. Developed for internal and external use cases. Example 11@ontocat.org • Automatically obtain CUIs from UMLS sources for extracted terms via BioPortal • Shamim Mollah, Bleeding History Phenotype Ontology, Rockefeller University Center for Clinical and Translational Science, New York, NY • Steps: 1. Get all terms from BHP 2. Search for corresponding UMLS terms (also MetaMap) 3. Obtain CUIs for mapped terms through BioPortal
  • 30. ontocat R is first on Google
  • 31. Use case: explore beyond subsumption. Example 16@ontocat.org • Requested by reviewer for partonomy in GO • Easy in OBO, hard in OWL • Computationally intensive (starting from the root node): 1. classify all children of inverse_relation some class 2. repeat 1. on all new nodes 3. finish if all nodes were seen • OWL API is not thread safe
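The three-step loop on slide 31 has the shape of a breadth-first traversal. The Python sketch below shows only that control flow: it assumes the relation is already available as a plain dictionary, whereas the slide's version asks a reasoner to classify `inverse_relation some class` at every step, which is where the cost comes from. The function name and the toy terms are hypothetical.

```python
from collections import deque


def all_parts(root, part_of):
    """Collect every term reachable from root by following part_of edges backwards."""
    # Invert the relation once: whole -> set of direct parts.
    has_part = {}
    for part, wholes in part_of.items():
        for whole in wholes:
            has_part.setdefault(whole, set()).add(part)

    seen = set()
    queue = deque([root])
    while queue:                       # step 3: stop once no unseen nodes remain
        node = queue.popleft()
        for part in has_part.get(node, ()):   # step 1: direct children via the inverse relation
            if part not in seen and part != root:
                seen.add(part)
                queue.append(part)     # step 2: repeat on all new nodes
    return seen
```

The `seen` set is what guarantees termination even if the relation contains cycles.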
  • 32. Reasoning is fundamental to exploring the hierarchies of more expressive ontologies. [Diagram: Heart, Heart Component, Left Heart and Mitral Valve connected by partOf and is_a relations]
  • 33. When ontologies classify as inconsistent it is often not obvious why (Open World Assumption) • Mary is_a CitizenOfFrance. Is Paul a citizen of France? Closed World, e.g. SQL databases: NO. Ontologies: ? • OWL is more expressive: classes, individuals, closure axioms, value partitions, cardinality restrictions, property chains; disjoint, reflexive, irreflexive, symmetric and antisymmetric, inverse or transitive properties • Explanation in OWL (http://owl.cs.manchester.ac.uk/explanation/)
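The Mary and Paul example on slide 33 can be made concrete with a small Python sketch that contrasts the two assumptions. It is a toy model, not an OWL reasoner: facts are plain tuples, and `None` stands for "unknown" under the open world assumption. The function names are hypothetical.

```python
def closed_world(facts, statement):
    """Closed world (e.g. an SQL query): anything not stated is taken to be false."""
    return statement in facts


def open_world(facts, negated_facts, statement):
    """Open world: true if stated, false only if explicitly denied, otherwise unknown (None)."""
    if statement in facts:
        return True
    if statement in negated_facts:
        return False
    return None


facts = {("Mary", "is_a", "CitizenOfFrance")}
question = ("Paul", "is_a", "CitizenOfFrance")
```

Here `closed_world(facts, question)` answers `False`, while `open_world(facts, set(), question)` answers `None`: nothing says Paul is a citizen, but nothing rules it out either.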
  • 34. The extra information is used in QC of EFO, but not in query expansion. [Diagram: cardiomyopathy and atrial fibrillation; ventricular myocardium, atrial myocardium, myocardium, cardiac ventricle, atrium and heart; relations subClassOf, part_of and has_disease_location. Heart disease?]
  • 35. Google analytics for ontocat.org
  • 36. OntoCAT is enterprise-grade, low-maintenance, headache-free, zero-configuration software • Java 6, Maven and Ant support • Open source (LGPL v3) • 137 unit tests • Hudson daily builds • Flexibility through design patterns: Decorators, Proxies, Composites. [Charts: tests passed; daily builds]
  • 37. Semantic Web Atlas of Gene Expression. [Section slide repeating the diagram of slide 5: annotation variants, EFO: lymphoid neoplasm, Archive (25k exps), Atlas (2.6k exps), EFO]
  • 38. EFO inferred is_a hierarchy defines how experiments are aggregated in Atlas for re-analysis. http://www.ebi.ac.uk/gxa
  • 39. It is possible to infer diseases of heart computationally rather than asserting this information directly. [Diagram as on slide 34: cardiomyopathy and atrial fibrillation; ventricular myocardium, atrial myocardium, myocardium, cardiac ventricle, atrium and heart; relations subClassOf, part_of and has_disease_location] heart disease ≡ has_disease_location ∃ (heart ∪ part_of ∃ heart)
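The class expression on slide 39 says that a heart disease is anything located in the heart or in something that is part of the heart. A minimal Python sketch of that inference follows. It is not a description logic reasoner: it treats part_of as transitive, ignores subClassOf, and the edges in the toy data are assumptions, since the exact edges of the slide's diagram are not recoverable from the transcript.

```python
def part_of_closure(term, part_of):
    """Return every whole that term is transitively part of."""
    seen = set()
    stack = [term]
    while stack:
        for whole in part_of.get(stack.pop(), ()):
            if whole not in seen:
                seen.add(whole)
                stack.append(whole)
    return seen


def diseases_of(organ, disease_location, part_of):
    """Diseases located in the organ itself or in any of its (transitive) parts."""
    return sorted(
        disease
        for disease, locations in disease_location.items()
        if any(loc == organ or organ in part_of_closure(loc, part_of) for loc in locations)
    )


# Toy data (assumed edges, for illustration only).
part_of = {"myocardium": ["heart"], "atrium": ["heart"], "atrial myocardium": ["atrium"]}
disease_location = {
    "cardiomyopathy": ["myocardium"],
    "atrial fibrillation": ["atrial myocardium"],
    "hepatitis": ["liver"],
}
```

Neither disease is asserted to be a heart disease in the toy data; `diseases_of("heart", disease_location, part_of)` derives both from their locations and leaves out the one located in the liver.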
  • 40. One RDF graph per experiment accession. Context-specific gene expression is grouped with blank nodes. Graph: gxa:E-AFMX-1 (experiment accession E-AFMX-1). Predicates: rdf:type; organism (OBI_0100026); is_about (IAO_0000136); gene (EFO_0002606); experimental factor (EFO_0000001); discretized differential expression (EFO_0004034); p value (OBI_0000175). Values shown: Homo sapiens; liver; efo:EFO_0004033; NONDE; 1.0E30; PRDX2 (ensembl:ENSG00000167815), of type gene (efo:EFO_0002606). W3C Note on RDF Approach to Gene Expression Data (in progress), Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
  • 41. One RDF graph per experiment accession. Context-specific gene expression is grouped with blank nodes. Graph: gxa:E-AFMX-1 (experiment accession E-AFMX-1). Predicates: rdf:type; organism (OBI_0100026); is_about (IAO_0000136); gene (EFO_0002606); experimental factor (EFO_0000001); discretized differential expression (EFO_0004034); p value (OBI_0000175). Values shown: Homo sapiens; liver (NONDE, 1.0E30); approximately 14 weeks (NONDE, 1.0E30); efo:EFO_0004033; PRDX2 (ensembl:ENSG00000167815), of type gene (efo:EFO_0002606). W3C Note on RDF Approach to Gene Expression Data (in progress), Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
  • 42. Sesame triplestore provided the shortest Time-to-Market. REST + XSLT = RDF. [Diagram labels: RDF WAR; tc-test-3, tomcat-7, tomcat-8; Load Balancer; wwwdev; www.ebi.ac.uk; Jun Zhao @ Oxford; Jena TDB; Milarq; www.open-biomed.org.uk]
  • 43. Semantic Web is unlikely to take over the web, but has the potential to unify all of bioinformatics: OntoCAT, EFO, Semantic Atlas. http://gigaom.com/broadband/the-storage-vs-bandwidth-debate/
  • 44. Acknowledgments • Morris A. Swertz’s group at the Genomics Coordination Center (GCC), University of Groningen • K Joeri van der Velde • Despoina Antonakaki • Dasha Zhernakova • James Malone • Helen Parkinson • FuzzyRecogniser: Emma Hastings • Niran Abeygunawardena • Ele Holloway • Tim Rayner • Zooma: Tony Burdett • Bioconductor/R package: Natalja Kurbatova, Pavel Kurnosov, Misha Kapushesky. This work was supported by the European Community's Seventh Framework Programmes GEN2PHEN [grant number 200754], SLING [grant number 226073], and SYBARIS [grant number 242220], the European Molecular Biology Laboratory, the Netherlands Organisation for Scientific Research [NWO/Rubicon grant number 825.09.008], and the Netherlands Bioinformatics Centre [BioAssist/Biobanking platform and BioRange grant SP1.2.3]. OntoCAT logo courtesy of Eamonn Maguire. Special thanks go to NCBO BioPortal and EBI OLS support teams for all the comprehensive help they provide.