The document provides information about the Experimental Factor Ontology (EFO). EFO models experimental factors from genomic studies stored in the ArrayExpress archive, including species, diseases, and cell lines. About 30% of the terms it captures are not in the UMLS. EFO imports synonyms and definitions from reference ontologies through an automated mapping process, and regression testing verifies the resulting changes. EFO has a web interface and supports content negotiation, and its experimental factor hierarchy is used in the Gene Expression Atlas to aggregate experiments.
1. Unifying ontology services for functional genomic annotations
Tomasz Adamusiak MD PhD
7omasz
Postdoc at LHC CgSB since 10/2011
EBI is an Outstation of the European Molecular Biology Laboratory.
2. The European Molecular Biology Laboratory, a “European NIH” for molecular biology
• Heidelberg: basic research in molecular biology, administration, EMBO
• Hamburg: structural biology
• Hinxton: bioinformatics
• Grenoble: structural biology
• Monterotondo: mouse biology
• 1500 staff
• >60 nationalities
4. Focus on providing database services to the bioinformatics community
• Literature and ontologies: CiteXplore, GO
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: ENA
• Functional genomics: ArrayExpress, Expression Atlas, EFO
• Protein families, motifs and domains: InterPro
• Macromolecular structures: PDBe
• Protein activity: IntAct, PRIDE
• Pathways: Reactome
• Protein sequences: UniProt
• Chemical entities: ChEBI
• Systems: BioModels, BioSamples
• Chemogenomics: ChEMBL
5. ArrayExpress is the 2nd largest resource for public transcriptomics data (CIBEX < AE < GEO)
[Figure: free-text annotations in the Archive (25k experiments), such as ‘blood cancer’, ‘hematological neoplasm’, ‘haematological neoplasm’, ‘lymphoma/leukemia’, ‘leukaemia’ and ‘haematological cancer’, map to a single EFO term, lymphoid neoplasm, in the Atlas (2.6k experiments).]
6. Experimental Factor Ontology (EFO)
• Modelling experimental factors currently in Archive:
species, diseases, cell lines, etc.
• About 30% of these terms are not in the UMLS
• Scope determined by the Atlas, Ensembl, external requests (UPenn) and the EBI site-wide search
7. Developed a process to automatically import metadata from reference ontologies and validate changes
[Figure: number of EFO classes and synonyms over time, Aug-08 to Aug-11; y-axis: number of classes or synonyms (0 to 20000)]
9. Did not evaluate the UMLS Norm tool in this context
• Production requirements (Perl, OWL)
• Improvement (n-grams) over legacy code
• Primary use case mapping EFO against AE annotations:
• 2'-deoxy-5-azacytidine to 5-aza-2'-deoxycytidine CHEBI:50131
• Barrett's Esophagus to Barrett's esophagus
• Difficult to use MetaMap on non-UMLS ontologies
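To make the n-gram idea concrete, here is a minimal Python sketch of fuzzy label matching. It assumes a character-trigram Dice coefficient and a hypothetical acceptance threshold of 0.6; the production matcher is only described above as an n-gram improvement over legacy code, so this illustrates the technique rather than reproducing its algorithm.

```python
from collections import Counter


def ngrams(text: str, n: int = 3) -> Counter:
    """Multiset of character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over character n-grams: 1.0 for identical strings."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    overlap = sum((grams_a & grams_b).values())
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2 * overlap / total if total else 0.0


def best_match(annotation: str, labels: list, threshold: float = 0.6):
    """Return (label, score) for the best-scoring label, or None below threshold."""
    if not labels:
        return None
    label, score = max(((lab, dice(annotation, lab)) for lab in labels),
                       key=lambda pair: pair[1])
    return (label, score) if score >= threshold else None
```

With these assumptions, best_match("Barrett's Esophagus", ["Barrett's esophagus", "esophageal cancer"]) returns ("Barrett's esophagus", 1.0), and the two azacytidine names score about 0.73 because they share most trigrams despite the reordered tokens.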
10. Step 2: definitions and synonyms are pulled in from reference ontologies via NCBO BioPortal
[Figure: worked example for acute lymphoblastic leukemia (http://www.ebi.ac.uk/efo/EFO_0000220).
Step 1, translate IDs: the xrefs SNOMEDCT:91857003 and DOID:9952 are translated to xref NCIt:C3167.
Step 2, fetch + provenance: synonyms (Acute lymphoid leukaemia, disease; Acute Lymphocytic Leukaemia) and a definition (Leukemia with an acute onset [...]) are fetched, and each imported value is annotated with bioportal_provenance, e.g. [accessedResource: NCIt:C3167] [accessDate: 05-04-2011].]
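The translate-then-fetch flow can be sketched in Python as follows. BioPortal is replaced here by two hypothetical in-memory dictionaries (ID_TRANSLATION, TERM_STORE) seeded with values from the example above; the real pipeline queried BioPortal's web services, whose API this sketch does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    value: str
    accessed_resource: str  # source term the value was fetched from
    access_date: str        # date string in the slide's format, e.g. 05-04-2011


@dataclass
class EfoClass:
    uri: str
    label: str
    xrefs: list
    synonyms: list = field(default_factory=list)
    definitions: list = field(default_factory=list)


# Hypothetical stand-ins for BioPortal: an ID translation table and a term store.
ID_TRANSLATION = {"SNOMEDCT:91857003": "NCIt:C3167", "DOID:9952": "NCIt:C3167"}
TERM_STORE = {
    "NCIt:C3167": {
        "synonyms": ["Acute Lymphocytic Leukaemia"],
        "definitions": ["Leukemia with an acute onset [...]"],
    }
}


def import_metadata(cls: EfoClass, access_date: str) -> list:
    """Translate xrefs, then copy new synonyms/definitions with provenance."""
    # Step 1: translate xrefs into reference-ontology IDs (deduplicated).
    targets = sorted({ID_TRANSLATION[x] for x in cls.xrefs if x in ID_TRANSLATION})
    # Step 2: fetch each target term and record where and when values came from.
    for target in targets:
        term = TERM_STORE.get(target, {})
        for kind in ("synonyms", "definitions"):
            existing = {a.value for a in getattr(cls, kind)}
            for value in term.get(kind, []):
                if value not in existing:  # re-running the import adds nothing
                    getattr(cls, kind).append(Annotation(value, target, access_date))
                    existing.add(value)
    return targets
```

Skipping values that are already present keeps the import idempotent, which makes repeated runs easy to compare in regression testing.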
11. Step 3: regression testing package produces a
report for manual verification of the import
• 13 different tests
• Shared xrefs, e.g. NCIt:C17459 (Hispanic or Latino)
• Hispanic (EFO_0003169)
• Latino (EFO_0003166)
• Shared synonyms, e.g. head kidney (ZFA:0000669)
• pronephros (EFO_0000927)
• bone marrow (EFO_0000868)
• Changes in external sources (11/2010 vs. 5/2010):
• synonym Spinocerebellar Ataxias (EFO_0002624) no longer in
DOID:1441
• definition Organ with organ cavity which connects the cavity of the
urinary bladder to the exterior. […] (EFO_0000931) no longer in
FMAID:1966
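Two of the thirteen checks reduce to simple set operations. A sketch, assuming a hypothetical data layout (classes as dicts of lists, source snapshots as dicts of sets); it is not the regression package itself, but it reproduces the shared-xref and shared-synonym examples above.

```python
from collections import defaultdict


def shared_values(classes: dict, attribute: str) -> dict:
    """Map each value of `attribute` to the EFO IDs sharing it (only if >1)."""
    owners = defaultdict(set)
    for efo_id, record in classes.items():
        for value in record.get(attribute, []):
            owners[value].add(efo_id)
    return {value: sorted(ids) for value, ids in owners.items() if len(ids) > 1}


def dropped_from_source(old: dict, new: dict) -> list:
    """(source ID, value) pairs in the old snapshot that the new one lacks."""
    return sorted(
        (source, value)
        for source, values in old.items()
        for value in values - new.get(source, set())
    )


# Toy data built from the examples above.
CLASSES = {
    "EFO_0003169": {"label": ["Hispanic"], "xrefs": ["NCIt:C17459"]},
    "EFO_0003166": {"label": ["Latino"], "xrefs": ["NCIt:C17459"]},
    "EFO_0000927": {"label": ["pronephros"], "synonyms": ["head kidney"]},
    "EFO_0000868": {"label": ["bone marrow"], "synonyms": ["head kidney"]},
}
```

Each check only reports candidates; deciding whether a shared xref or a vanished synonym is an error is left to the manual review the report is meant for.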
12. EFO has a unique XSLT-based web presence
http://www.ebi.ac.uk/efo/overview
13. EFO URIs are readable by humans and computers
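14. Content negotiation is an alternative approach
A rule for Paul Tuckey's server-side UrlRewriteFilter that redirects clients requesting RDF/XML to the OWL file:
<rule>
<!-- The condition value is a regular expression, so "+" must be escaped -->
<condition name="Accept" type="header">application/rdf\+xml</condition>
<from>^/$</from>
<to type="redirect">/efo/efo.owl</to>
</rule>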
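Server-side content negotiation comes down to choosing a representation from the client's Accept header. A simplified Python sketch, assuming q-value ordering only and ignoring the media-type parameters and specificity rules of the HTTP specification; ROUTES is a hypothetical mapping, not EFO's server configuration.

```python
def parse_accept(header: str) -> list:
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media, q))
    # sorted() is stable, so equally weighted types keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header, available: list):
    """Return the first offered media type the client accepts, else None."""
    for media, q in parse_accept(header or "*/*"):
        if q <= 0:
            continue
        for offered in available:
            if media in (offered, "*/*") or (
                media.endswith("/*") and offered.startswith(media[:-1])
            ):
                return offered
    return None


# Hypothetical mapping from representation to resource, echoing the rule above.
ROUTES = {"text/html": "/efo/overview", "application/rdf+xml": "/efo/efo.owl"}
```

A browser sending text/html gets the human-readable page, while an RDF client sending application/rdf+xml is routed to the OWL file from the same URI.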
15. The Semantic Web provides a common framework that allows data to
be shared and reused across application, enterprise, and community
boundaries (W3C)
If you want to put something on the web there are three rules:
1. All kinds of conceptual things, they have names now that start with
HTTP.
2. If I take one of these HTTP names and I look it up [...] I fetch the
data using the HTTP protocol from the web, I will get back some
data in a standard format
3. It's got relationships [...] the other thing that it's related to is given
one of those names that starts HTTP. So, I can go ahead and look
that thing up.
Sir Tim Berners-Lee on the next Web (TED2009)
16. RDF triple is the core concept underpinning the
semantic web
subject: example:index.html; predicate: dc:creator; object: "John Smith"
<http://www.example.com/index.html> <http://purl.org/dc/elements/1.1/creator> "John Smith" .
Entity-Attribute-Value (EAV) model with well-defined semantics
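Each triple is an entity-attribute-value row whose parts are IRIs or literals. A minimal sketch, assuming literal objects only and a hypothetical prefix table, that turns such rows into N-Triples lines like the one above:

```python
# Hypothetical prefix table; "example" stands in for http://www.example.com/.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "example": "http://www.example.com/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as dc:creator into a full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def literal(text: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: list) -> str:
    """Serialise (subject, predicate, literal object) EAV rows as N-Triples."""
    lines = [f"<{expand(s)}> <{expand(p)}> {literal(o)} ." for s, p, o in triples]
    return "\n".join(lines)
```

Unlike a bare EAV table, every attribute here is a global IRI (dc:creator), which is what gives the triple its well-defined, shareable meaning.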
17. Open linked data lacks central URI reconciliation
• Responsibility for URIs:
http://bio2rdf.org/mesh:68009154
http://bio2rdf.org/pubmed:11992264
http://bio2rdf.org/go:0016458
http://purl.org/obo/owl/GO#GO_0016458
• Versioning:
http://sig.uw.edu/fma#Anatomical_entity (FMA 3.1)
http://sig.biostr.washington.edu/fma3.0#Anatomical_entity (FMA 3.0)
http://purl.obolibrary.org/obo/GO_0016458 (Foundry-compliant URI)
• Requires institutional support
• It would be great to have a public UMLS in RDF
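Reconciling URIs means agreeing on one canonical form. As an illustration only, a sketch that rewrites the GO variants listed above to the Foundry-compliant PURL; the patterns and the OBO_PREFIXES allow-list are assumptions, and versioned URIs such as the FMA ones are deliberately left unresolved.

```python
import re

# Hypothetical rewrite rules; each captures an ontology prefix and a local ID.
PATTERNS = [
    re.compile(r"^http://bio2rdf\.org/(?P<prefix>[a-z]+):(?P<local>\d+)$"),
    re.compile(r"^http://purl\.org/obo/owl/(?P<prefix>[A-Za-z]+)#(?P=prefix)_(?P<local>\d+)$"),
    re.compile(r"^http://purl\.obolibrary\.org/obo/(?P<prefix>[A-Za-z]+)_(?P<local>\d+)$"),
]
# Only ontologies assumed to publish OBO Foundry PURLs are rewritten.
OBO_PREFIXES = {"GO"}


def to_foundry_uri(uri: str):
    """Rewrite a known URI variant to http://purl.obolibrary.org/obo/PREFIX_ID."""
    for pattern in PATTERNS:
        match = pattern.match(uri)
        if match:
            prefix = match.group("prefix").upper()
            if prefix in OBO_PREFIXES:
                return f"http://purl.obolibrary.org/obo/{prefix}_{match.group('local')}"
            return None  # recognised scheme, but no Foundry PURL for this source
    return None
```

Such a table only works if someone maintains it, which is the institutional-support problem the slide points to.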
19. There is no single ontology resource that covers
all the use cases
Local ontologies in OWL/OBO
NCBO BioPortal
EBI Ontology Lookup Service
...and no huffing and puffing
will blow all of them down...
Leonard Leslie Brooke (1904)
20. EBI Ontology Lookup Service
• 82 ontologies
• OBO ontologies
• SOAP web services/Java client
• The first such service available
Cote RG, Jones P, Apweiler R, Hermjakob H. The ontology lookup service, a
lightweight cross-platform tool for controlled vocabulary queries.
BMC Bioinformatics. 2006 Feb 28;7(1):97
21. NCBO BioPortal
• 267 ontologies and growing
• Both OWL and OBO
• REST web services
• Rich in functionality
Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C.,
Rubin, D.L., Storey, M.A., Chute, C.G., Musen, M.A. BioPortal: ontologies and
integrated data resources at the click of a mouse. Nucleic Acids Res. 2009 Jul
1;37(Web Server issue):W170-3.
23. OWL API
• Reference implementation for manipulating and
serialising OWL 2
• Multiple parsers (incl. OBO)
• Reasoner interfaces
• Low level access
Sean Bechhofer, Phillip Lord, Raphael Volz. Cooking the Semantic Web with the
OWL API. 2nd International Semantic Web Conference, ISWC, Sanibel Island,
Florida, October 2003
24. We wanted to annotate data with ontology terms within the MOLGENIS framework: an ontology browser
[Figure: ontology browser components: OWL API, EFO, BioPortal import]
26. A simple facade to ontology resources, providing the set of functions most common to ontology APIs (e.g. HL7 CTS2, UMLS API) under a single interface
http://www.ontocat.org
[Figure: BioPortal, EBI OLS, OWL, OBO and further (?) resources behind one interface offering searchAll(), searchOntology(), getChildren(), getParents(), getSynonyms(), getDefinitions(), getAllParents(), getAllChildren(), getRelations(), ...]
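The facade idea can be illustrated with a small Python sketch. It is not OntoCAT's Java API: the snake_case method names only mirror the functions listed above, and InMemoryOntology is a hypothetical backend standing in for BioPortal, OLS, or a local OWL/OBO file.

```python
from abc import ABC, abstractmethod
from collections import deque


class OntologyService(ABC):
    """Facade: every backend answers the same small set of questions."""

    @abstractmethod
    def search_ontology(self, query: str) -> list:
        """IDs of terms whose label contains `query`."""

    @abstractmethod
    def get_children(self, term_id: str) -> list:
        """IDs of the direct children of `term_id`."""

    def get_all_children(self, term_id: str) -> list:
        """All descendants, breadth-first, built only on get_children()."""
        seen, queue, result = {term_id}, deque([term_id]), []
        while queue:
            for child in self.get_children(queue.popleft()):
                if child not in seen:
                    seen.add(child)
                    result.append(child)
                    queue.append(child)
        return result


class InMemoryOntology(OntologyService):
    """Hypothetical backend with labels and child lists held in dicts."""

    def __init__(self, labels: dict, children: dict):
        self.labels, self.children = labels, children

    def search_ontology(self, query: str) -> list:
        q = query.lower()
        return sorted(t for t, label in self.labels.items() if q in label.lower())

    def get_children(self, term_id: str) -> list:
        return self.children.get(term_id, [])


def search_all(services: list, query: str) -> list:
    """Run one query against every backend and merge the hits."""
    return sorted({hit for service in services for hit in service.search_ontology(query)})
```

The design point is that derived operations such as get_all_children() are written once against the shared interface, so every backend inherits them without reimplementing traversal.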
27. There are many ways you could use OntoCAT
• Store data and annotate with ontology terms: OntoCAT database and browser
• Work with ontologies in R: Bioconductor ontocat R package
• Integrate a number of ontologies in a local repository: OntoCAT REST server
• Add ontology support to your GWT web application: OntoCAT GoogleApp
http://www.ontocat.org/wiki/OntocatDownload
29. Developed for internal and external use cases
Example 11 at ontocat.org
• Automatically obtain CUIs from UMLS sources for
extracted terms via BioPortal
• Shamim Mollah, Bleeding History Phenotype Ontology,
Rockefeller University Center for Clinical and
Translational Science, New York, NY
1. Get all terms from BHP
2. Search for corresponding UMLS terms (also MetaMap)
3. Obtain CUIs for mapped terms through BioPortal
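The three steps form a simple pipeline. A sketch under stated assumptions: search and get_cui are hypothetical callables standing in for the BioPortal (or MetaMap) search and the CUI lookup, and the identifiers in the test data are placeholders, not real UMLS CUIs.

```python
def normalise(term: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def map_terms(terms, search, get_cui):
    """Steps 2 and 3: search for each term, then fetch a CUI for each hit.

    `search(text)` returns a matching source term ID or None;
    `get_cui(term_id)` returns a CUI or None. Both are assumed callables.
    """
    mapped, unmapped = {}, []
    for term in terms:  # step 1 is assumed done: `terms` holds all BHP terms
        hit = search(normalise(term))
        cui = get_cui(hit) if hit else None
        if cui:
            mapped[term] = cui
        else:
            unmapped.append(term)  # left for manual curation
    return mapped, unmapped
```

Passing the services in as functions keeps the mapping logic testable without network access; swapping BioPortal search for MetaMap only changes the search argument.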
31. Use case: explore beyond subsumption
Example 16 at ontocat.org
• Requested by a reviewer: partonomy in GO
• Easy in OBO, hard in OWL
• Computationally intensive, starting from the root node:
1. classify all children of (inverse_relation some class)
2. repeat step 1 on all new nodes
3. finish when all nodes have been seen
• OWL API is not thread safe
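The three-step loop is a breadth-first traversal. A sketch, assuming a hypothetical direct_parts(c) callable that returns the classes that are part of c; with OWL this is one reasoner query per class, which is why the walk is expensive. The sketch stays single-threaded, consistent with the OWL API not being thread safe.

```python
from collections import deque


def explore_partonomy(root, direct_parts):
    """Breadth-first walk of a partonomy from `root`.

    `direct_parts(c)` returns the classes asserted or inferred to be part of c.
    Returns a dict mapping each visited class to its direct parts.
    """
    tree, seen, queue = {}, {root}, deque([root])
    while queue:  # step 3: stop once no unseen nodes remain
        current = queue.popleft()
        parts = direct_parts(current)  # step 1: one query per class
        tree[current] = parts
        for part in parts:  # step 2: repeat only on new nodes
            if part not in seen:
                seen.add(part)
                queue.append(part)
    return tree
```

The seen set guarantees that each class is queried exactly once, so the number of expensive reasoner calls equals the number of classes reached.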
32. Reasoning is fundamental to exploring the hierarchies of more expressive ontologies
[Figure: Heart, Heart Component, Left Heart and Mitral Valve connected by is_a and partOf relations]
33. When ontologies classify as inconsistent, it is often not obvious why (Open World Assumption)
• Mary is_a CitizenOfFrance
Is Paul a citizen of France?
Closed World, e.g. SQL databases: NO
Ontologies (Open World): unknown
• OWL is more expressive: classes, individuals, closure axioms, value partitions, cardinality restrictions, property chains; disjoint, reflexive, irreflexive, symmetric, asymmetric, inverse and transitive properties
• Explanation in OWL (http://owl.cs.manchester.ac.uk/explanation/)
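The Mary/Paul example reduces to how a missing fact is treated. A toy sketch, not a reasoner: FACTS and known_negatives are hypothetical, and None stands for the open-world answer "unknown".

```python
FACTS = {("Mary", "CitizenOfFrance")}


def closed_world(individual: str, cls: str) -> bool:
    """SQL-style: anything not stated is false."""
    return (individual, cls) in FACTS


def open_world(individual: str, cls: str, known_negatives=frozenset()):
    """OWL-style: stated facts are true, stated negations false, the rest unknown."""
    if (individual, cls) in FACTS:
        return True
    if (individual, cls) in known_negatives:
        return False
    return None  # absence of a fact is not evidence of its negation
```

Under the open-world assumption, a "no" requires an explicit negative assertion or an axiom (such as a closure axiom) that entails it.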
34. The extra information is used in QC of EFO, but not in query expansion
[Figure: ventricular cardiomyopathy and atrial fibrillation are linked by has_disease_location to heart structures (myocardium, atrial myocardium, cardiac ventricle, atrium), which are related to heart by subClassOf and part_of. Heart disease?]
38. EFO inferred is_a hierarchy defines how experiments are
aggregated in Atlas for re-analysis
http://www.ebi.ac.uk/gxa
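Aggregating along the inferred hierarchy means that an experiment annotated to a term also counts for every superclass of that term. A sketch, assuming the parents map is the reasoner's inferred is_a output; the terms and accessions in the test are hypothetical.

```python
from collections import defaultdict


def ancestors(term, parents):
    """All superclasses of `term` in the (inferred) is_a hierarchy, excluding itself."""
    found, stack = set(), list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, []))
    return found


def aggregate(annotations, parents):
    """Map each term to the experiments annotated to it or to any subclass."""
    experiments = defaultdict(set)
    for accession, term in annotations:
        for t in {term} | ancestors(term, parents):
            experiments[t].add(accession)
    return dict(experiments)
```

This is how many differently worded annotations in the Archive can collapse into one queryable group, such as lymphoid neoplasm, in the Atlas.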
39. It is possible to infer diseases of the heart computationally rather than asserting this information directly
[Figure: the disease and anatomy graph from slide 34 (ventricular cardiomyopathy, atrial fibrillation, myocardium, atrial myocardium, cardiac ventricle, atrium, heart)]
$\mathit{heart\ disease} \equiv \exists\,\mathit{has\_disease\_location}.(\mathit{heart} \sqcup \exists\,\mathit{part\_of}.\mathit{heart})$
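Evaluating the heart disease definition over asserted facts amounts to a graph walk. A toy sketch, assuming part_of is transitive and inherited down is_a, and using a simplified fact set that only approximates the figure; an OWL reasoner reaches the same conclusion by classification.

```python
def superclasses(cls: str, is_a: dict) -> set:
    """`cls` plus everything above it in the is_a hierarchy."""
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(is_a.get(current, []))
    return result


def located_in(location: str, target: str, is_a: dict, part_of: dict) -> bool:
    """True if location ⊑ target or location ⊑ ∃part_of.target."""
    seen, stack = set(), [location]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for sup in superclasses(current, is_a):
            if sup == target:
                return True
            stack.extend(part_of.get(sup, []))  # part_of inherited from superclasses
    return False


def infer_heart_diseases(locations: dict, is_a: dict, part_of: dict) -> list:
    """Diseases whose has_disease_location is heart or part of heart."""
    return sorted(
        disease
        for disease, locs in locations.items()
        if any(located_in(loc, "heart", is_a, part_of) for loc in locs)
    )


# Simplified facts approximating the figure (not EFO's actual axioms).
IS_A = {"atrial myocardium": ["myocardium"]}
PART_OF = {"myocardium": ["heart"], "cardiac ventricle": ["heart"], "atrium": ["heart"]}
HAS_DISEASE_LOCATION = {
    "ventricular cardiomyopathy": ["cardiac ventricle"],
    "atrial fibrillation": ["atrial myocardium"],
    "hypothetical lung disease": ["lung"],
}
```

Atrial fibrillation is classified as a heart disease without any direct assertion: its location, atrial myocardium, is a myocardium, and myocardium is part of the heart.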
40. One RDF graph per experiment accession
Context-specific gene expression is grouped with blank nodes
[Figure: RDF graph for experiment gxa:E-AFMX-1. Predicates: is_about (IAO_0000136), organism (OBI_0100026), experimental factor (EFO_0000001), gene (EFO_0002606), discretized differential expression (EFO_0004034), p value (OBI_0000175), rdf:type. A blank node groups the values Homo sapiens, liver, gene PRDX2 (ensembl:ENSG00000167815, of type efo:EFO_0002606), NONDE and 1.0E30; efo:EFO_0004033 also appears as an rdf:type object.]
W3C Note on RDF Approach to Gene Expression Data (in progress)
Semantic Web for Health Care and Life Sciences Interest Group, BioRDF task force
41. One RDF graph per experiment accession
Context-specific gene expression is grouped with blank nodes
[Figure: the graph for gxa:E-AFMX-1 with a second blank node for another context (approximately 14 weeks), again linking gene PRDX2 (ensembl:ENSG00000167815), NONDE and 1.0E30]
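The per-experiment grouping can be sketched as N-Triples generation. This is an approximation for illustration: the GXA namespace, the direction of is_about (from the blank node to the experiment), and encoding every value as a plain literal are assumptions, not the draft W3C note's model.

```python
OBO = "http://purl.obolibrary.org/obo/"
EFO = "http://www.ebi.ac.uk/efo/"
GXA = "http://www.ebi.ac.uk/gxa/"  # hypothetical namespace for experiment accessions


def iri(base: str, local: str) -> str:
    return f"<{base}{local}>"


def literal(text: str) -> str:
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def experiment_graph(accession: str, contexts: list) -> list:
    """N-Triples for one experiment, one blank node per expression context.

    Each context maps predicate IRIs to plain-text values. Linking the blank
    node to the experiment with is_about is an assumed modelling choice.
    """
    lines = []
    for i, context in enumerate(contexts, start=1):
        node = f"_:context{i}"  # blank-node labels are local to this graph
        lines.append(f"{node} {iri(OBO, 'IAO_0000136')} {iri(GXA, accession)} .")
        for predicate, value in context.items():
            lines.append(f"{node} {predicate} {literal(value)} .")
    return lines


# Values transcribed from the figure; encoding all of them as literals is a simplification.
liver_context = {
    iri(OBO, "OBI_0100026"): "Homo sapiens",     # organism
    iri(EFO, "EFO_0000001"): "liver",            # experimental factor
    iri(EFO, "EFO_0002606"): "ENSG00000167815",  # gene (PRDX2)
    iri(EFO, "EFO_0004034"): "NONDE",            # discretized differential expression
    iri(OBO, "OBI_0000175"): "1.0E30",           # p value, as printed in the figure
}
age_context = dict(liver_context)
age_context[iri(EFO, "EFO_0000001")] = "approximately 14 weeks"
```

Because each graph holds a single experiment, the short blank-node labels cannot collide with those of other experiments.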
43. Semantic Web is unlikely to take over the web, but has the potential to unify all of bioinformatics
[Figure: OntoCAT, EFO and Semantic Atlas]
Image source: http://gigaom.com/broadband/the-storage-vs-bandwidth-debate/
44. Acknowledgments
• Morris A. Swertz’s group at the Genomics Coordination Center (GCC), University of Groningen
• K Joeri van der Velde
• Despoina Antonakaki
• Dasha Zhernakova
• James Malone
• Helen Parkinson
• FuzzyRecogniser: Emma Hastings
• Niran Abeygunawardena
• Ele Holloway
• Tim Rayner
• Zooma: Tony Burdett
• Bioconductor/R package: Natalja Kurbatova, Pavel Kurnosov, Misha Kapushesky
This work was supported by the European Community's Seventh Framework Programmes GEN2PHEN [grant number 200754], SLING [grant number 226073], and SYBARIS [grant number 242220], the European Molecular Biology Laboratory, the Netherlands Organisation for Scientific Research [NWO/Rubicon grant number 825.09.008], and the Netherlands Bioinformatics Centre [BioAssist/Biobanking platform and BioRange grant SP1.2.3].
OntoCAT logo courtesy of Eamonn Maguire
Special thanks go to NCBO BioPortal and EBI OLS support teams for all the comprehensive help they provide