The document discusses using ontologies to integrate systems biology data. It describes typical steps in systems biology studies such as finding studies, processing data, integrating data, and combining data from multiple sources. Ontologies can help link information from different analysis techniques and combine data from many studies by capturing study metadata. The document advocates using standards like ISA-TAB and MAGE-TAB to capture study data and proposes using a generic study capture framework with modular components to integrate different types of 'omics data. Ontologies are needed for collaboration and to provide controlled vocabularies for annotation.
Using ontologies to do integrative systems biology
2. Using ontologies to do integrative
systems biology
Chris Evelo
Department of Bioinformatics - BiGCaT
Maastricht University
@Chris_Evelo
chris.evelo@maastrichtuniversity.nl
3. Typically we want to:
• Find studies.
• Process data.
• Integrate.
• Evaluate.
• Combine with yet
other data.
Faculty of Health, Medicine and Life Sciences
4. Systems Biology Issues:
• Environment
• Multi-compartment
• Different levels of gene expression cascade
(multi-omics)
Needs:
• Link information from different analysis
techniques
• Combine many studies (store study design)
5. Using ISA to
be able to
find studies
http://dx.doi.org/10.1038/ng.1054
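As a rough illustration of how ISA-formatted metadata can help to find studies, the sketch below filters ISA-Tab style study sample tables on an ontology-annotated characteristic. The column names follow the ISA-Tab convention (value column, then Term Source REF, then Term Accession Number); the two tiny tables, the file names and the helper find_studies are invented for this example and are not part of any ISA tool.

```python
import csv
import io

# Two tiny ISA-Tab style study sample tables (tab-delimited).
# File names and rows are invented for illustration only.
HEADER = ("Source Name\tCharacteristics[organism]\tTerm Source REF\t"
          "Term Accession Number\tSample Name\n")
STUDY_TABLES = {
    "s_mouse_diet.txt": HEADER
    + "m1\tMus musculus\tNCBITAXON\t10090\tm1_liver\n"
    + "m2\tMus musculus\tNCBITAXON\t10090\tm2_liver\n",
    "s_human_cohort.txt": HEADER
    + "h1\tHomo sapiens\tNCBITAXON\t9606\th1_blood\n",
}


def find_studies(tables, column, accession):
    """Return the names of study tables that contain at least one row whose
    ontology accession for `column` equals `accession`."""
    hits = []
    for name, text in tables.items():
        reader = csv.reader(io.StringIO(text), delimiter="\t")
        header = next(reader)
        col = header.index(column)
        # The accession column is the first "Term Accession Number"
        # that follows the annotated column.
        acc_col = header.index("Term Accession Number", col)
        if any(row[acc_col] == accession for row in reader):
            hits.append(name)
    return hits


# Query by NCBI Taxonomy accession instead of by free-text species name.
mouse_studies = find_studies(STUDY_TABLES, "Characteristics[organism]", "10090")
print(mouse_studies)
```

Querying on the accession rather than on the label is the point of the ontology annotation: "mouse", "mice" and "Mus musculus" all resolve to the same term.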
6. Why a study capturing application?
New studies can be performed based on old data
Translational comparisons (mouse, human, rat etc)
Structured storage
Facilitate collaborations between groups
- Data sharing on joined project
- Start a collaboration
7. What do we need to accomplish this
Acceptance
- Using standards (e.g. ISA-TAB & MAGE-TAB)
- User friendly (interface via web browser)
- Open source
- Examples
Collaboration
- Ontologies
- Security of data (log-in and store data locally)
- Open source (make own module)
8. dbXP: a total study capturing solution
• Web input, Study capturing module, Web output
• Modules: Simple assay module, Metabolomics module, Transcriptomics module, Any new module
• Feature layer
9. dbNP Architecture
• GSCF: Templates; Subjects, Groups, Events, Protocols, Samples, Assays
• Simple Assay module: Body weight, BMI, etc.
• Transcriptomics module: Raw data (cell files), Clean data (gene expression), Result data (p-values, z-values)
• Epigenetics module: Raw data (Nimblegen, Illumina), Clean CPG island data, Resulting Genome Feature data
• Query module: Pathways, GO, metabolite profiles; Full-text querying; Structured querying; Profile-based analysis; Study comparison
• Web user interface
10. Generic Study Capture Framework
• GSCF: Templates; Subjects, Groups, Events, Protocols, Samples, Assays
• Data input / output
• Data import: xls, csv, text
• Ontologies: NCBO web interface
• Connections: custom programs, Molgenis programs, EBI repository, custom dbs
13. Used in European Projects
Food4me (Dublin)
NU-AGE (UNIBO, Bologna)
Bioclaims (UIB, Palma)
Nutritech (TNO, Zeist)
EuroDish (WUR, Wageningen)
ITFoM (proposed for metabolic syndrome studies)
15. Epigenetics DNA Methylation Pipeline
• Raw data Nimblegen: R QC, processing
• Raw data Illumina: R QC, processing
• Raw sequencing data (MeDIP, BIS-Seq): Sequence QC, processing
• Clean DNA methylation data
• Statistical analysis
• Result data with p-values (Genome Feature Format (GFF))
16. Connecting to Pathways:
1) Prepare data for pathway analysis
2) Connect processing pipelines
PathVisioRPC used from arrayanalysis.org
see: http://pathvisiorpc.wordpress.com
3) Store Pathway profiles as vectors,
Using pathways themselves as a vocabulary
C Evelo, K van Bochove & J Saito (2011) Answering biological questions - querying a systems biology database for nutrigenomics. Genes Nutr 6: 81-87
4) Allow queries for studies with same outcome
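Steps 3 and 4 can be pictured with a small sketch: each study becomes a vector of pathway scores over one shared pathway vocabulary, and studies with a similar outcome are found by comparing vectors. The pathway list, the z-scores, the study names and the choice of cosine similarity are all assumptions made for this illustration; the slides do not say which similarity measure is used.

```python
import math

# Shared pathway vocabulary: every profile lists scores in this order.
PATHWAYS = ["Glycolysis", "Insulin signaling", "Apoptosis",
            "Fatty acid beta-oxidation"]

# Hypothetical pathway z-scores per study (made-up numbers).
PROFILES = {
    "study_A": [3.1, 2.4, 0.2, -0.5],
    "study_B": [2.8, 2.0, 0.1, -0.3],
    "study_C": [-0.4, 0.1, 3.5, 2.2],
}


def cosine(u, v):
    """Cosine similarity of two equally long score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, profiles):
    """Rank all other studies by similarity to the query study."""
    ranked = [(cosine(profiles[query], vec), name)
              for name, vec in profiles.items() if name != query]
    return [name for _, name in sorted(ranked, reverse=True)]


ranking = most_similar("study_A", PROFILES)
print(ranking)
```

Because the vector positions are defined by the pathways themselves, two studies can be compared even when they were measured on different platforms, as long as both were scored against the same pathway collection.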
17. Integrate
Example
WikiPathways pathway
Pathway on glycolysis.
Using modern systems
biology annotation.
And genes and
metabolites connected
to major databases.
20. If the mountain will not
come to Mahomet,
Mahomet must go to
the mountain.
Other repositories (like
dbXP!) have better
study descriptions.
Integrate in Sage
Synapse.
Pathway visualisation
missing: integrate
PathVisio in Synapse
(started).
21. PathVisio
www.pathvisio.org
• Data modeling and visualization on biological pathways
• Uses gene expression, proteomics and metabolomics data
• Can identify significantly changed processes
Martijn P van Iersel, Thomas Kelder, Alexander R Pico, Kristina Hanspers, Susan Coort, Bruce R Conklin, Chris
Evelo (2008) Presenting and exploring biological pathways with PathVisio. BMC Bioinformatics 9: 399
22. Understanding
genomics
Example
WikiPathways Pathway
Pathway on glycolysis.
Using modern systems
biology (MIM) annotation.
And genes and metabolites
connected to major
databases.
24. adding data =
adding colour
Example
PathVisio result
Showing proteomics
and transcriptomics
results on the glycolysis
pathway in mouse liver
after starvation.
[Data from Kaatje
Lenaerts and Milka
Sokolovic, analysis by
Martijn van Iersel]
31. Solution: Built-in Mapping
• Generic
bioinformatics
platforms should
have identifier
mapping built-in.
BioConductor
PathVisio
Cytoscape
...
Batteries
Included
32. Problem: Which mapping service?
• Ensembl Biomart
• Synergizer
• CRONOS
• DAVID
• AliasServer
• MatchMiner
• OntoTranslate
or
• Local database
33. BridgeDB: Abstraction Layer
interface IDMapper, implemented by:
• class IDMapperRdb: relational database
• class IDMapperFile: tab-delimited text
• class IDMapperBiomart: web service
The BridgeDb Framework: Standardized Access to Gene, Protein and Metabolite Identifier
Mapping Services. Martijn P van Iersel, Alexander R Pico, Thomas Kelder, Jianjiong Gao, Isaac Ho,
Kristina Hanspers, Bruce R Conklin, Chris T Evelo. BMC Bioinformatics 2010, 11: 5.
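The abstraction-layer idea can be sketched in a few lines: tools talk to one mapper interface and do not care whether the mappings come from a file, a database or a web service. BridgeDb itself is a Java framework; the Python classes below (IDMapper, FileIDMapper, InMemoryIDMapper), the method map_id and all identifiers are hypothetical names chosen for this illustration and do not reproduce the BridgeDb API.

```python
import csv
import io
from abc import ABC, abstractmethod


class IDMapper(ABC):
    """Common interface: callers do not know where the mappings live."""

    @abstractmethod
    def map_id(self, source_id, target_system):
        """Return the set of identifiers in target_system for source_id."""


class FileIDMapper(IDMapper):
    """Backend that reads 'source, target system, target id' rows
    from tab-delimited text."""

    def __init__(self, text):
        self._table = {}
        for source_id, system, target_id in csv.reader(
                io.StringIO(text), delimiter="\t"):
            self._table.setdefault((source_id, system), set()).add(target_id)

    def map_id(self, source_id, target_system):
        return set(self._table.get((source_id, target_system), ()))


class InMemoryIDMapper(IDMapper):
    """Stand-in for a relational database or web-service backend."""

    def __init__(self, rows):
        self._rows = list(rows)

    def map_id(self, source_id, target_system):
        return {target for source, system, target in self._rows
                if source == source_id and system == target_system}


def annotate(ids, mapper, target_system):
    """Tool code: works with any IDMapper implementation."""
    return {i: mapper.map_id(i, target_system) for i in ids}


# Made-up identifiers in two made-up identifier systems.
ROWS = [("geneA", "SystemY", "Y:001"), ("geneA", "SystemY", "Y:002"),
        ("geneB", "SystemY", "Y:003")]
MAPPING_TEXT = "".join("\t".join(row) + "\n" for row in ROWS)

file_mapper = FileIDMapper(MAPPING_TEXT)
memory_mapper = InMemoryIDMapper(ROWS)
from_file = annotate(["geneA", "geneB"], file_mapper, "SystemY")
from_memory = annotate(["geneA", "geneB"], memory_mapper, "SystemY")
print(from_file == from_memory)
```

Swapping the backend changes one constructor call and nothing in the tool code, which is what lets a platform ship with mapping built in without committing to a single mapping service.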
34. Tools: Cytoscape plugins (CyThesaurus, Network Merge), PathVisio, WikiPathways
BridgeDb
Mapping services: Internet webservices (BioMart, PICR, BridgeDb REST), local database, tab-delimited text files
54. PathVisio RI plugin provides backpage info
microRNAs in pathway analysis. The Regulatory Interaction plugin offers a suitable middle-ground between not including any
miRNAs in pathways, which misses this regulatory information, and including all validated miRNA-target interactions, which
clutters the pathway. After loading interaction file(s), selecting a pathway element shows the interaction partners of this
element and their expressions in a side panel. This allows for the detection of potential active regulatory mechanisms in the
study at hand.
http://www.bigcat.unimaas.nl/wiki/images/f/f6/VanHelden-poster-nbic2012.pdf
55. Or consider pathway as a network
57. Cytoscape visualization used to group
PPS1
Liver
All pathways
Pathways with high z-score
grouped together.
Explains why there are
relatively few significant
genes, but many pathways
with high z-score.
Robert Caesar et al (2010) A combined transcriptomics and lipidomics analysis of subcutaneous,
epididymal and mesenteric adipose tissue reveals marked functional differences. PLoS One 5: 7. e11525
http://dx.doi.org/doi:10.1371/journal.pone.0011525
58. Explore pathway interactions
Thomas Kelder, Lars Eijssen, Robert Kleemann, Marjan van Erk, Teake Kooistra, Chris Evelo
(2011) Exploring pathway interactions in insulin resistant mouse liver BMC Systems Biology 5: 127
Aug. http://dx.doi.org/doi:10.1186/1752-0509-5-127
59. What we used
Non-redundant shortest paths in a weighted
graph.
1. A set of pathways
2. An interaction network
3. Weight value for all edges
= experimental expression of connected
genes.
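A minimal sketch of the core step, finding the cheapest path between two pathways in a weighted interaction network, is given below. It uses plain Dijkstra search; the non-redundancy filtering of the published method is not reproduced here. The network, the node names and the weights are invented, and the assumption that a lower weight stands for a stronger expression change is ours, not the paper's definition.

```python
import heapq

# Hypothetical interaction network: undirected edge -> weight.
# Assumption for this sketch: lower weight = stronger expression change.
EDGES = {
    ("A1", "X"): 1.0,
    ("X", "B1"): 1.0,
    ("A1", "Y"): 0.5,
    ("Y", "Z"): 0.5,
    ("Z", "B2"): 0.5,
    ("A2", "B1"): 5.0,
}
PATHWAY_A = {"A1", "A2"}  # genes in the first pathway
PATHWAY_B = {"B1", "B2"}  # genes in the second pathway


def build_graph(edges):
    """Turn the edge table into an adjacency list."""
    graph = {}
    for (u, v), weight in edges.items():
        graph.setdefault(u, []).append((v, weight))
        graph.setdefault(v, []).append((u, weight))
    return graph


def shortest_path(graph, sources, targets):
    """Dijkstra search started from all source genes at once.
    Returns (cost, path) to the nearest target gene, or None."""
    queue = [(0.0, s, [s]) for s in sorted(sources)]
    heapq.heapify(queue)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in seen:
            continue
        seen.add(node)
        if node in targets:
            return cost, path
        for neighbour, weight in graph.get(node, []):
            if neighbour not in seen:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


cost, path = shortest_path(build_graph(EDGES), PATHWAY_A, PATHWAY_B)
print(cost, path)
```

In this toy network the three-step route through Y and Z wins over the two-step route through X because its edges carry lower weights, which is how expression data can steer the search towards intermediate proteins that belong to neither pathway.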
61. An indirect interaction between the Axon Guidance and Insulin Signaling pathways in the network for
the comparison between HF and LF diet at t = 0. Left: Network representation of the identified path
between the two pathways, consisting of three proteins Gsk3b, Sgk3 and Tsc1. Right: The location of these
proteins in the KEGG pathway diagrams. The newly found indirect interactions have been added in red.
62. Pathway interactions and
detailed network visualization
for the interactions with three
apoptosis related pathways for
the comparison between HF and
LF diet at t = 0. A: Subgraph of the
pathway interaction network, based
on incoming interactions to three
stress response and apoptosis
pathways with the highest in-
degree. Pathway nodes with a thick
border are significantly enriched (p
< 0.05) with differentially expressed
genes. B: The protein interactions
that compose the interactions
between the three apoptosis
related pathways and their
neighbors in the subgraph as
shown in box A (see inset, included
interactions are colored orange).
Protein nodes have a thick border
when their encoding genes are
significantly differentially expressed
(q < 0.05).
63. We tried to make it easier with
The CyTargetLinker Cytoscape Plugin
Extending pathways on the fly.
Provided databases with the plugin:
• miRNAs with targets
• Transcription Factors with targets
• Drug – Target Interactions
• ENCODE derived databases
Extend with your own.
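The extension step can be sketched as a simple set operation: regulators from a target database are added to a gene set when they hit enough of its members. This is only an illustration of the idea, not the plugin's code or API; the miRNA names, gene names, the function extend_network and the default threshold of two targets (mirroring the remark in the speaker notes that a network appears once a miRNA targets more than one gene in the group) are assumptions.

```python
# Hypothetical regulator -> target table, standing in for a
# miRNA-target database (all identifiers are made up).
MIRNA_TARGETS = {
    "miR-x": {"GeneA", "GeneB", "GeneQ"},
    "miR-y": {"GeneC"},
    "miR-z": {"GeneQ"},
}


def extend_network(genes, regulator_targets, min_targets=2):
    """Return regulator -> gene edges for every regulator that targets
    at least `min_targets` genes of the input set."""
    edges = []
    for regulator, targets in sorted(regulator_targets.items()):
        hits = sorted(targets & genes)
        if len(hits) >= min_targets:
            edges.extend((regulator, gene) for gene in hits)
    return edges


gene_set = {"GeneA", "GeneB", "GeneC"}
new_edges = extend_network(gene_set, MIRNA_TARGETS)
print(new_edges)
```

Lowering min_targets to 1 would also pull in regulators that touch a single gene, at the cost of a much busier network.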
65. miRTarBase as a target interaction network
Collection of miRNA-target gene interactions in the miRTarBase database with 1,715 genes,
286 miRNAs and 2,817 interactions.
70. OPS Framework
OPS GUI Architecture. Dec 2011
• App Framework
• Web Service API; Sparql; Web Services
• OPS Data Model
• Identity & Vocabulary Management
• Semantic Data Workflow Engine
• RDF Data Cache
• Chemistry Normalisation & Registration
• Descriptors and Nanopubs
• RDF 1 to RDF 4; Data 1 to Data 4
• Public Vocabularies
Feed in WikiPathways relationships, use BioPAX to create the RDF
72. Well yes, for Open PHACTS we do…
[Diagram: OPS Data Model; Identity & Vocabulary Management; Semantic Data Workflow Engine; Chemistry Normalisation & Registration; Descriptors; RDF 1, RDF 2; Data 1, Data 2; Public Vocabularies]
73. But really…,
what about federated SPARQL queries?
[Diagram: Descriptors; RDF 1, RDF 2; Data 1, Data 2; Public Vocabularies; Other Public Vocabularies]
74. Most often partly…
If the vocabularies used are different, linking just database IDs is not good enough.
We need full mappings of ontologies.
Identification of overlapping modules.
And maybe… suggestions for ontologies to use in a specific field.
[Diagram: Identity Mapping; Descriptors; RDF 1, RDF 2; Data 1, Data 2; Public Vocabularies; Other Public Vocabularies]
75. Thanks!
WikiPathways team:
• Martijn van Iersel (PathVisio,
BridgeDB)
• Thomas Kelder (WikiPathways,
networks)
• Alex Pico (US team leader)
• Bruce Conklin (former US team leader)
• Kristina Hanspers (US curation)
• Martina Kutmon (CyTargetLinker)
• Susan Coort (Regulatory plugins)
• Lars Eijssen (Data pipelines)
• Anwesha Dutta (Flux visualisation)
• Andra Waagmeester (LOOM)
• Egon Willighagen (Open Phacts)
Funding. Dutch: IOP, NBIC, NuGO, NCSB. Regional:
Transnational University. EU: NuGO and Microgennet,
IMI: Open Phacts + Agilent thought leader grant and
NIH.
77. Analyzing GO representation in
pathways using an independent
library for ontology analysis
Combining efforts and information to
increase biological understanding
78. Structuring biological data
• Gene Ontology (GO)
– Protein function or
localization
– Hierarchically structured
terms
– 3 topics (namespaces)
• Biological process
• Molecular function
• Cellular component
– Disadvantage
• No information on interactions
79. Structuring biological data
• Pathways
– Network of interactions
– Structural overview of elements in the
pathway
– Disadvantages:
• Missing structure
of interacting
pathways
• Overlap and
abundance in
pathways
80. Analysis based on structures
• Uses:
– Better overview of the data
– Increased biological understanding
• Challenges in the field:
– Difficulty comparing algorithms
– Good work may be overlooked
– Redundant efforts
– Out-of-date algorithms used
– Comparison extremely difficult
81. Goals:
• Develop an independent library for ontology
analysis in which efforts can be combined
• Increase biological understanding by
combining knowledge on pathways and gene
ontology.
82. Independent library for ontology
analysis
• Open source:
– Collaboration
– Clear view of the algorithm
– Free use
– Minimizing redundant efforts
• Usable for multiple ontologies and identifiers
83. Combining Pathways and GO
• Display information on the function of the
pathway
• Make a comparison between pathways
• Quality control
– Single pathway
– List of pathways
84. Materials
• PathVisio
– Open source tool for visualizing and analyzing
pathway data
• BridgeDb
– ID mapping framework for bioinformatics
• WikiPathways
– Community curated pathway data source
86. Methods
                      IDs linked to GO   Genes not linked to GO   Total
IDs in pathway        a                  b                        a+b
IDs not in pathway    c                  d                        c+d
Total                 a+c                b+d                      n
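The slides do not state which statistic the plug-in derives from this table. As a hedged sketch, two common choices are shown below: the fraction of pathway identifiers annotated with the GO term, a / (a + b), and a one-sided Fisher exact test on the full table. The function names and the example counts are assumptions made for illustration.

```python
from math import comb


def pathway_fraction(a, b):
    """Fraction of pathway identifiers linked to the GO term: a / (a + b)."""
    return a / (a + b)


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value: probability of seeing at least `a`
    annotated identifiers in the pathway under the hypergeometric null."""
    n = a + b + c + d
    in_pathway = a + b   # row total
    with_term = a + c    # column total
    upper = min(in_pathway, with_term)
    total = comb(n, in_pathway)
    tail = sum(comb(with_term, k) * comb(n - with_term, in_pathway - k)
               for k in range(a, upper + 1))
    return tail / total


# Made-up counts: 17 identifiers in the pathway, 6 of them linked to the
# term, against a background of 1000 identifiers with 100 linked in total.
a, b, c, d = 6, 11, 94, 889
score = pathway_fraction(a, b)
p_value = fisher_right_tail(a, b, c, d)
print(f"score = {score:.0%}, p = {p_value:.3g}")
```

The fraction is easy to read but ignores how common the term is in the background; the exact test corrects for that, which matters for broad terms that annotate a large share of all genes.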
87. Plug-in
• Panel for the analysis of a single pathway
– Display GO terms in a table with score
– Highlight matches
– Save results
• Menu Item for analyzing a list of pathways
– Select a folder containing pathway files
– Individual result files
– File containing all results with extra info
89. Single Pathway analysis
• Regulation of blood pressure
• Angiogenesis
• Others:
– G-protein coupled receptor
– proteolysis
Homo sapiens (name, score):
– G-protein coupled receptor signaling pathway 35%
– regulation of cell proliferation 29%
– proteolysis 29%
– regulation of blood pressure 29%
– response to drug 29%
– regulation of vasoconstriction 29%
– positive regulation of apoptotic process 29%
– negative regulation of cell growth 23%
– kidney development 23%
– elevation of cytosolic calcium ion concentration 23%
Mus musculus (name, score):
– kidney development 50%
– G-protein coupled receptor signaling pathway 50%
– response to drug 37%
– negative regulation of cell proliferation 37%
– positive regulation of apoptotic process 37%
– regulation of blood pressure 37%
– response to salt stress 25%
– regulation of systemic arterial blood pressure by circulatory renin-angiotensin 25%
– arachidonic acid secretion 25%
– blood vessel development 25%
91. Multiple Pathway analysis
Biological Process (12 of 105 terms): signal transduction, xenobiotic metabolic process, oxidation-reduction process, metabolic process, G-protein coupled receptor signaling pathway, gene expression, nerve growth factor receptor signaling pathway, apoptotic process, synaptic transmission, DNA repair, mitotic cell cycle, innate immune response
Cellular Component (12 of 26 terms): cytoplasm, cytosol, nucleus, plasma membrane, membrane, integral to membrane, mitochondrion, nucleoplasm, endoplasmic reticulum membrane, extracellular region, endoplasmic reticulum, integral to plasma membrane, microsome, extracellular space
93. Independent library
• Reads GO terms from file
• Mapping from term to identifier
• Analysis on sample data
• Framework enables more methods to be
added
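The first two bullets can be sketched as follows, assuming the GO terms are read from an OBO-formatted file (the slides do not name the file format). The parser handles only the id, name, namespace and is_a tags; the term excerpt is shortened, the is_a link of the last term is simplified, and the gene annotations and function names are invented for this example.

```python
# Shortened OBO excerpt; the hierarchy is simplified for illustration.
OBO_TEXT = """\
[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0008152
name: metabolic process
namespace: biological_process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0006096
name: glycolytic process
namespace: biological_process
is_a: GO:0008152 ! metabolic process
"""

# Hypothetical term -> identifier annotations (made-up gene names).
ANNOTATIONS = {"GO:0006096": {"gene1", "gene2"}, "GO:0008152": {"gene3"}}


def parse_obo(text):
    """Read [Term] stanzas into a dict keyed by term id."""
    terms, current = {}, None
    for line in text.splitlines() + [""]:  # trailing blank flushes the last stanza
        line = line.strip()
        if line == "[Term]":
            current = {"is_a": []}
        elif not line:
            if current and "id" in current:
                terms[current["id"]] = current
            current = None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            if key == "is_a":
                current["is_a"].append(value.split(" ! ")[0])
            else:
                current[key] = value
    return terms


def ancestors(term_id, terms):
    """All terms reachable from term_id through is_a links."""
    found, stack = set(), list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(terms[parent]["is_a"])
    return found


def ids_for_term(term_id, terms, annotations):
    """Identifiers annotated to the term itself or to any more specific term."""
    return {identifier
            for annotated, ids in annotations.items()
            if annotated == term_id or term_id in ancestors(annotated, terms)
            for identifier in ids}


go_terms = parse_obo(OBO_TEXT)
metabolic_ids = ids_for_term("GO:0008152", go_terms, ANNOTATIONS)
print(sorted(metabolic_ids))
```

Propagating annotations up the hierarchy is what makes a general term such as "metabolic process" collect the genes of its more specific children.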
94. Combining Pathways and GO
• Single Pathway:
– More information on pathway
– Quality control possible
• Pathway List:
– Separate results for every pathway
– Enables structuring possibilities
– Quality control possible
Editor's notes
The home page for this webinar is http://www.bioontology.org/ontologies-in-integrative-systems-biology. There will be a recording of the webinar on that page.
The slides labeled TNO and the dbNP/dbXP screenshots are courtesy of Jildau Bouman.
A closer look at the same pathway. Note that this uses MIM notation from the MIM PathVisio plugin. In general the connections between different genes and metabolites describe the network underlying the pathway. Note that this is already quite complex since there are different ways to show what interacts with what. Graphical methods to capture this, like MIM and SBGN, definitely help. The result can be captured in descriptive relationships in BioPAX.
As soon as you have entered one (and only one) identifier to describe what gene product or metabolite you really mean, this information is linked to many other identifiers from other databases, and links to these respective pages are shown in the so-called “backpage” (actually one of the pages under the tabs at the right-hand side of the pathway).
BridgeDB development led by Martijn van Iersel.
BridgeDB (see www.bridgedb.org and the paper mentioned on the slide) provides the mechanism needed for that identifier mapping.
Note that BridgeDB is now also part of the Identifier Mapping Service of Open PHACTS.
Showing the concept. Integrating flux predictions from modelling (of course that could also be real fluxomics data)
Probably not an iPad; those microarrays were at least 10 years old.
Introducing a problem
And a solution that isn’t really a solution. There are just too many things you could add.
There are just too many SNPs for any given gene.
The PathVisio Regulatory Interaction plugin (author Stefan van Helden) has a new approach where information is not really added to a pathway, but shown in a separate page upon request.
The approach takes into account all data use (pathways, interactions and experimentally determined weight). Check out the original paper for details.
Example result: pathways with stronger interaction based on genes not present in them.
And you can do the same for relatively large sets of pathways “driving” a process like apoptosis.
CyTargetLinker is a Cytoscape plugin that can be used to extend one network with information about things targeting entities in that network from databases that are created as a network. It already provides a number of target relation databases as mentioned on the slide.
Example of a target network. (You will normally see this, it contains the information that is used to extend your source network).
You can drive it from a gene set that isn’t even a network at the start. But when miRNAs are found to target more than one gene in the group, the network is created on the fly.
Or you can bootstrap the approach from an existing network. Which can be a pathway based one imported with the GPML plugin like shown here.
Adapted by Nadia and Martijn from General Bioinformatics
An overview of the Open PHACTS project, which pulls lots of information into a semantic web triple store (including information from WikiPathways RDF) and then provides that for use in other tools. In WikiPathways we use that to suggest possible pathway extensions to curators.
Many people were involved in this work (really many if you count associated groups like the plugin developers, pathway curators etc.). Most important: the SF group (Kristina Hanspers, Bruce Conklin and Alex Pico), collaborating on many things but primarily WikiPathways; Martijn van Iersel, top left (PathVisio, BridgeDB); Thomas Kelder, top middle (WikiPathways including webservices, pathway integration networks for nutrigenomics); Martina Kutmon, top right (CyTargetLinker, PathVisio further development); Andra Waagmeester, second row, right (WikiPathways RDF); Anwesha Dutta, bottom, 2nd from the left (flux visualization); Stefan van Helden, not in the picture (RI PathVisio plugin).
These last slides were not presented during the webinar. They are the result of a master's student project by Christ Leemans, supervised by Martina Kutmon.