Talk at the SIG M3 meeting (ISMB 2009), Stockholm, June 2009
Describes an approach for the functional classification of environmental sequences of a metagenomic data set.
http://www-ab.informatik.uni-tuebingen.de/software/megan/welcome.html
Functional Metagenome Analysis using Gene Ontology (MEGAN 4)
1. Functional Classification of Environmental Reads using Gene Ontology
Daniel C. Richter
Daniel H. Huson
Dept. Algorithms in Bioinformatics
ZBIT Center for Bioinformatics
University of Tuebingen, Germany
www-ab.informatik.uni-tuebingen.de
2. Metagenomics - Workflow
Environmental Sample
Sequencing (Sanger/NGS)
Who is out there? → Taxonomical Analysis
How many are there? → Quantitative Analysis
What are they doing? → Functional Analysis
MEGAN Software
Daniel Richter – University of Tuebingen Functional Metagenome Analysis Stockholm, 09/06/27 [01]
3. Metagenomics - Workflow
4. MEGAN – Taxonomical Analysis
Precomputation: Reads → BLAST (nr, nt, ...)
"Laptop Analysis": MEGAN
NCBI Taxonomy
• >460,000 taxa
• Taxonomical ranks: Kingdom, Phylum, Class, Order, ..., Species
Huson et al., 2007, Genome Research
5. Functional Metagenome Analysis
Extension of MEGAN to classify reads according to their function
• Input: BLASTX result file → homology-based approach
• Structured and interactive overview of gene products
Gene Ontology (http://www.geneontology.org)
• widely used in biological databases, gene expression and annotation studies
• >27,000 GO terms (cross-species)
• structured as a directed acyclic graph (DAG)
• three structured vocabularies (ontologies): molecular function, biological process, cellular component
6. Mapping BLAST Matches to GO Terms
>gb|EAU86868.1| predicted protein [Coprinopsis cinerea okayama7#130]
>emb|CAC86119.1| putative hexose-6-phosphate transporter [Listeria monocytogenes]
>ref|ZP_00390013.1| Arabinose efflux permease [Bacillus anthracis str. A2012]
ref2go map: RefSeqID → GO Terms, via UniProt mapping (http://pir.georgetown.edu/)
>3.5 million entries
GO terms: GO:0044408, GO:0043581, GO:0032502
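The lookup step on this slide can be sketched as follows. This is a minimal illustration, not MEGAN's actual code: the function name `go_terms_for_header`, the in-memory dictionary, and the choice to drop the accession version suffix are all assumptions, and the pairing of the RefSeq accession with the three GO IDs is invented here purely to have something to look up.

```python
import re

# FASTA-style BLAST subject header, e.g. ">ref|ZP_00390013.1| Arabinose ..."
HEADER = re.compile(r"^>(?P<db>\w+)\|(?P<acc>[^|\s]+)\|")

# Hypothetical ref2go excerpt: this accession/GO pairing is illustrative only.
REF2GO = {"ZP_00390013": ["GO:0044408", "GO:0043581", "GO:0032502"]}


def go_terms_for_header(header, ref2go=REF2GO):
    """Return the GO IDs mapped to a BLAST match header, or [] if none."""
    match = HEADER.match(header)
    if match is None or match.group("db") != "ref":
        return []  # this sketch only keys RefSeq ("ref") accessions
    accession = match.group("acc").split(".")[0]  # assumed: version suffix is dropped
    return ref2go.get(accession, [])


headers = [
    ">gb|EAU86868.1| predicted protein [Coprinopsis cinerea okayama7#130]",
    ">ref|ZP_00390013.1| Arabinose efflux permease [Bacillus anthracis str. A2012]",
]
for h in headers:
    print(h.split("|")[1], go_terms_for_header(h))
```

With more than 3.5 million entries, a real map would more likely live in an indexed file or database than in a Python dictionary; the dictionary is used here only to keep the sketch self-contained.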
7. Placing Reads onto GO Terms – LCA Approach
[Figure: Read → BLAST → matches M0 to M4 → ref2go map → GO terms, grouped by ontology]
Molecular Function: protein binding
Biological Process: response to stress, signal transduction, cell communication
Cellular Component: nucleus, cell part, cytosol
Placement: ? ? ?
8. Placing Reads onto GO Terms – LCA Approach
[Figure: same read, BLAST matches M0 to M4 and GO terms as on slide 7]
Placement: ? ?
9. Placing Reads onto GO Terms – LCA Approach
[Figure: same read, BLAST matches M0 to M4 and GO terms as on slide 7, plus three tree panels with the nodes root, cellular process, cell communication, response to stress and signal transduction]
Placement: ? ?
10. Placing Reads onto GO Terms – LCA Approach
[Figure: same read, BLAST matches M0 to M4 and GO terms as on slide 7, plus three tree panels with the nodes root, cellular process, cell communication, response to stress and signal transduction]
Placement: ?
11. Placing Reads onto GO Terms – LCA Approach
[Figure: same read, BLAST matches M0 to M4 and GO terms as on slide 7]
Placement:
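The placement idea on these slides, assigning a read to the lowest common ancestor (LCA) of the GO terms hit by its BLAST matches, can be sketched as below. The sketch makes two loudly stated assumptions: the hierarchy is treated as a tree with a single parent per term (the real GO is a DAG, where a term can have several parents), and the parent links are a simplified guess based on the node labels visible in the tree panels. The names `PARENT`, `path_to_root` and `lca` are hypothetical.

```python
# Simplified, tree-shaped hierarchy (assumption; node labels taken from the slides)
PARENT = {
    "cellular process": "root",
    "cell communication": "cellular process",
    "signal transduction": "cell communication",
    "response to stress": "root",
}


def path_to_root(term):
    """List a term followed by all of its ancestors, ending at the root."""
    path = [term]
    while term in PARENT:
        term = PARENT[term]
        path.append(term)
    return path


def lca(terms):
    """Return the deepest term that is an ancestor of (or equal to) every input term."""
    paths = [path_to_root(t) for t in terms]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking the first path from the leaf upward, the first shared node is the LCA.
    return next(node for node in paths[0] if node in common)


print(lca(["signal transduction", "cell communication"]))
print(lca(["response to stress", "signal transduction", "cell communication"]))
```

In this toy hierarchy a read whose matches hit terms in unrelated branches ends up at the root, which is the behaviour the following slide lists as a drawback for reads with many different matches.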
12. Benefits and Drawbacks of the LCA Algorithm
Drawbacks:
• loss of accuracy: the LCA is generally less specific than the individual GO terms
• might miss gene products of interest (losing the "big picture")
• reads with many different BLAST matches (= many GO terms) are likely to be assigned to high-level GO terms
Benefits:
• complexity reduction facilitates analysis and visual inspection
• memory efficient: only three integers (GO IDs) need to be stored per read
• applicable to large data sets: 5 million reads, 760 GB BLAST output
• loss of accuracy ≠ loss of correctness (avoids false positives)
→ balance between usability and accuracy
Calculation example "Full Approach":
1,000,000 reads
each read: 50 BLAST matches
each match: 10 GO terms
→ 500,000,000 GO IDs
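The arithmetic behind this example, set against the three GO IDs per read kept by the LCA approach, can be checked with a short script. The read, match and term counts come from the slide; the 4 bytes per GO ID and the reading of "three integers" as one ID per ontology are assumptions made only for this comparison.

```python
# Figures from the slide's "Full Approach" example
reads = 1_000_000
matches_per_read = 50
go_terms_per_match = 10

# Full approach: keep every GO ID of every match of every read
full_ids = reads * matches_per_read * go_terms_per_match

# LCA approach: three GO IDs per read (assumed: one per ontology)
ONTOLOGIES = 3
lca_ids = reads * ONTOLOGIES

BYTES_PER_ID = 4  # assumption: one 32-bit integer per GO ID
print(f"full: {full_ids:,} IDs, about {full_ids * BYTES_PER_ID / 1e9:.1f} GB")
print(f"LCA:  {lca_ids:,} IDs, about {lca_ids * BYTES_PER_ID / 1e6:.1f} MB")
```

Under these assumptions the LCA approach stores 3,000,000 IDs instead of 500,000,000, a reduction by a factor of roughly 167.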
13. GO Analyzer – Main Window
14. GO Analyzer – Main Window
15. GO Analyzer – Main Window
16. GO Analyzer – Main Window
17. GO Analyzer – Main Window
18. GO Analyzer – Main Window
19. GO Analyzer – Main Window
20. GO Analyzer – Main Window
Extract reads
21. GO Analyzer – Path Highlighting
22. GO Analyzer – GO Slims
Gene Ontology provides subsets of GO terms
→ useful for a high-level view of the three ontologies
http://www.geneontology.org/GO.slims.shtml
Design your own metagenomic GO slim...
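One way to picture how a custom GO slim condenses read assignments is to walk each assigned term upward until a slim term is reached. The sketch below is only an illustration under stated assumptions: the hierarchy is a simplified, single-parent excerpt, the slim set is made up, and `to_slim` and `slim_counts` are hypothetical helpers rather than part of MEGAN or of the Gene Ontology tools.

```python
from collections import Counter

# Simplified single-parent excerpt of the biological process ontology (assumption)
PARENT = {
    "signal transduction": "cell communication",
    "cell communication": "cellular process",
    "cellular process": "biological process",
    "response to stress": "response to stimulus",
    "response to stimulus": "biological process",
}

# Hypothetical custom slim: the high-level terms we want to report on
SLIM = {"cellular process", "response to stimulus"}


def to_slim(term):
    """Return the nearest slim term at or above `term`, or None if there is none."""
    while term is not None and term not in SLIM:
        term = PARENT.get(term)
    return term


def slim_counts(assigned_terms):
    """Count reads per slim term, given one assigned GO term per read."""
    counts = Counter()
    for term in assigned_terms:
        counts[to_slim(term) or "unassigned"] += 1
    return counts


print(slim_counts(["signal transduction", "cell communication", "response to stress"]))
```

Because real GO terms can have several parents, a production mapping would need to follow all parent links and may report more than one slim term per input term.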
23. GO Analyzer – Comparison View
24. GO Analyzer – Comparison View
25. GO Analyzer – Comparison View
26. GO Analyzer – Comparison View
27. GO Analyzer – Summary
• New module of MEGAN 4 to conduct functional analyses on environmental reads
"BLAST only once, perform taxonomical and functional analysis in one step"
• Homology-based approach
• Overview tool: visual and interactive exploration of gene products
• Inspection, extraction and chart features
• Comparative mode
Installers for all operating systems will be available from:
http://www-ab.informatik.uni-tuebingen.de/software/megan