The complexity of plant genomes
1. THE COMPLEXITY OF PLANT GENOMES
Genome structure, gene functions and beyond
Klaas Vandepoele
Barcelona, October 10th 2012
Department of Plant Biotechnology and Bioinformatics, Ghent University
Department of Plant Systems Biology, VIB - Belgium
http://twitter.com/plaza_genomics
2. OVERVIEW
And then there were many: plant genome sequences
PLAZA: a web-based plant comparative genomics toolbox
Genome organization and evolution
The quest for plant orthologous genes
Unravelling gene functions using integrative plant genomics
Cross-species gene function analysis
5. GENOME ANNOTATION
[Figure: genome annotation pipeline, from genomic DNA sequences to functionally annotated genes for downstream analysis]
Sequence sources: Genoscope, BGI, JGI
Model building: training set; coding potential search (IMM); intron potential search (SpliceMachine, splice site models); intergenic potential search; repeat masking
Similarity searches: tBlastx, Blastx, Blastn against related genomes, Swissprot, Nr_prot, EST, cDNA
Structural annotation: automatic annotation (Eugene) gives predicted genes; expert manual curation (Artemis, GenomeView, Bogas) gives annotated genes
Functional annotation: InterPro, Gene Ontology
Source: P. Rouzé
6. EXPLOITING GENOME INFORMATION
Centralized infrastructure
Detailed gene catalog per species
Structural annotation (gene models, UTRs)
Functional annotation (experimental, sequence-based, systems biology)
Intuitive & advanced data mining tools for non-expert users
Gene function
Genome organization
Pathway evolution
Data manipulation
Computational resources
7. Gene family analysis
Genome analysis
>20 tools available
Proost et al., Plant Cell 2009; Van Bel et al., 2012
8. HOMOLOGOUS GENE FAMILIES
>780K proteins
from 25 species
Protein clustering
Phylogenetics
22K multi-species gene families covering 83% of the total proteome
18K trees incl. 420K annotated tree nodes
9. GENE COLINEARITY & GENOME ORGANIZATION
• Represent chromosomes as sorted gene lists
• Identify all homologous gene pairs between chromosomes (all-against-all BLASTP)
• Score pairs of homologues in a matrix
[Figure: Gene Homology Matrix (GHM) for Chromosome 1 versus Chromosome 2; i-ADHoRe 3.0]
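The three steps on this slide can be illustrated with a minimal Python sketch. It is not the i-ADHoRe 3.0 algorithm, which also handles gaps, inversions and statistical validation; it only builds a 0/1 gene homology matrix and reports strict diagonal runs. The function names, the `min_len` threshold and the toy gene identifiers are assumptions made for illustration, and the homologous pairs are assumed to come from a prior all-against-all BLASTP step.

```python
def gene_homology_matrix(chrom_a, chrom_b, homolog_pairs):
    """Return a 0/1 matrix: rows follow chrom_a gene order, columns chrom_b."""
    pairs = {frozenset(p) for p in homolog_pairs}
    return [[1 if frozenset((a, b)) in pairs else 0 for b in chrom_b]
            for a in chrom_a]


def diagonal_runs(matrix, min_len=3):
    """Collect runs of consecutive hits along descending diagonals (i+1, j+1)."""
    runs = []
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Start a run only at a hit that does not continue an earlier run.
            continues = i > 0 and j > 0 and matrix[i - 1][j - 1]
            if matrix[i][j] and not continues:
                k = 0
                while i + k < n_rows and j + k < n_cols and matrix[i + k][j + k]:
                    k += 1
                if k >= min_len:
                    runs.append([(i + d, j + d) for d in range(k)])
    return runs


# Toy example: chromosomes as sorted gene lists (hypothetical identifiers).
chrom1 = ["a1", "a2", "a3", "a4"]
chrom2 = ["x", "b1", "b2", "b3"]
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "x")]
ghm = gene_homology_matrix(chrom1, chrom2, pairs)
runs = diagonal_runs(ghm)
```

In this toy case the three consecutive homologous pairs form one diagonal run, while the isolated hit between `a4` and `x` is ignored because it is shorter than `min_len`.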
17. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC
GENE DUPLICATES
Species-specific duplicates → divide into block & tandem duplicates → gene sets → PLAZA workbench → GO enrichment
Proost et al., Plant Cell 2009
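As a rough illustration of the "divide in block & tandem duplicates" step, the sketch below labels each duplicate pair. The rule used here is an assumption, not the PLAZA definition: a pair counts as tandem when both genes lie on the same chromosome within `max_tandem_gap` gene positions, and as block when the pair is a known colinearity anchor. All names and the threshold are hypothetical.

```python
def classify_duplicates(pairs, position, anchor_pairs, max_tandem_gap=10):
    """Label each duplicate pair as 'tandem', 'block' or 'other'.

    position maps gene -> (chromosome, index in the sorted gene list).
    anchor_pairs lists gene pairs found inside colinear (block) regions.
    """
    anchors = {frozenset(p) for p in anchor_pairs}
    labels = {}
    for a, b in pairs:
        chrom_a, idx_a = position[a]
        chrom_b, idx_b = position[b]
        if chrom_a == chrom_b and abs(idx_a - idx_b) <= max_tandem_gap:
            labels[(a, b)] = "tandem"
        elif frozenset((a, b)) in anchors:
            labels[(a, b)] = "block"
        else:
            labels[(a, b)] = "other"
    return labels


# Toy data with hypothetical gene identifiers.
position = {"g1": ("chr1", 5), "g2": ("chr1", 7),
            "g3": ("chr2", 40), "g4": ("chr3", 12)}
duplicates = [("g1", "g2"), ("g1", "g3"), ("g2", "g4")]
anchor_pairs = [("g3", "g1")]
labels = classify_duplicates(duplicates, position, anchor_pairs)
```

The resulting tandem and block gene sets could then be tested separately for GO enrichment.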
18. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC
GENE DUPLICATES
Species-specific duplicates → divide into block & tandem duplicates → gene sets + Gene Ontology → PLAZA workbench → GO enrichment
21. THE QUEST FOR PLANT ORTHOLOGS
Plants are paleopolyploids
Dynamic genome organization
Large fraction of multi-gene families
Absence of simple 1:1 orthology relationships
23. GENE DYNAMICS IN THE GREEN LINEAGE
Green algae Brown algae Land plants
Diatoms
24. PLANT GENE FAMILIES, A TALE OF DUPLICATIONS
F-box protein domain gene family
25. PLAZA INTEGRATIVE ORTHOLOGY VIEWER
• Tree-based orthologs (TROG) inferred using tree reconciliation
• Orthologous gene families (ORTHO) inferred using OrthoMCL
• Anchor points refer to gene-based colinearity between species
• Best hit families (BHIF) inferred from Blast hits including inparalogs
Van Bel et al., Plant Physiology 2012
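One simple way to combine the four evidence types is to count, for each candidate ortholog, how many methods support it. The sketch below does only that; how the PLAZA viewer itself weighs or displays the evidence is not stated on the slide, so the function name, the ranking rule and the gene identifiers are assumptions.

```python
def integrate_orthology(evidence, query):
    """Rank candidate orthologs of `query` by the number of supporting methods.

    evidence maps a method name to a set of (query_gene, target_gene) pairs.
    Returns a list of (target, sorted_methods), best supported first.
    """
    support = {}
    for method, pairs in evidence.items():
        for q, target in pairs:
            if q == query:
                support.setdefault(target, set()).add(method)
    ranked = sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(target, sorted(methods)) for target, methods in ranked]


# Toy evidence with hypothetical gene identifiers.
evidence = {
    "TROG": {("AT1", "OS1")},
    "ORTHO": {("AT1", "OS1"), ("AT1", "OS2")},
    "BHIF": {("AT1", "OS1"), ("AT1", "OS2")},
    "ANCHOR": set(),
}
ranked = integrate_orthology(evidence, "AT1")
```

Here `OS1` is supported by three methods and `OS2` by two, so `OS1` ranks first.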
35. PROPERTIES INPUT – MODULE DATA
40% of the genes in the modules are present in more than one input data type
Only 3% of the gene pairs within a module are supported by more than one primary data type
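A statistic of this kind can be computed per module as sketched below. The exact denominator used in the study is not given on the slide; this sketch assumes it is the set of gene pairs within a module that are supported by at least one data type. The function name and the toy edges are hypothetical.

```python
def multi_support_fraction(module_genes, edges_by_type):
    """Fraction of supported within-module gene pairs backed by >1 data type."""
    members = set(module_genes)
    support = {}
    for dtype, edges in edges_by_type.items():
        for a, b in edges:
            # Keep only pairs with both genes inside the module.
            if a in members and b in members and a != b:
                support.setdefault(frozenset((a, b)), set()).add(dtype)
    if not support:
        return 0.0
    multi = sum(1 for types in support.values() if len(types) > 1)
    return multi / len(support)


# Toy module: one pair is seen in two data types, one pair in a single type.
module = ["g1", "g2", "g3"]
edges = {
    "PPI": [("g1", "g2"), ("g2", "g3")],
    "AraNet": [("g2", "g1"), ("g1", "g9")],
}
fraction = multi_support_fraction(module, edges)
```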
36. MODULE OVERLAP
Primary Data Modules
                Primary Data                          Modules
Datatype        # Genes   # Associations (% unique)   # Genes   # Modules (% unique)   Functional Enrichment   Motif Enrichment
PPI             3,194     7,210 (75%)                 597       72 (95%)               51                      43
AraNet          19,647    1,062,222 (99%)             6,377     419 (99%)              116                     172
TF targets      9,422     13,037 (99%)                5,127     518 (96%)              51                      224
GO              6,588     89,100 (n.a.)               7,750     1,105 (99%)            943                     341
Total           22,492    1,089,661                   13,428    2,114                  1,161                   -
Non-redundant   -         -                           13,142    1,563                  676                     772
Modules
>99% of modules found through a single input data type
37. FUNCTIONAL AND CIS-REGULATORY COHERENCE
OF PLANT MODULES
Cis-regulatory element analysis
• Weeder / MotifSampler de novo
motif finding (1544 motifs)
• Overlap with known plant motifs
AGRIS/PLACE (34%)
Functional enrichment analysis
• Over-representation hypergeometric
distribution + FDR
• Non-electronic GO annotations +
embryo-lethal gene (SeedGenes)
40% of the modules could be linked to a significant functional enrichment (GO BP -
embryo lethality)
98% of the modules have 1 (or more) gene(s) with a known experimental annotation
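The over-representation test named on this slide (hypergeometric distribution plus FDR) can be sketched in plain Python as follows. The slide does not say which FDR procedure was used; Benjamini-Hochberg is assumed here, and the function names and example numbers are illustrative only.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the annotation."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: a module of 4 genes, 2 of which carry a GO term that
# annotates 3 of the 10 genes in the background.
p_enrich = hypergeom_sf(2, 10, 3, 4)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A module and GO term would be reported as enriched when the adjusted p-value falls below the chosen FDR threshold.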
40. CONSERVED MODULE EXPRESSION COHERENCE
Lipid biosynthesis
58% of modules show significant coexpression coherence (3 or more species)
>43,000 unknown genes from 6 other plants receive module-based functional
annotations
41. MODULE-BASED FUNCTION PREDICTIONS
Can we recover new experimental Arabidopsis gene – GO BP
annotations?
Data freeze Evaluation
1460 Arabidopsis genes with predictions receive new exp. GO-BP annotations
                Unknown              Unknown Exp. BP       Other Exp. BP         Total
                #Pred   #Conf        #Pred   #Conf         #Pred   #Conf         #Pred   #Conf
All Genes       197     75 (38.1%)   255     108 (42.4%)   1,008   251 (24.9%)   1,460   434 (29.7%)
Conserved       166     65 (39.2%)   195     80 (41%)      871     215 (24.7%)   1,232   360 (29.2%)
Not Conserved   48      10 (20.8%)   83      31 (37.3%)    315     52 (16.5%)    446     93 (20.9%)
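The confirmation percentages in the table are the ratio of confirmed to predicted genes, and the Total column is the sum over the three categories. A short sketch that recomputes them from the counts above (the dictionary layout and function name are illustrative):

```python
# (predicted, confirmed) counts copied from the table above.
counts = {
    "All Genes": {"Unknown": (197, 75), "Unknown Exp. BP": (255, 108),
                  "Other Exp. BP": (1008, 251)},
    "Conserved": {"Unknown": (166, 65), "Unknown Exp. BP": (195, 80),
                  "Other Exp. BP": (871, 215)},
    "Not Conserved": {"Unknown": (48, 10), "Unknown Exp. BP": (83, 31),
                      "Other Exp. BP": (315, 52)},
}


def confirmation_rates(row):
    """Return per-category confirmation rates (%) plus the row totals."""
    rates = {cat: round(100 * conf / pred, 1) for cat, (pred, conf) in row.items()}
    total_pred = sum(pred for pred, _ in row.values())
    total_conf = sum(conf for _, conf in row.values())
    rates["Total"] = round(100 * total_conf / total_pred, 1)
    return rates, total_pred, total_conf


all_rates, all_pred, all_conf = confirmation_rates(counts["All Genes"])
cons_rates, cons_pred, cons_conf = confirmation_rates(counts["Conserved"])
```

For the "All Genes" row this gives 1,460 predicted and 434 confirmed genes, a confirmation rate of 29.7%, matching the Total column.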
42. DNA ENDOREDUPLICATION
• PPI module: predicted to be
involved in DNA endoreduplication
• Experimental validation shows that
AT1G06590 T-DNA shows
perturbed endoreduplication index
(Quimbaya et al., 2012)
Plant Mutants Flow Cytometry
Quimbaya, Vandepoele,… De Veylder, 2012
44. 3-WAY SPECIES CO-EXPRESSION COMPARISON FOR ETG1
• Conserved DNA
replication module
• Conserved E2F target
gene (TTTCCGC)
• Role in sister chromatid cohesion
Movahedi et al., 2012; Takahashi et al., 2010
45. SORTING OUT PLANT (CO-)ORTHOLOGS USING
EXPRESSION CONTEXT CONSERVATION
Protein integrative orthology
Expression Context Conservation
scores (p-value < 0.05)
Inparalogs (species-specific
duplicates)
46. CONCLUSIONS
Need for advanced & user-friendly tools to characterize new genomes
Complexity and quality of genome sequences
Scalability with increasing number of genomes
Integrative approaches combining multiple methods outperform individual
methods* and provide users a more complete view
Computer power
Visualization
Large discrepancy in the functional gene associations between the different
experimental data sets
A large fraction of the module-based functional predictions are biologically
valid and can be transferred across species
Comparative network approaches provide a powerful tool to integrate
functional genomics data
* Quest for Orthologs Consortium, Bioinformatics 2012
47. ACKNOWLEDGEMENTS
Ken Heyndrickx
Michiel Van Bel
Sebastian Proost
Sara Movahedi
Mauricio Quimbaya
48. FURTHER READING
Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell.
Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol.
Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant, Cell & Environment.