THE COMPLEXITY OF PLANT GENOMES

Genome structure, gene functions and beyond




 Klaas Vandepoele
 Barcelona, October 10th 2012

 Department of Plant Biotechnology and Bioinformatics, Ghent University
 Department of Plant Systems Biology, VIB - Belgium

    http://twitter.com/plaza_genomics
OVERVIEW

   And then there were many: plant genome sequences

   PLAZA: a web-based plant comparative genomics
    toolbox
       Genome organization and evolution
       The quest for plant orthologous genes


   Unravelling gene functions using integrative plant
    genomics

   Cross-species gene function analysis
1. OVERVIEW PLANT GENOME SEQUENCING




                                  Individual
                                  institutes
      International
      consortia




     Today: ~40 (complete) plant genome sequences
GENOME ANNOTATION
   [Flowchart: genome annotation pipeline]
   Input: genomic DNA (Genoscope, BGI, JGI) and EST sequences
   Repeats: Repeat Mask, then mask repeats
   Training set: build splice site models (SpliceMachine); coding, intron and
    intergenic potential search (IMM)
   Similarity searches: tBlastx (related genomes), Blastx (Swissprot, Nr_prot),
    Blastn (EST, cDNA)
   Automatic annotation (Eugene) gives predicted genes
   Manual curation (Artemis, GenomeView) and expert annotation (GenomeView, Bogas)
    give structural annotated genes
   Functional annotation (Gene Ontology, InterPro) gives functional annotated
    genes for downstream analysis

                                                                                                                                                                         Source: P. Rouzé
EXPLOITING GENOME INFORMATION
      Centralized infrastructure

      Detailed gene catalog per species
         Structural annotation (gene models, UTRs)
         Functional annotation (experimental, sequence-based, systems
          biology)


      Intuitive & advanced data mining tools for non-expert
       users
              Gene function
              Genome organization
              Pathway evolution
              Data manipulation


      Computational resources
Gene family analysis
Genome analysis




                                                >20 tools available


                              Proost et al., Plant Cell 2009; Van Bel et al., 2012
HOMOLOGOUS GENE FAMILIES
                        >780K proteins
                        from 25 species




                                     Protein clustering
                                       Phylogenetics




 18K trees incl. 420K annotated tree nodes
 22K multi-species gene families covering 83% of the total proteome
GENE COLINEARITY & GENOME ORGANIZATION

                              • Represent chromosomes as
                               sorted gene lists

                              • Identify all homologous gene
                               pairs between chromosomes (all-
                               against-all BLASTP).

                              • Score pairs of homologues in
                               matrix

   [Figure: Gene Homology Matrix (GHM) with Chromosome 1 and Chromosome 2 as axes]
i-ADHoRe 3.0
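
The three steps listed on this slide can be sketched in Python as follows. This is a minimal illustration only: the gene identifiers are hypothetical, the homologous pairs are assumed to be precomputed (for example from an all-against-all BLASTP), and the greedy diagonal chaining in `colinear_runs` is a simplification that handles a single orientation. It is not the i-ADHoRe 3.0 algorithm.

```python
from collections import defaultdict


def build_ghm(chrom_a, chrom_b, homolog_pairs):
    """Return the filled cells (i, j) of a gene homology matrix.

    chrom_a, chrom_b: gene identifiers sorted by chromosomal position.
    homolog_pairs: iterable of (gene, gene) tuples, assumed precomputed.
    """
    pos_b = {gene: j for j, gene in enumerate(chrom_b)}
    homologs = defaultdict(set)
    for g1, g2 in homolog_pairs:
        homologs[g1].add(g2)
        homologs[g2].add(g1)
    cells = set()
    for i, gene in enumerate(chrom_a):
        for partner in homologs[gene]:
            if partner in pos_b:
                cells.add((i, pos_b[partner]))
    return cells


def colinear_runs(cells, max_gap=2, min_len=3):
    """Greedily chain cells along the forward diagonal (illustrative only)."""
    remaining = set(cells)
    runs = []
    for start in sorted(cells):
        if start not in remaining:
            continue
        run = [start]
        remaining.discard(start)
        while True:
            i, j = run[-1]
            # Look for the closest unused cell further down the diagonal.
            candidates = [
                (di + dj, (i + di, j + dj))
                for di in range(1, max_gap + 1)
                for dj in range(1, max_gap + 1)
                if (i + di, j + dj) in remaining
            ]
            if not candidates:
                break
            _, nxt = min(candidates)
            run.append(nxt)
            remaining.discard(nxt)
        if len(run) >= min_len:
            runs.append(run)
    return runs


# Hypothetical toy chromosomes and homologous pairs.
chrom_1 = ["a1", "a2", "a3", "a4", "a5"]
chrom_2 = ["b1", "b2", "bx", "b3", "b4"]
pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b4"), ("a4", "b1")]

ghm = build_ghm(chrom_1, chrom_2, pairs)
segments = colinear_runs(ghm)
```

Each returned run is a list of matrix cells that lie close to one diagonal, which is the signal a colinearity detector looks for in the GHM. The `max_gap` and `min_len` parameters are assumptions chosen for the toy data.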
GENOMIC PROFILES
  pairwise




                                                  multiple




                   Simillion et al. (2004) Genome Res. 14, 1095-1106
IMPROVED SENSITIVITY TO DETECT DEGENERATE
GENOMIC HOMOLOGY




                        (#homologous segments)




                                  Proost, Fostier … & Vandepoele, NAR 2011
I-ADHORE 3.0
               Speed & memory footprint




                     Fostier, … & Vandepoele, Bioinformatics 2011
                       Proost, Fostier … & Vandepoele, NAR 2011
GENOME-WIDE COLINEARITY
Z. mays                    WGDotplot




                           O. sativa
MULTI-SPECIES COLINEARITY




                            profile
WHOLE-GENOME CIRCULAR DOTPLOT

                                  Reference: O. sativa




                            Inner circle: duplicated regions
                 Outer circle: inter-species colinear regions
FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC
GENE DUPLICATES

   [Workflow]
   Species-specific duplicates
   Divide into block & tandem duplicates
   Gene sets
   PLAZA workbench
   GO enrichment

                                              Proost et al., Plant Cell 2009
CORE HISTONE CLUSTERS IN C. REINHARDTII




   Synteny plot


                                    Proost et al., Plant Cell 2009
THE QUEST FOR PLANT ORTHOLOGS

   Plants are paleopolyploids
       Dynamic genome organization

       Large fraction of multi-gene
        families

       Absence of simple 1:1
        orthology relationships
Source: Y. Van de Peer
GENE DYNAMICS IN THE GREEN LINEAGE




 Green algae    Brown algae   Land plants
                Diatoms
PLANT GENE FAMILIES, A TALE OF DUPLICATIONS




                                      F-box protein domain gene family
PLAZA INTEGRATIVE ORTHOLOGY VIEWER




     • Tree-based orthologs (TROG) inferred using tree reconciliation
     • Orthologous gene families (ORTHO) inferred using OrthoMCL
     • Anchor points refer to gene-based colinearity between species
     • Best hit families (BHIF) inferred from Blast hits including inparalogs

                                              Van Bel et al., Plant Physiology 2012
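
One way to picture the integration of the four evidence types is to count, for every query/target gene pair, how many methods support it. The sketch below does only that. The gene identifiers and the input layout (one set of pairs per method) are hypothetical, and ranking by the number of supporting methods is an assumption made for illustration, not a description of how the PLAZA viewer scores orthologs.

```python
def integrate_orthology(evidence):
    """Map each (query, target) pair to the set of methods that support it.

    evidence: dict of method name -> set of (query_gene, target_gene) pairs.
    """
    support = {}
    for method, gene_pairs in evidence.items():
        for pair in gene_pairs:
            support.setdefault(pair, set()).add(method)
    return support


def ranked_orthologs(support, query):
    """List targets of one query gene, best supported first."""
    hits = [
        (len(methods), target, sorted(methods))
        for (q, target), methods in support.items()
        if q == query
    ]
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# Hypothetical predictions for one Arabidopsis query gene and three rice genes.
evidence = {
    "TROG": {("AT_query", "OS_1"), ("AT_query", "OS_2")},
    "ORTHO": {("AT_query", "OS_1")},
    "anchor": {("AT_query", "OS_1")},
    "BHIF": {("AT_query", "OS_2"), ("AT_query", "OS_3")},
}

support = integrate_orthology(evidence)
ranking = ranked_orthologs(support, "AT_query")
```

A target supported by several independent methods comes first, while targets seen by a single method stay visible, which matters when no simple 1:1 orthology relationship exists.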
COMPLEX GENE ORTHOLOGY RELATIONSHIPS

                 Query species: A. thaliana
Target species
3. PLANT –OMICS SPACE




                        Mochida and Shinozaki, 2011
INTEGRATIVE PLANT GENOMICS

   Explore genome-wide –omics data sets to study gene function
    and regulation
       Transcriptomics (Microarrays|RNA-Seq)

       Interactome data (Y2H|TAP)

       Regulatory interactions (TF|miRNA-target|TF motifs)


   Include expert gene annotations
     Dedicated databases (e.g. phenotypes, metabolomics)
     Text-mining
GENE NETWORK ANALYSIS

   Features
       Integration of heterogeneous –omics data sources
         Different gene-gene associations with varying quality
         Missing data




       Exploit network-guided guilt-by-association
        principle

       Methodologies
             Simple un-weighted/weighted graphs
             Probabilistic models




                                                                  Lee et al., 2010
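
The guilt-by-association principle on a weighted graph can be illustrated with a simple neighbor-voting sketch: a gene inherits candidate functions from its annotated network neighbors, weighted by edge confidence. The gene names, edge weights and annotation terms below are hypothetical, and summing edge weights is one simple scoring choice assumed here. It is not the probabilistic model of Lee et al., 2010.

```python
from collections import defaultdict


def predict_functions(edges, annotations, gene):
    """Score candidate functions for `gene` from its annotated neighbors.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network.
    annotations: dict of gene -> set of function terms; unannotated genes
    (missing data) are simply absent and contribute no votes.
    """
    scores = defaultdict(float)
    for g1, g2, weight in edges:
        if g1 == gene:
            neighbor = g2
        elif g2 == gene:
            neighbor = g1
        else:
            continue
        for term in annotations.get(neighbor, ()):
            scores[term] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical weighted gene-gene associations and known annotations.
edges = [
    ("g1", "g2", 0.9),
    ("g1", "g3", 0.4),
    ("g4", "g1", 0.3),
    ("g2", "g3", 0.8),
    ("g1", "g5", 0.7),  # g5 has no annotation and casts no vote
]
annotations = {
    "g2": {"DNA replication"},
    "g3": {"DNA replication", "cell cycle"},
    "g4": {"lipid biosynthesis"},
}

predictions = predict_functions(edges, annotations, "g1")
```

Setting every weight to 1.0 gives the un-weighted variant of the same vote.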
EXPERIMENTAL ARABIDOPSIS GENE-GENE
ASSOCIATION DATA

Datatype          # Genes    # Associations (% unique)       Source

PPI               3,194      7,210 (75%)                     CORNET

AraNet*           19,647     1,062,222 (99%)                 Lee et al., 2010

TF targets        9,422      13,037 (99%)                    AtRegNet (AGRIS)

GO                6,588      89,100 (n.a.)                   GeneOntology.org / TAIR


Total             22,492     1,089,661

 * Probabilistic network integrating heterogeneous genomic features


Research objectives:
        •   Infer functional gene modules starting from experimental data
        •   Identify regulatory properties of genes, modules and network
        •   Explore cross-species functional annotation

                                                                       Heyndrickx and Vandepoele, 2012
DELINEATING ARABIDOPSIS GENE MODULES

   Transform gene-gene associations into networks and
    functional gene modules
CONVERTING STATIC GENE ASSOCIATIONS INTO
FUNCTIONAL EXPRESSION MODULES

   Classical approach
    1.       Clustering of expression data
         •   Guide-gene (gene-centric)
         •   Non-targeted (global)
    2.       Functional analysis of modules using
             enrichment statistics


   Challenges and weaknesses
        Which microarray samples to include?
        Functional information is integrated only a
         posteriori



                                                          Aoki et al., 2007
EXPRESSION-BASED CLUSTERING

   Integrate a priori functional information
    during module detection
   Semi-supervised clustering strategy
    considering multiple query genes and
    multiple expression compendia
   Rank aggregation through scoring
    function
       maximize coexpression towards multiple seeds
        showing dynamic expression profile
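
The idea of ranking genes by coexpression with several seed (query) genes across several expression compendia, and then aggregating the ranks, can be sketched as below. This is a simplified illustration under stated assumptions: every gene is present in every compendium, the per-compendium score is the mean Pearson correlation to the seeds, and ranks are aggregated by summing them. The actual scoring function used for the modules is not given on the slide, and all gene names and expression values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def rank_candidates(compendia, seeds):
    """Order non-seed genes by summed rank over all compendia (best first)."""
    rank_sum = {}
    for expr in compendia:
        candidates = [gene for gene in expr if gene not in seeds]
        # Mean coexpression of each candidate with all seed genes.
        score = {
            gene: sum(pearson(expr[gene], expr[s]) for s in seeds) / len(seeds)
            for gene in candidates
        }
        ordered = sorted(candidates, key=lambda gene: -score[gene])
        for rank, gene in enumerate(ordered, start=1):
            rank_sum[gene] = rank_sum.get(gene, 0) + rank
    return sorted(rank_sum, key=lambda gene: (rank_sum[gene], gene))


# Two hypothetical expression compendia (gene -> expression values).
compendium_1 = {
    "seed1": [1, 2, 3, 4], "seed2": [2, 4, 6, 8.5],
    "a": [1, 2, 3, 5], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4],
}
compendium_2 = {
    "seed1": [5, 1, 3], "seed2": [6, 2, 3],
    "a": [4, 1, 2], "b": [1, 5, 2], "c": [2, 2, 3],
}

ranking = rank_candidates([compendium_1, compendium_2], ["seed1", "seed2"])
```

A module would then be cut from the top of `ranking`. Where to cut, and how to favor seeds with a dynamic expression profile, are design choices left out of this sketch.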
ANALYSIS GENE MODULES
PROPERTIES INPUT – MODULE DATA




   40% of the genes in the modules are present in more than one input data type
   Only 3% of the gene pairs within a module are supported by more than one
    primary data type
MODULE OVERLAP

                        Primary data                 Modules

Datatype                # Genes   # Associations     # Genes   # Modules     Functional   Motif
                                  (% unique)                   (% unique)    enrichment   enrichment

PPI                     3,194     7,210 (75%)        597       72 (95%)      51           43
AraNet                  19,647    1,062,222 (99%)    6,377     419 (99%)     116          172
TF targets              9,422     13,037 (99%)       5,127     518 (96%)     51           224
GO                      6,588     89,100 (n.a.)      7,750     1,105 (99%)   943          341

Total                   22,492    1,089,661          13,428    2,114         1,161
Non-redundant modules                                13,142    1,563         676          772


   >99% modules found through a single input data type
FUNCTIONAL AND CIS-REGULATORY COHERENCE
OF PLANT MODULES


                                                                Cis-regulatory element analysis
                                                                • Weeder / MotifSampler de novo
                                                                motif finding (1544 motifs)
                                                                • Overlap with known plant motifs
                                                                AGRIS/PLACE (34%)




                                                                Functional enrichment analysis
                                                                • Over-representation hypergeometric
                                                                distribution + FDR
                                                                • Non-electronic GO annotations +
                                                                embryo-lethal gene (SeedGenes)
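
The over-representation test named above (hypergeometric distribution followed by FDR correction) can be sketched with the Python standard library alone. The counts in the usage lines are hypothetical, and Benjamini-Hochberg is assumed as the FDR procedure, since the slide does not say which correction was applied.

```python
from math import comb


def enrichment_pvalue(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes carrying the annotation,
    n: genes in the module, k: module genes carrying the annotation.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda idx: pvalues[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical module of 20 genes tested against three annotation terms
# in a background of 5,000 genes: (k, K) per term.
tests = [(8, 100), (2, 400), (1, 50)]
pvals = [enrichment_pvalue(k, 5000, K, 20) for k, K in tests]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in qvals]
```

A module would be reported as functionally enriched when at least one term keeps an adjusted p-value below the chosen threshold; the 0.05 cut-off here is an assumption.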




   40% of the modules could be linked to a significant functional enrichment (GO BP -
    embryo lethality)
   98% of the modules have 1 (or more) gene(s) with a known experimental annotation
FUNCTIONAL MODULE REPERTOIRE




                http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
CROSS-SPECIES MODULE ANALYSIS

                        Affymetrix GeneChip




                   NCBI Gene Expression Omnibus




1563 Arabidopsis                Integrative
   modules                       orthology
CONSERVED MODULE EXPRESSION COHERENCE




                                                         Lipid biosynthesis


   58% of modules show significant coexpression coherence (3 or more species)
   >43,000 unknown genes from 6 other plants receive module-based functional
    annotations
MODULE-BASED FUNCTION PREDICTIONS


   Can we recover new experimental Arabidopsis gene – GO BP
    annotations?
   Between data freeze and evaluation, 1,460 Arabidopsis genes with predictions
    received new exp. GO-BP annotations

                   Unknown               Unknown Exp. BP       Other Exp. BP         Total
                   #Pred.  #Conf.        #Pred.  #Conf.        #Pred.  #Conf.        #Pred.  #Conf.

All Genes          197     75 (38.1%)    255     108 (42.4%)   1,008   251 (24.9%)   1,460   434 (29.7%)
    Conserved      166     65 (39.2%)    195     80 (41%)      871     215 (24.7%)   1,232   360 (29.2%)
    Not Conserved  48      10 (20.8%)    83      31 (37.3%)    315     52 (16.5%)    446     93 (20.9%)
DNA ENDOREDUPLICATION


                    •   PPI module: predicted to be
                        involved in DNA endoreduplication
                    •   Experimental validation shows that an
                        AT1G06590 T-DNA line has a
                        perturbed endoreduplication index
                        (Quimbaya et al., 2012)




                         Plant Mutants      Flow Cytometry




                          Quimbaya, Vandepoele,… De Veylder, 2012
4. INTEGRATED CO-EXPRESSION – ORTHOLOGY
   NETWORKS




Movahedi et al., 2012
3-WAY SPECIES CO-EXPRESSION COMPARISON FOR ETG1




                                            •   Conserved DNA
                                                replication module
                                            •   Conserved E2F target
                                                gene (TTTCCGC)
                                            •   Role in sister
                                                chromatid cohesion



                                Movahedi et al., 2012; Takahashi et al., 2010
SORTING OUT PLANT (CO-)ORTHOLOGS USING
EXPRESSION CONTEXT CONSERVATION




                                     Protein integrative orthology

                                     Expression Context Conservation
                                     scores (p-value < 0.05)




                                      Inparalogs (species-specific
                                      duplicates)
5. CONCLUSIONS

   Need for advanced & user-friendly tools to characterize new genomes
       Complexity and quality of genome sequences
       Scalability with increasing number of genomes


   Integrative approaches combining multiple methods outperform individual
    methods* and provide users a more complete view
       Computer power
       Visualization


   Large discrepancy in the functional gene associations between the different
    experimental data sets
   A large fraction of the module-based functional predictions are biologically
    valid and can be transferred across species


   Comparative network approaches provide a powerful tool to integrate
    functional genomics data
                                             * Quest for Orthologs Consortium, Bioinformatics 2012
ACKNOWLEDGEMENTS

   Ken Heyndrickx


   Michiel Van Bel
   Sebastian Proost


   Sara Movahedi


   Mauricio Quimbaya
FURTHER READING

   Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and
    Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and
    genome evolution in plants. Plant Cell.


   Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional
    plant modules through the integration of complementary data sources. Plant Physiol.


   Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative
    co-expression analysis in plant biology. Plant, Cell & Environment.

5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

The complexity of plant genomes

  • 1. THE COMPLEXITY OF PLANT GENOMES Genome structure, gene functions and beyond Klaas Vandepoele Barcelona, October 10th 2012 Department of Plant Biotechnology and Bioinformatics, Ghent University Department of Plant Systems Biology, VIB - Belgium http://twitter.com/plaza_genomics
  • 2. OVERVIEW
      • And then there were many: plant genome sequences
      • PLAZA: a web-based plant comparative genomics toolbox
          • Genome organization and evolution
          • The quest for plant orthologous genes
      • Unravelling gene functions using integrative plant genomics
      • Cross-species gene function analysis
  • 3. 1. OVERVIEW PLANT GENOME SEQUENCING Individual institutes International consortia Today: ~40 (complete) plant genome sequences
  • 4.
  • 5. GENOME ANNOTATION [Workflow diagram; source: P. Rouzé. Recoverable labels: Genoscope, BGI, JGI; genomic DNA; EST sequences; repeats search and repeat masking; training set; splice site models (SpliceMachine); IMM; coding, intron and intergenic potential; Eugene automatic annotation; tBlastx, Blastx, Blastn against related genomes, Swissprot, Nr_prot, EST and cDNA; structural annotation; manual curation (Artemis, GenomeView, Bogas); predicted genes; expert annotated genes; functional annotation (Gene Ontology, InterPro); functional annotated genes; downstream analysis]
  • 6. EXPLOITING GENOME INFORMATION
      • Centralized infrastructure
      • Detailed gene catalog per species
      • Structural annotation (gene models, UTRs)
      • Functional annotation (experimental, sequence-based, systems biology)
      • Intuitive & advanced data mining tools for non-expert users
      • Gene function
      • Genome organization
      • Pathway evolution
      • Data manipulation
      • Computational resources
  • 7. Gene family analysis Genome analysis >20 tools available Proost et al., Plant Cell 2009; Van Bel et al., 2012
  • 8. HOMOLOGOUS GENE FAMILIES >780K proteins from 25 species. Protein clustering: 22K multi-species gene families covering 83% of the total proteome. Phylogenetics: 18K trees incl. 420K annotated tree nodes
  • 9. GENE COLINEARITY & GENOME ORGANIZATION (i-ADHoRe 3.0)
      • Represent chromosomes as sorted gene lists
      • Identify all homologous gene pairs between chromosomes (all-against-all BLASTP)
      • Score pairs of homologues in a matrix: the Gene Homology Matrix (GHM)
      [Figure: GHM with chromosome 1 and chromosome 2 as axes]
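
The three steps on slide 9 can be illustrated with a minimal Python sketch. It is not the i-ADHoRe 3.0 implementation: the gene names and homolog pairs are made up, the function names `gene_homology_matrix` and `diagonal_runs` are hypothetical, and colinear segments are approximated as gap-free runs along forward diagonals, whereas a real detector would also have to tolerate gaps and inversions.

```python
def gene_homology_matrix(chrom_a, chrom_b, homolog_pairs):
    """Build a 0/1 matrix: rows follow gene order on chrom_a, columns on chrom_b."""
    pairs = {frozenset(p) for p in homolog_pairs}
    return [[1 if frozenset((ga, gb)) in pairs else 0 for gb in chrom_b]
            for ga in chrom_a]


def diagonal_runs(matrix, min_len=3):
    """Return (row, col, length) for gap-free forward-diagonal runs of hits.

    Simplifying assumption: only strictly adjacent hits on a forward
    diagonal are chained, so degenerate colinearity is missed.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    runs = []
    for i in range(n_rows):
        for j in range(n_cols):
            if not matrix[i][j]:
                continue
            # Start a run only at its first cell, not in the middle of one.
            if i > 0 and j > 0 and matrix[i - 1][j - 1]:
                continue
            length = 0
            while (i + length < n_rows and j + length < n_cols
                   and matrix[i + length][j + length]):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


# Hypothetical toy data: chromosomes as sorted gene lists, plus homolog pairs
# such as those an all-against-all BLASTP search might report.
chrom1 = ["a1", "a2", "a3", "a4", "a5"]
chrom2 = ["b0", "b1", "b2", "b3"]
homologs = [("a2", "b1"), ("a3", "b2"), ("a4", "b3"), ("a1", "b3")]

ghm = gene_homology_matrix(chrom1, chrom2, homologs)
runs = diagonal_runs(ghm)
print(runs)
```

In this toy case the isolated hit between `a1` and `b3` is ignored, while the three consecutive pairs starting at `a2`/`b1` form one candidate colinear segment.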
  • 10. GENOMIC PROFILES pairwise multiple Simillion et al. (2004) Genome Res. 14, 1095-1106
  • 11. IMPROVED SENSITIVITY TO DETECT DEGENERATE GENOMIC HOMOLOGY (#homologous segments) Proost, Fostier … & Vandepoele, NAR 2011
  • 12. I-ADHORE 3.0 Speed & memory footprint Fostier, … & Vandepoele, Bioinformatics 2011 Proost, Fostier … & Vandepoele, NAR 2011
  • 13. GENOME-WIDE COLINEARITY Z. mays WGDotplot O. sativa
  • 15. WHOLE-GENOME CIRCULAR DOTPLOT Reference: O. sativa Inner circle: duplicated regions Outer circle: inter-species colinear regions
  • 16. Gene family analysis Genome analysis Proost et al., Plant Cell 2009; Van Bel et al., 2012
  • 17. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES Species specific duplicates Divide in block & tandem duplicates Gene-sets PLAZA workbench GO enrichment Proost et al., Plant Cell 2009
  • 18. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES Species specific duplicates Divide in block & tandem duplicates Gene-sets Gene Ontology PLAZA workbench GO enrichment
  • 19. FUNCTIONAL ANALYSIS OF SPECIES-SPECIFIC GENE DUPLICATES Species specific duplicates Divide in block & tandem duplicates Gene-sets Gene Ontology PLAZA workbench GO enrichment
  • 20. CORE HISTONE CLUSTERS IN C. REINHARDTII Synteny plot Proost et al., Plant Cell 2009
  • 21. THE QUEST FOR PLANT ORTHOLOGS
      • Plants are paleopolyploids
      • Dynamic genome organization
      • Large fraction of multi-gene families
      • Absence of simple 1:1 orthology relationships
  • 22. Source: Y. Van de Peer
  • 23. GENE DYNAMICS IN THE GREEN LINEAGE Green algae Brown algae Land plants Diatoms
  • 24. PLANT GENE FAMILIES, A TALE OF DUPLICATIONS F-box protein domain gene family
  • 25. PLAZA INTEGRATIVE ORTHOLOGY VIEWER
      • Tree-based orthologs (TROG) inferred using tree reconciliation
      • Orthologous gene families (ORTHO) inferred using OrthoMCL
      • Anchor points refer to gene-based colinearity between species
      • Best hit families (BHIF) inferred from Blast hits including inparalogs
      Van Bel et al., Plant Physiology 2012
  • 26. COMPLEX GENE ORTHOLOGY RELATIONSHIPS Query species: A. thaliana Target species
  • 27. 3. PLANT –OMICS SPACE Mochida and Shinozaki, 2011
  • 28. INTEGRATIVE PLANT GENOMICS
      • Explore genome-wide -omics data sets to study gene function and regulation
      • Transcriptomics (Microarrays | RNA-Seq)
      • Interactome data (Y2H | TAP)
      • Regulatory interactions (TF | miRNA-target | TF motifs)
      • Include expert gene annotations
      • Dedicated databases (e.g. phenotypes, metabolomics)
      • Text-mining
  • 29. GENE NETWORK ANALYSIS
      Features
      • Integration of heterogeneous -omics data sources
      • Different gene-gene associations with varying quality
      • Missing data
      • Exploit network-guided guilt-by-association principle
      Methodologies
      • Simple un-weighted/weighted graphs
      • Probabilistic models
      Lee et al., 2010
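
The guilt-by-association principle on slide 29 can be sketched on a simple weighted graph. This is an illustration only, not the method of Lee et al.: the genes, edge weights and annotations are invented, and the scoring rule (summing the weights of edges to annotated neighbours) is one assumed choice among many.

```python
from collections import defaultdict


def predict_functions(edges, annotations, gene):
    """Score candidate functions for `gene` from its annotated neighbours.

    edges: iterable of (gene_a, gene_b, weight) for an undirected graph.
    annotations: dict mapping a gene to a set of function labels.
    Returns (label, score) pairs, best score first.
    """
    scores = defaultdict(float)
    for a, b, weight in edges:
        if a == gene:
            neighbour = b
        elif b == gene:
            neighbour = a
        else:
            continue
        # Genes without annotation (missing data) simply contribute nothing.
        for label in annotations.get(neighbour, ()):
            scores[label] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical network: g1 is the uncharacterized query gene.
edges = [
    ("g1", "g2", 0.9),
    ("g1", "g3", 0.4),
    ("g1", "g4", 0.2),
    ("g2", "g3", 0.5),
]
annotations = {
    "g2": {"DNA replication"},
    "g3": {"DNA replication", "cell cycle"},
    "g4": {"lipid biosynthesis"},
}

predictions = predict_functions(edges, annotations, "g1")
print(predictions)
```

Setting every weight to 1 turns the same code into the un-weighted variant, where each annotated neighbour counts as one vote.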
  • 30. EXPERIMENTAL ARABIDOPSIS GENE-GENE ASSOCIATION DATA
      Per datatype: # genes, # associations (% unique), source
      • PPI: 3,194 genes; 7,210 associations (75%); CORNET
      • AraNet*: 19,647 genes; 1,062,222 associations (99%); Lee et al., 2010
      • TF targets: 9,422 genes; 13,037 associations (99%); AtRegNet (AGRIS)
      • GO: 6,588 genes; 89,100 associations (n.a.); GeneOntology.org / TAIR
      • Total: 22,492 genes; 1,089,661 associations
      * Probabilistic network integrating heterogeneous genomic features
      Research objectives:
      • Infer functional gene modules starting from experimental data
      • Identify regulatory properties of genes, modules and network
      • Explore cross-species functional annotation
      Heyndrickx and Vandepoele, 2012
  • 31. DELINEATING ARABIDOPSIS GENE MODULES
      • Transform gene-gene associations into networks and functional gene modules
  • 32. CONVERTING STATIC GENE ASSOCIATIONS INTO FUNCTIONAL EXPRESSION MODULES
      Classical approach
      1. Clustering expression data
          • Guide-gene (gene-centric)
          • Non-targeted (global)
      2. Functional analysis of modules using enrichment statistics
      Challenges and weaknesses
      • Which microarray samples to include?
      • Functional information integrated a posteriori
      Aoki et al., 2007
  • 33. EXPRESSION-BASED CLUSTERING
      • Integrate a priori functional information during module detection
      • Semi-supervised clustering strategy considering multiple query genes and multiple expression compendia
      • Rank aggregation through a scoring function: maximize coexpression towards multiple seeds showing a dynamic expression profile
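
A rough Python sketch of the rank-aggregation idea on slide 33 follows. The slide does not specify the scoring function, so everything here is an assumption made for illustration: coexpression is measured with Pearson correlation, candidates are ranked separately per seed gene and per compendium, and the ranks are aggregated by their mean. The expression values and gene names are invented.

```python
from collections import defaultdict
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def aggregate_ranks(compendia, seeds):
    """Rank candidate genes by mean coexpression rank over seeds and compendia.

    compendia: list of dicts mapping gene -> expression vector.
    seeds: set of query (seed) gene names.
    Returns (mean_rank, gene) tuples, best (lowest) mean rank first.
    """
    rank_lists = defaultdict(list)
    for compendium in compendia:
        candidates = [g for g in compendium if g not in seeds]
        for seed in seeds:
            if seed not in compendium:
                continue  # seed not measured in this compendium
            ordered = sorted(
                candidates,
                key=lambda g: (-pearson(compendium[seed], compendium[g]), g),
            )
            for rank, gene in enumerate(ordered, start=1):
                rank_lists[gene].append(rank)
    return sorted((mean(ranks), gene) for gene, ranks in rank_lists.items())


# Hypothetical expression compendia with two seed genes (s1, s2).
compendium_a = {
    "s1": [1, 2, 3, 4], "s2": [2, 4, 6, 8],
    "c1": [1, 2, 3, 5], "c2": [4, 3, 2, 1], "c3": [1, 3, 2, 4],
}
compendium_b = {
    "s1": [5, 1, 3], "s2": [4, 0, 2],
    "c1": [10, 2, 6], "c2": [1, 5, 3], "c3": [2, 2, 3],
}

ranking = aggregate_ranks([compendium_a, compendium_b], {"s1", "s2"})
print(ranking)
```

Genes near the top of the aggregated ranking are coexpressed with all seeds in all compendia, which is the property a module-detection step would then threshold on.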
  • 35. PROPERTIES INPUT - MODULE DATA
      • 40% of the genes in the modules are present in more than one input data type
      • Only 3% of the gene pairs within a module are supported by more than one primary data type
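  • 36. MODULE OVERLAP
      Primary data (# genes, # associations, % unique) versus modules (# genes, # modules, % unique, # with functional enrichment, # with motif enrichment), per datatype:
      • PPI: primary data 3,194 genes, 7,210 associations (75%); modules 597 genes, 72 modules (95%), 51 functional enrichment, 43 motif enrichment
      • AraNet: primary data 19,647 genes, 1,062,222 associations (99%); modules 6,377 genes, 419 modules (99%), 116 functional enrichment, 172 motif enrichment
      • TF targets: primary data 9,422 genes, 13,037 associations (99%); modules 5,127 genes, 518 modules (96%), 51 functional enrichment, 224 motif enrichment
      • GO: primary data 6,588 genes, 89,100 associations (n.a.); modules 7,750 genes, 1,105 modules (99%), 943 functional enrichment, 341 motif enrichment
      • Total: primary data 22,492 genes, 1,089,661 associations; modules 13,428 genes, 2,114 modules, 1,161 functional enrichment
      • Non-redundant modules: 13,142 genes, 1,563 modules, 676 functional enrichment, 772 motif enrichment
      • >99% of modules found through a single input data type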
  • 37. FUNCTIONAL AND CIS-REGULATORY COHERENCE OF PLANT MODULES
      Cis-regulatory element analysis
      • Weeder / MotifSampler de novo motif finding (1544 motifs)
      • Overlap with known plant motifs AGRIS/PLACE (34%)
      Functional enrichment analysis
      • Over-representation: hypergeometric distribution + FDR
      • Non-electronic GO annotations + embryo-lethal genes (SeedGenes)
      • 40% of the modules could be linked to a significant functional enrichment (GO BP - embryo lethality)
      • 98% of the modules have one or more genes with a known experimental annotation
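
The over-representation test named on slide 37 (hypergeometric distribution plus FDR) can be sketched as follows. The slide does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the module sizes and counts in the example are invented.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) under the hypergeometric distribution.

    k: genes in the module carrying the annotation
    n: module size
    K: genes in the background carrying the annotation
    N: background size
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: three GO terms tested on one module of 20 genes,
# against a background of 20,000 genes.
tests = [
    ("term_a", 8, 20, 150, 20000),
    ("term_b", 2, 20, 900, 20000),
    ("term_c", 1, 20, 3000, 20000),
]
raw = [hypergeom_pvalue(k, n, K, N) for _, k, n, K, N in tests]
adjusted = benjamini_hochberg(raw)
for (term, *_), p, q in zip(tests, raw, adjusted):
    print(term, p, q)
```

Exact integer binomials keep the sketch dependency-free; for large gene sets a statistics library would be the more practical choice.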
  • 38. FUNCTIONAL MODULE REPERTOIRE http://bioinformatics.psb.ugent.be/cig_data/plant_modules/
  • 39. CROSS-SPECIES MODULE ANALYSIS [Figure labels: 1563 Arabidopsis modules; Integrative orthology; Affymetrix GeneChip; NCBI Gene Expression Omnibus]
  • 40. CONSERVED MODULE EXPRESSION COHERENCE (example: lipid biosynthesis)
      • 58% of modules show significant coexpression coherence (3 or more species)
      • >43,000 unknown genes from 6 other plants receive module-based functional annotations
  • 41. MODULE-BASED FUNCTION PREDICTIONS
      • Can we recover new experimental Arabidopsis gene - GO BP annotations?
      • Data freeze, then evaluation: 1460 Arabidopsis genes with predictions receive new exp. GO-BP
      Per gene set, # predicted and # confirmed (%) by prior annotation status:
      • All genes: Unknown 197 / 75 (38.1%); Unknown Exp. BP 255 / 108 (42.4%); Other Exp. BP 1,008 / 251 (24.9%); Total 1,460 / 434 (29.7%)
      • Conserved: Unknown 166 / 65 (39.2%); Unknown Exp. BP 195 / 80 (41%); Other Exp. BP 871 / 215 (24.7%); Total 1,232 / 360 (29.2%)
      • Not conserved: Unknown 48 / 10 (20.8%); Unknown Exp. BP 83 / 31 (37.3%); Other Exp. BP 315 / 52 (16.5%); Total 446 / 93 (20.9%)
  • 42. DNA ENDOREDUPLICATION
      • PPI module: predicted to be involved in DNA endoreduplication
      • Experimental validation shows that the AT1G06590 T-DNA line has a perturbed endoreduplication index
      [Figure labels: Plant Mutants; Flow Cytometry]
      Quimbaya, Vandepoele,… De Veylder, 2012
  • 43. 4. INTEGRATED CO-EXPRESSION – ORTHOLOGY NETWORKS Movahedi et al., 2012
  • 44. 3-WAY SPECIES CO-EXPRESSION COMPARISON FOR ETG1
      • Conserved DNA replication module
      • Conserved E2F target gene (TTTCCGC)
      • Role in sister chromatid cohesion
      Movahedi et al., 2012; Takahashi et al., 2010
  • 45. SORTING OUT PLANT (CO-)ORTHOLOGS USING EXPRESSION CONTEXT CONSERVATION Protein integrative orthology Expression Context Conservation scores (p-value < 0.05) Inparalogs (species-specific duplicates)
  • 46. 4. CONCLUSIONS
      • Need for advanced & user-friendly tools to characterize new genomes
      • Complexity and quality of genome sequences
      • Scalability with increasing number of genomes
      • Integrative approaches combining multiple methods outperform individual methods* and provide users with a more complete view
      • Computer power
      • Visualization
      • Large discrepancy in the functional gene associations between the different experimental data sets
      • A large fraction of the module-based functional predictions are biologically valid and can be transferred across species
      • Comparative network approaches provide a powerful tool to integrate functional genomics data
      * Quest for Orthologs Consortium, Bioinformatics 2012
  • 47. ACKNOWLEDGEMENTS
      • Ken Heyndrickx
      • Michiel Van Bel
      • Sebastian Proost
      • Sara Movahedi
      • Mauricio Quimbaya
  • 48. ACKNOWLEDGEMENTS
      Further reading
      • Proost, S., Van Bel, M., Sterck, L., Billiau, K., Van Parys, T., Van de Peer, Y., and Vandepoele, K. (2009). PLAZA: a comparative genomics resource to study gene and genome evolution in plants. Plant Cell.
      • Heyndrickx, K.S., and Vandepoele, K. (2012). Systematic identification of functional plant modules through the integration of complementary data sources. Plant Physiol.
      • Movahedi, S., Van Bel, M., Heyndrickx, K.S., and Vandepoele, K. (2012). Comparative co-expression analysis in plant biology. Plant, Cell & Environment.