The document discusses gene tree reconciliation, which involves projecting gene trees onto a species tree to account for evolutionary events like gene duplications, losses, and horizontal transfer. It outlines existing cyberinfrastructure for generating and visualizing reconciliations, and proposes ways to extend this, such as allowing users to submit their own gene trees and alignments for reconciliation, integrating visualization tools, and storing multiple reconciliations per gene tree. A goal is to "make tree reconciliation phylotastic" by building components to allow users more flexibility in generating reconciliations from their own data.
2. iPlant Tree of Life (iPTOL)
• Tree Reconciliation
• Big Trees
• Data Assembly
• Trait Evolution
• Data Integration
• Tree Visualization
3. Gene Tree Reconciliation
Projection of gene trees onto a species tree
• gene duplications
• gene losses
• lineage sorting
• horizontal transfer
4. Gene Tree Reconciliation
• Locating gene duplications allows us to
identify orthologs and paralogs
• Identify gene composition in inferred ancestral
genomes
• Map of the positions of ancestral polyploidy
events
• Contribute to the study of the “fate” of
duplicated genes
• Address questions of gene family coevolution
6. Extending TR Cyberinfrastructure
• Increased interoperability among the
component pieces
• Query the location of gene duplications on the
species tree
• Integrate tree visualization tools that scale to
many thousands of nodes
• Allow for the storage and analysis of multiple
reconciliations for a single gene tree within a
single database structure
7. Extending TR Cyberinfrastructure
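[Diagram: software and data connected through the reconciliation database]
• Generate Reconciliations: TreeBeST, primeGSR, NOTUNG
• Visualize Reconciliations: primeTV, fltreebest
• Other diagram labels: Gene Trees, Reconciled, Species Trees, Ontology, Functional Annotation, annot8r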
12. Current Limitations
• Users query against a pre-computed set of
reconciliations
• We generate the species trees
• We generate the gene trees given alignments
• We generate reconciliation mappings
• Reconciliation visualization is currently tied to
the database
• Users can NOT submit their own data (gene
trees or alignments) for reconciliation
13. Making TR Phylotastic
• Allow users to generate reconciliations
using their own data
• Supply a species tree OR
• Supply a gene family alignment
14. Phylotastic Components
• Name resolution
• Given a gene tree or alignments determine the
species list
• Tree Pruner
• Given the species list above, generate the
species tree required for reconciliation
• NEXML encoding
• Return reconciled tree using NEXML
This iPlant-sponsored Tree Reconciliation Working Group is one of six main working groups in the iPlant Tree of Life (iPToL) program. The overall goals of the iPToL project are to develop the cyberinfrastructure needed to assemble, visualize and analyze the plant tree of life. The goals of the Tree Reconciliation Working Group include developing database tools for 'post-tree' analysis of the reconciliation of gene trees to species trees. The analysis is post-tree in the sense that the species tree is taken as a given, to be supplied by work under development in the Big Trees group.
Gene tree reconciliations allow us to map processes and events from the gene tree onto the species tree. These include:
• gene duplications
• gene losses
• lineage sorting
• horizontal transfer
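The mapping of gene tree nodes onto the species tree can be sketched with the classic least-common-ancestor (LCA) approach. This is a minimal illustration only and is not claimed to be the method used by TreeBeST, primeGSR or NOTUNG. It assumes strictly nested-tuple trees, a hypothetical `gene_to_species` lookup, and made-up gene names; it labels duplications and speciations but does not model losses, lineage sorting or horizontal transfer.

```python
# Trees are nested tuples; a leaf is a string (illustrative toy data).
species_tree = (("A", "B"), "C")
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}


def clades(tree):
    """Return one frozenset of leaf names per node, in post-order."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    leaves = frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]  # the last entry of sub is the child's own clade
    result.append(leaves)
    return result


def lca(species_clades, species):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def reconcile(node, species_clades, events):
    """Map a gene-tree node to a species clade and record internal-node events."""
    if isinstance(node, str):
        return frozenset([gene_to_species[node]])
    child_maps = [reconcile(child, species_clades, events) for child in node]
    mapped = lca(species_clades, frozenset().union(*child_maps))
    # A node mapping to the same species clade as one of its children
    # is inferred to be a duplication; otherwise it is a speciation.
    event = "duplication" if mapped in child_maps else "speciation"
    events.append((node, sorted(mapped), event))
    return mapped


events = []
reconcile(gene_tree, clades(species_tree), events)
for node, species, event in events:
    print(event, "at species clade", species)
```

In this toy case the node joining the two (A, B) gene pairs maps to the same species clade as both of its children, so it is labelled a duplication placed on the ancestor of A and B.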
The utility of gene tree reconciliation … Ancestral polyploidy events are a major component of plant genome evolution.
Existing tools for gene tree reconciliation include:
• Software to generate reconciliations (TreeBeST, primeGSR)
• Software to visualize these reconciliations (primeTV/fltreebest)
• Databases such as Ensembl Compara that allow us to store reconciled gene trees as well as information regarding the sequences, alignments and locations of the genes comprising the reconciled gene families
Our initial goals in extending cyberinfrastructure for gene tree reconciliation involved developing a static database of precomputed reconciliations.
We extended the Ensembl Compara database design to include precomputed species trees, precomputed gene trees and a reconciliation mapping between the two. We have also added support for ontologies to tag attributes of trees, nodes, functional gene annotation and developed a Tree … We have high-throughput pipelines for TreeBeST, primeGSR and NOTUNG to generate large numbers of reconciliations and load these into the database. We can also populate functional annotation of genes using input from the annot8r functional annotation program. We have also developed a new interface for visualizing reconciled trees. This interface allows for visualizing reconciled trees stored in the database and supports queries to find reconciled trees within the database.
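As a rough illustration of how a reconciliation mapping between gene trees and species trees might be stored, including more than one reconciliation per gene tree, the sketch below uses an in-memory SQLite database. All table names, column names and rows are hypothetical and are not the Ensembl Compara schema or the working group's actual design; the method labels are plain strings, not calls to the named programs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE species_node (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE gene_tree (id INTEGER PRIMARY KEY, family TEXT);
CREATE TABLE gene_node (
    id INTEGER PRIMARY KEY,
    gene_tree_id INTEGER REFERENCES gene_tree(id),
    label TEXT
);
-- Several reconciliations may point at the same gene tree.
CREATE TABLE reconciliation (
    id INTEGER PRIMARY KEY,
    gene_tree_id INTEGER REFERENCES gene_tree(id),
    method TEXT
);
CREATE TABLE reconciliation_map (
    reconciliation_id INTEGER REFERENCES reconciliation(id),
    gene_node_id INTEGER REFERENCES gene_node(id),
    species_node_id INTEGER REFERENCES species_node(id),
    event TEXT CHECK (event IN ('duplication', 'speciation')),
    PRIMARY KEY (reconciliation_id, gene_node_id)
);
""")

# Made-up rows: one gene tree reconciled twice with different outcomes.
conn.executemany("INSERT INTO species_node VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "AB")])
conn.execute("INSERT INTO gene_tree VALUES (1, 'family_1')")
conn.executemany("INSERT INTO gene_node VALUES (?, ?, ?)",
                 [(1, 1, "n1"), (2, 1, "n2")])
conn.executemany("INSERT INTO reconciliation VALUES (?, ?, ?)",
                 [(1, 1, "method_1"), (2, 1, "method_2")])
conn.executemany("INSERT INTO reconciliation_map VALUES (?, ?, ?, ?)", [
    (1, 1, 3, "duplication"), (1, 2, 1, "duplication"),
    (2, 1, 3, "duplication"), (2, 2, 3, "speciation"),
])


def duplications_by_species_node(conn, reconciliation_id):
    """Count duplication events per species-tree node for one reconciliation."""
    rows = conn.execute(
        """SELECT s.label, COUNT(*)
           FROM reconciliation_map AS m
           JOIN species_node AS s ON s.id = m.species_node_id
           WHERE m.reconciliation_id = ? AND m.event = 'duplication'
           GROUP BY s.label
           ORDER BY s.label""",
        (reconciliation_id,),
    ).fetchall()
    return dict(rows)


print(duplications_by_species_node(conn, 1))
print(duplications_by_species_node(conn, 2))
```

Keying the mapping table on the reconciliation rather than on the gene tree is one way to keep alternative reconciliations of the same gene tree side by side and to ask where duplications fall on the species tree under each of them.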
The GUI allows for simultaneously viewing the species tree and a gene tree reconciled to that species tree. These trees “interact” such that selecting branches in one tree can highlight nodes and edges in the other.
The gene tree node colors highlight the location of duplication and speciation events ..
.. the species tree maps the location of duplication events from the gene tree onto the species tree. Duplication events are shown here as green triangles.
The GUI also provides a way to find reconciled gene families within the database. Queries for:
• BLAST: Can search for gene families in the database that match a DNA or protein sequence query.
• GO Term: Can search for gene families that have been annotated for a specific GO term.
• Locus Name: It is possible to identify the gene families that contain a known locus name.
• Gene Family Name: It is also possible to jump directly to a gene family name.
Having reconciliations mapped to a database that can be queried like this is awesome, and allows us to ask new questions,
A difficulty here is determining the source species of each gene given the gene information. The third component, shown here as NEXML encoding, would depend in part on the standards used by phylotastic for communication among the components of the phylotastic workflows. See Daniel Packer’s GSOC Project for notes on NEXML encoding.
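The first two components, determining the species list from a gene tree and pruning a larger species tree down to that list, can be sketched as follows. This is a minimal sketch that does not call any phylotastic service. It assumes nested-tuple trees and a hypothetical leaf-label convention of `GENE_SPECIES`; real gene identifiers often carry no species information, which is exactly the difficulty noted above. The gene names and the small species tree are illustrative.

```python
def leaves(tree):
    """Return the leaf labels of a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def species_list(gene_tree, sep="_"):
    """Derive the species list from leaf labels of the form GENE_SPECIES.

    The label convention is an assumption made for this sketch.
    """
    species = set()
    for label in leaves(gene_tree):
        if sep not in label:
            raise ValueError(f"cannot determine species for gene {label!r}")
        species.add(label.rsplit(sep, 1)[1])
    return sorted(species)


def prune(tree, keep):
    """Restrict a species tree to the species in `keep`.

    Nodes left with a single child are collapsed; returns None if nothing remains.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [p for p in (prune(child, keep) for child in tree) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


# Illustrative inputs only.
full_species_tree = (("rice", ("maize", "sorghum")), ("arabidopsis", "poplar"))
user_gene_tree = (("g1_rice", "g2_maize"), "g3_arabidopsis")

wanted = species_list(user_gene_tree)
pruned_species_tree = prune(full_species_tree, set(wanted))
print(wanted)
print(pruned_species_tree)
```

The pruned tree contains only the species represented in the user's gene tree, which is the species tree a reconciliation step would then take as given.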
The DNA Subway is an AWESOME education tool that takes users through the process of genome annotation. Starting with genome sequence data (such as a sequenced BAC), students find the genes and can even generate gene trees, using their annotated gene as a query sequence for automated generation of a gene tree. The ‘Prospect Genome’ track currently dead-ends at this gene tree. Given a system that could accept that gene tree as input for reconciliation, it would be possible to generate a reconciled gene tree, which would provide an awesome way to introduce students to the concepts of orthology and paralogy using data that they have generated themselves, starting with raw genome sequence. In this case the initial input is unannotated genome sequence, so it would be possible to go from raw genome sequence data to reconciled gene trees using an intuitive interface that is simple enough to use in undergraduate education. This is awesome because the input could be student-generated sequence data that has never been annotated before, and the pipeline could result in a set of student-derived reconciled gene trees.