This is the third presentation of the BITS training on 'Comparative genomics'.
It reviews the basic concepts of sequence homology on the gene
Thanks to Klaas Vandepoele of the PSB department.
BITS - Comparative genomics on the genome level
1. Comparative genomics
in eukaryotes
Genome analysis
Klaas Vandepoele, PhD
Professor Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium
2. I. Genome conservation & genomic
homology
Alignment of homologous regions
Inter-genomic: aligning genomic sequences from different
species
Intra-genomic: aligning genomic sequences from the same
species
Different levels of resolution
Comparative mapping (markers)
Synteny (~ gene content)
Colinearity (gene content + order conservation)
DNA-based alignments (base-to-base mapping)
4. Human – Mouse orthologous regions
Resolution: comparative mapping
Genome translocations associated with human-mouse speciation
[Figure: Human vs. Mouse chr IV; source: www.ensembl.org]
5. Human genome browser
Resolution: conserved gene content & order
Human chr I vs. Mouse chr IV
Gene loss and insertions in orthologous segments since human-mouse speciation
[Figure: browser tracks showing EST/cDNA similarities, genome similarities and the human gene model]
6. Human – Mouse
Resolution: base-to-base mapping
Functional sequences (e.g. exons) evolve more slowly than non-functional ones (e.g. introns) due to natural selection against mutations in these regions.
Consequently, functional elements, both coding and non-coding, are unusually well conserved in orthologous regions.
[Figure legend: blue = coding exons; GT donor and AG acceptor splice sites marked]
9. Genome size variation in the grasses: the use of model systems
BEP clade: Rice 450 Mb, Barley ~5000 Mb
PACC clade: Sorghum ~750 Mb, Maize ~2400 Mb
Divergence times marked on the tree: 46, 55 and 28 MYA
Gaut 2002
13. II. Computational detection of
genomic homology
Synteny
~ conservation of gene content
Colinearity
~ conservation of (gene) content & order
Macro-colinearity
Marker-based
Micro-colinearity
DNA based or gene-based
16. Map-based approach
• Represent chromosomes as sorted gene lists
• Identify all homologous gene pairs between chromosomes (all-against-all BLASTP*).
• Score pairs of homologues in matrix
[Figure: gene homology matrix with Chromosome 1 and Chromosome 2 as axes]
Identifying homologous regions = identifying diagonal series of elements in the gene homology matrix (GHM).
Vandepoele et al., Genome Research 2002
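The map-based approach can be illustrated with a minimal sketch. It assumes the homologous gene pairs are already available (for example from an all-against-all BLASTP run) and uses a simple greedy extension to collect diagonal series. The function names `build_ghm` and `find_diagonals`, the parameters `max_gap` and `min_len`, and the toy gene names are hypothetical; this is not the algorithm published by Vandepoele et al.

```python
def build_ghm(chrom_a, chrom_b, homologs):
    """Return the filled cells (i, j) of a gene homology matrix.

    chrom_a, chrom_b: gene identifiers sorted by chromosomal position.
    homologs: dict mapping a gene of chrom_a to its homologous genes.
    """
    pos_b = {gene: j for j, gene in enumerate(chrom_b)}
    cells = set()
    for i, gene_a in enumerate(chrom_a):
        for gene_b in homologs.get(gene_a, ()):
            if gene_b in pos_b:
                cells.add((i, pos_b[gene_b]))
    return cells


def find_diagonals(cells, max_gap=1, min_len=3):
    """Greedily collect diagonal series of cells (assumed heuristic).

    max_gap: number of genes that may be skipped between two cells.
    min_len: minimum number of homologous pairs in a reported series.
    """
    runs = []
    for direction in (1, -1):  # 1: same orientation, -1: inverted
        used = set()
        for start in sorted(cells):
            if start in used:
                continue
            run = [start]
            used.add(start)
            while True:
                i, j = run[-1]
                # Candidate next cells, ranked by distance from the current one.
                candidates = [
                    (di + dj, (i + di, j + direction * dj))
                    for di in range(1, max_gap + 2)
                    for dj in range(1, max_gap + 2)
                    if (i + di, j + direction * dj) in cells
                    and (i + di, j + direction * dj) not in used
                ]
                if not candidates:
                    break
                _, nxt = min(candidates)
                run.append(nxt)
                used.add(nxt)
            if len(run) >= min_len:
                runs.append(run)
    return runs


# Toy example with hypothetical gene names.
chrom_1 = ["a1", "a2", "a3", "a4", "a5"]
chrom_2 = ["b1", "b2", "b3", "b4", "b5"]
pairs = {"a1": ["b1"], "a2": ["b2"], "a3": ["b3"], "a5": ["b4"]}
ghm = build_ghm(chrom_1, chrom_2, pairs)
diagonals = find_diagonals(ghm)
```

In the toy example, gene a4 has no homolog on the second chromosome, yet the series is still recovered because one gene may be skipped (`max_gap=1`). This mirrors the gene loss and insertions seen in real colinear segments.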
21. And what about synteny?
• Application of a 2-dimensional sliding-window approach to score regions with a high density of homologous genes between 2 chromosomes
[Figure: gene homology matrix of HsaC1 vs. HsaC9, labelled "ancient duplication"]
Identifying syntenic regions = identifying high homolog-density regions in the gene homology matrix (GHM).
DeSyRe, Vandepoele et al. unpublished
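The sliding-window idea can be sketched as follows. Since DeSyRe is unpublished, this is only an assumed, simplified scoring scheme: a square window is moved over the matrix and every window that contains at least `min_hits` homologous pairs is reported. The function name `dense_windows` and its parameters are hypothetical.

```python
def dense_windows(cells, n_a, n_b, window=4, min_hits=4):
    """Report windows of the gene homology matrix with many homologs.

    cells: set of (i, j) positions of homologous gene pairs.
    n_a, n_b: number of genes on the two chromosomes.
    Returns a list of (i, j, hits) for each window whose top-left
    corner is (i, j) and that holds at least min_hits pairs.
    """
    dense = []
    for i in range(max(1, n_a - window + 1)):
        for j in range(max(1, n_b - window + 1)):
            hits = sum(
                1
                for a, b in cells
                if i <= a < i + window and j <= b < j + window
            )
            if hits >= min_hits:
                dense.append((i, j, hits))
    return dense


# Four homologous pairs in scrambled order plus one isolated pair.
scrambled = {(0, 2), (1, 0), (2, 3), (3, 1), (8, 8)}
syntenic = dense_windows(scrambled, n_a=10, n_b=10)
```

The four clustered pairs do not lie on a diagonal, so a colinearity search would miss them, but the density score still flags the region. This is the situation expected after an ancient duplication, where gene order has been shuffled while gene content is partly retained.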
22. Detection of recent and ancient large-scale duplications
[Figure: recent duplication, C2 vs. C4 (colinearity); ancient duplication, HsaC1 vs. HsaC9 (synteny)]
23. III. Whole-genome alignments
Evolutionarily constrained sequences are a
good indicator of functional genome regions
Basic protocol
1. Sequence generation
2. Reconstructing homologous colinearity across
related genomes
3. Multi-sequence alignment
4. Detection of sequences under purifying selection.
Margulies & Birney, NRG 2008
24. Reconstructing homologous
colinearity
• Segmental duplication and other species-specific
rearrangements (e.g. inversions, insertions, deletions)
interfere with the accurate detection of orthologous
genomic regions
25. Tools
Mercator (Ensembl)
coding exons as anchor points
graph of colinearity information
travel through graph to generate homologous
regions
chains-and-nets (UCSC)
reference-based local alignments between different
genomes (BLASTZ)
filtering highest-scoring chains
net together chains from same locus
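Both tools build on the same idea of linking anchors that keep their order in two genomes. The sketch below shows that idea in its simplest form: given anchor positions in two genomes, it returns the longest chain that increases in both coordinates, using quadratic dynamic programming. It is an illustration only, not the Mercator or chains-and-nets implementation. It ignores strand, alignment scores and gap penalties, and the name `best_colinear_chain` and the coordinates are hypothetical.

```python
def best_colinear_chain(anchors):
    """Return the longest chain of anchors increasing in both genomes.

    anchors: iterable of (pos_in_genome_a, pos_in_genome_b) tuples,
    e.g. matching exons or local alignments.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return []
    best_len = [1] * n   # length of the best chain ending at anchor k
    prev = [-1] * n      # back-pointer to the previous anchor in that chain
    for k in range(n):
        for p in range(k):
            colinear = (
                anchors[p][0] < anchors[k][0]
                and anchors[p][1] < anchors[k][1]
            )
            if colinear and best_len[p] + 1 > best_len[k]:
                best_len[k] = best_len[p] + 1
                prev[k] = p
    # Trace back from the anchor that ends the longest chain.
    end = max(range(n), key=best_len.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return chain[::-1]


# The anchor at (200, 5000) breaks colinearity and is left out.
toy_anchors = [(100, 1000), (200, 5000), (300, 1100), (400, 1200), (500, 1300)]
chain = best_colinear_chain(toy_anchors)
```

The anchor that is dropped here could stem from a segmental duplication or a rearrangement, which is why filtering for the best chain per locus helps to separate orthologous regions from other matches.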
32. Conserved Non-coding Sequences or
Elements (CNS/CNE)
Human/dog
Human/mouse
Mouse/dog
VISTA plot
Blue: exons
Turquoise: UTR
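A VISTA-style curve plots percent identity in a window that slides along a pairwise alignment. The sketch below computes such a profile for two aligned sequences of equal length, with "-" as the gap character. The function name `window_identity`, the window size, the step and the toy sequences are illustrative assumptions, not VISTA's own code or defaults.

```python
def window_identity(aln_a, aln_b, window=5, step=5):
    """Percent identity per window along a pairwise alignment.

    aln_a, aln_b: aligned sequences of equal length ('-' marks a gap).
    Returns a list of (window_start, percent_identity) tuples.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    profile = []
    for start in range(0, len(aln_a) - window + 1, step):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        # A column counts as identical only if both residues match
        # and neither is a gap.
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        profile.append((start, 100.0 * matches / window))
    return profile


# Toy alignment: the first window is identical, the second diverges.
profile = window_identity("ACGTACGTAC", "ACGTACGTTT")
conserved = [start for start, pct in profile if pct >= 70.0]
```

Windows above a chosen identity threshold that fall outside annotated exons and UTRs are candidate CNS/CNE. The 70% cut-off used here is only an example value.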
33. Exercise
Explore the genome organization and
conservation of your favorite locus in a set of
related species.
Plants
http://bioinformatics.psb.ugent.be/plaza/
Vertebrates
http://teleost.cs.uoregon.edu/synteny_db/
Yeast
http://wolfe.gen.tcd.ie/ygob/