This is the first presentation of the BITS training on 'Comparative genomics'.
It reviews the basic concepts of sequence homology at different levels.
Thanks to Klaas Vandepoele of the PSB department.
3. What is comparative genomics?
Because all modern genomes have arisen from common ancestral genomes, the relationships between genomes can be studied with this fact in mind. This commonality means that information gained in one organism can be applied to other, even distantly related, organisms. Comparative genomics enables the application of information gained from facile model systems to agricultural and medical problems. The nature and significance of differences between genomes also provide a powerful tool for determining the relationship between genotype and phenotype through comparative genomics and morphological and physiological studies.
http://genomics.ucdavis.edu/what.html
4. Principles
DNA sequences encoding and regulating the expression of essential proteins and RNAs will be conserved
Consequently, the regulatory profiles of genes involved in similar processes among related species will be conserved
Conversely, sequences that encode or control the expression of proteins or RNAs responsible for differences between species will be divergent
5. Definition
“The combination of genomic data and comparative / evolutionary biology to address questions of genome structure, evolution and function”
Hardison, PLoS Biology 2003
6. What can we learn from cross-species comparisons?
Genome conservation
transfer knowledge gained from model organisms to non-model organisms
Genome variation
understand how genomes change over time in order to identify evolutionary processes and constraints
Detection of functional elements
Coding elements (e.g. exons)
Conserved non-coding sequences / elements
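As a rough illustration of how conserved elements can be spotted in a cross-species comparison, the sketch below slides a fixed-size window along a pairwise alignment and merges windows whose percent identity reaches a threshold into conserved regions. It assumes the two sequences are already aligned (equal length, '-' for gaps); the function names, the window size of 10 and the 80% cutoff are hypothetical illustrative choices, not published criteria. Real conserved-element detection typically works on multiple alignments with statistical models.

```python
def window_identity(seq_a: str, seq_b: str, window: int) -> list[float]:
    """Percent identity of each window along a pairwise alignment.

    Columns where both sequences carry a gap ('-') are not counted as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append(100.0 * matches / window)
    return scores


def conserved_regions(seq_a: str, seq_b: str, window: int = 10,
                      min_identity: float = 80.0) -> list[tuple[int, int]]:
    """Merge overlapping windows at or above min_identity into (start, end) regions.

    Coordinates are 0-based, end-exclusive alignment columns.
    """
    regions: list[tuple[int, int]] = []
    for start, score in enumerate(window_identity(seq_a, seq_b, window)):
        if score < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)  # extend the current region
        else:
            regions.append((start, end))
    return regions


# Hypothetical toy alignment: conserved flanks, divergent middle.
human = "ACGTACGTAC" + "GGGGGGGGGG" + "ACGTACGTAC"
mouse = "ACGTACGTAC" + "TTTTTTTTTT" + "ACGTACGTAC"
print(conserved_regions(human, mouse))  # [(0, 12), (18, 30)]
```

The reported regions extend slightly into the divergent middle because a window only needs 8 of 10 matching columns to pass; smaller windows or stricter cutoffs trade sensitivity for positional precision.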
8. Homology & sequence similarity
Homology = common ancestral origin
Inferred based on:
Sequence similarity
Similar protein (multi-)domain composition and organization
So does sequence similarity mean homology?
Not necessarily: it depends!
“Orthologs, paralogs, and evolutionary genomics”, Koonin 2005
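The second criterion listed on this slide, similar domain composition and organization, can be made concrete by describing each protein as an ordered list of domain labels (N- to C-terminal), as a domain annotation tool might report them. The sketch below is a hypothetical helper, not an established scoring scheme: it measures composition as the Jaccard overlap of the two domain sets and organization as whether the domain order is identical. The labels "SH3", "SH2" and "kinase" are toy examples.

```python
from dataclasses import dataclass


@dataclass
class ArchitectureComparison:
    shared_domains: set[str]  # domain types present in both proteins
    jaccard: float            # |A ∩ B| / |A ∪ B| over domain types (composition)
    same_order: bool          # identical N- to C-terminal domain order (organization)


def compare_architectures(arch_a: list[str], arch_b: list[str]) -> ArchitectureComparison:
    """Compare two proteins given as ordered lists of domain labels."""
    set_a, set_b = set(arch_a), set(arch_b)
    union = set_a | set_b
    shared = set_a & set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return ArchitectureComparison(shared, jaccard, list(arch_a) == list(arch_b))


# Toy example: same domain composition, different organization.
result = compare_architectures(["SH3", "SH2", "kinase"], ["SH2", "SH3", "kinase"])
print(result.jaccard, result.same_order)  # 1.0 False
```

Keeping composition and order as separate outputs mirrors the slide's wording: two proteins can share every domain type yet arrange them differently.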
9. Homology & sequence similarity
Sequence analysis aims at finding important sequence similarities that would allow one to infer homology. The latter term is extensively used in scientific literature, often without a clear understanding of its meaning, which is simply common origin.
Homologous organs are not necessarily similar (at least the similarity may not be obvious); similar organs are not necessarily homologous. For some reason, this simple concept tends to get extremely muddled when applied to protein and DNA sequences. Phrases like “sequence (structural) homology”, “high homology”, “significant homology”, or even “35% homology” are as common, even in top scientific journals, as they are absurd, considering the definition.
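The distinction in the quote can be illustrated in code: what an alignment yields is a measurable percent identity, while homology is a yes/no inference that such a number may or may not support. The sketch below assumes two equal-length, ungapped aligned sequences; it computes percent identity and estimates how often a shuffled copy of the second sequence (same composition, random order) reaches the same identity, giving an empirical p-value. This shuffle test is a simplified stand-in chosen for illustration; real search tools assess alignment scores with their own statistics, not raw identity.

```python
import random


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions with identical residues (ungapped alignment)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty sequences of equal length")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


def shuffle_p_value(seq_a: str, seq_b: str, trials: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: fraction of shuffles of seq_b reaching the observed identity.

    Shuffling keeps the residue composition of seq_b but destroys its order.
    """
    rng = random.Random(seed)
    observed = percent_identity(seq_a, seq_b)
    residues = list(seq_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(residues)
        if percent_identity(seq_a, "".join(residues)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one so the estimate is never exactly 0


# 100% identity between two poly-A stretches is not evidence of common origin:
print(shuffle_p_value("AAAAAAAA", "AAAAAAAA", trials=50))  # 1.0
```

For the low-complexity poly-A pair every shuffle is identical to the original, so 100% identity yields p = 1.0: a high identity value on its own says nothing about common origin.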
11. Genome-wide sequence retrieval
Finding information from whole-genome sequencing projects
Information value, from low to high:
DNA sequence reads
Assembled genomic DNA sequences
Annotated genes (RNA genes + protein-encoding genes)
Repeats, transposable elements
Integrated platform providing both sequence data and functional genomics data
12. Genome databases
Species-specific databases
SGD
TAIR
Many others, e.g. WormBase, FlyBase, ...
General & integrative repositories
EBI Genomes & Integr8 / Ensembl
NCBI Entrez Genome
UCSC