This document discusses using high-throughput transcriptomics to study molecular evolution and population genetics. It outlines plans to sequence transcriptomes of 30 non-model species and their close outgroups to better understand how life history traits impact molecular evolution patterns at the genomic level. Specific goals include testing theories on how population size, generation time, mutation rates, and selection influence genetic diversity both within and between species. The proposed "PopPhyl" project aims to inject species biology into comparative genomics by exploring molecular diversity in a diverse set of taxa and testing population genetic predictions across whole genomes.
Grenoble 2011 galtier
1. CBGP, March 2011
High-throughput transcriptomics for molecular evolution
and population genetics
Nicolas Galtier
UMR 5554 - Institut des Sciences de l'Evolution - Montpellier
galtier@univ-montp2.fr
2. Molecular evolution in the 21st century
We have:
- an enormous amount of data (genomics)
- a robust theoretical framework (population genetics)
⇒ we should understand molecular variation patterns
Yet we do not really know:
- why some species evolve (much) faster than others, proteome-wise
- why GC-content varies within and between genomes
- by how much population size determines genetic diversity
- etc…
3. Molecular evolution in the 21st century
Why so many unsolved, basic questions?
- lacking theory
- biased sampling (genes, species)
4. PopPhyl goals
Injecting species biology/ecology into comparative genomics
Exploring the molecular diversity of nonmodel taxa
Testing predictions of population genetic theory genome-wide
[Diagram: three linked columns]
life history traits: body mass, generation time, abundance, mating system
population genetic parameters: mutation rate, population size, selection, recombination
genomic variation data: within-species, between species
5. PopPhyl goals
Some specific questions we want to address:
- Why are fast-evolving taxa fast? (mutation, selection)
- Are abundant species more polymorphic than scarce ones?
- Is selection less efficient in selfers than outcrossers?
- How does longevity influence mito vs nuclear DNA evolution?
- Who optimizes codon usage, who does gBGC, and why?
- Is the rate of selective sweeps higher in large populations?
6. How?
- Target = transcriptome (coding sequences, expression data)
- Sampling scheme: 30 × [focal species (10 individuals) + outgroups (1 or 2 individuals)]
- Next-Generation Sequencing technology
For each taxon:
5×10^5 400 bp reads (454, pooled individuals)
5×10^7 100 bp reads (Illumina, tagged individuals)
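To put the per-taxon read budget in perspective, the raw sequence yield of each platform is simply reads × read length. The sketch below reads the two counts on the slide as 5×10^5 and 5×10^7; the dictionary layout and function name are illustrative, not part of the PopPhyl pipeline.

```python
# Read counts and lengths come from the slide; the names are illustrative.
PLATFORMS = {
    "454": {"reads": 5 * 10**5, "read_length_bp": 400},
    "Illumina": {"reads": 5 * 10**7, "read_length_bp": 100},
}


def total_bases(reads: int, read_length_bp: int) -> int:
    """Raw sequence yield in base pairs, ignoring quality trimming."""
    return reads * read_length_bp


yields = {name: total_bases(**spec) for name, spec in PLATFORMS.items()}

for name, bases in yields.items():
    print(f"{name}: {bases / 1e6:.0f} Mb per taxon")
```

Under this reading, Illumina contributes 25 times more raw sequence per taxon than 454, while 454 provides the longer reads.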
8. Why are tunicates fast-evolving, proteome-wise?
[Figure; labels: E, C, V, T]
- higher mutation rate?
- more prevalent adaptive evolution?
- relaxed selective constraint on housekeeping genes?
9. Data analysis pipeline
[Flowchart: transcriptome reads (454, Solexa) → assembling → reference transcriptome → coding annot.; reads → mapping → SNP calling → SNPs and genotypes → allele frequencies, πN, πS, dN, dS]
11. [Diagram: single-platform assembly strategies; outputs labeled c, s, c+s]
A: 454 reads, Celera
B: 454 reads, Mira
C: 454 reads, Cap3
D: Illumina reads, Abyss + Cap3
12. [Diagram: mixed 454 + Illumina assembly strategies, using Abyss and Cap3; outputs labeled c, s, c+s]
E: merge reads
F: merge contigs
F': F after a "refine" step
13. de novo transcriptome assembly: quantitative assessment
   data set     method          contigs   mean     median   N50   assembly      touched
                                          length   length          length (Mb)   genes
A  Ciona_454    Celera          25,669    491      438      491   12.6          7616
B  Ciona_454    Mira            33,196    635      526      650   21.1          7951
C  Ciona_454    Cap3            24,515    671      540      713   16.5          7945
D  Ciona_illu   Abyss+Cap3      27,426    574      380      769   15.8          7704
E  Ciona_mix    merge reads     29,097    571      399      721   16.6          7982
F  Ciona_mix    merge contigs   27,956    726      529      891   20.3          8207
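The length statistics in this table can be recomputed from a list of contig lengths. The sketch below uses the common definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, reaches half the assembly); the slides do not state which definition was used, and the function names and toy lengths are illustrative.

```python
import statistics


def n50(lengths):
    """Contig length at which the cumulative length, going from the longest
    contig down, first reaches half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("empty list of contig lengths")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def summarize(lengths):
    """Per-assembly summary mirroring the columns of the table."""
    return {
        "contigs": len(lengths),
        "mean_length": statistics.mean(lengths),
        "median_length": statistics.median(lengths),
        "N50": n50(lengths),
        "assembly_length_Mb": sum(lengths) / 1e6,
    }


# Toy contig lengths in bp (not real data).
toy_lengths = [100, 200, 300, 400, 500]
toy_summary = summarize(toy_lengths)
print(toy_summary)
```

As a consistency check on the table, the number of contigs times the mean length should approximate the assembly length: for row A, 25,669 × 491 bp is about 12.6 Mb.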
16. Assembling transcriptomes from NGS data:
a benchmark using Ciona intestinalis
[Diagram: predicted contigs are compared to the reference transcriptome with BLAST; each contig falls into one of five categories: no hit, 1→1, m→1, 1→n, m→n]
17. [Figure; legend: no hit, 1→1, m→1, 1→n, m→n]
1→1: full, partial
m→1: fragments, alleles
1→n: chimera, multi
m→n: full or partial, multi
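One way to assign contigs to the categories no hit, 1→1, m→1, 1→n and m→n is sketched below. The slides do not give the exact rules, so the per-contig definitions here are an assumption: a contig is "1→n" if it hits several reference transcripts, "m→1" if it shares its single reference with other contigs, and "m→n" if both hold. The function name and toy identifiers are hypothetical, and the BLAST step itself is assumed to have already produced (contig, reference) hit pairs.

```python
from collections import defaultdict


def classify_contigs(hits, all_contigs):
    """Label each contig from a list of (contig, reference) BLAST hit pairs.

    Assumed per-contig rules (not taken from the slides):
      no hit : the contig matches no reference transcript
      1→1    : one reference, matched by this contig only
      m→1    : one reference, also matched by other contigs
      1→n    : several references, none matched by other contigs
      m→n    : several references, at least one shared with other contigs
    """
    contig_to_refs = defaultdict(set)
    ref_to_contigs = defaultdict(set)
    for contig, ref in hits:
        contig_to_refs[contig].add(ref)
        ref_to_contigs[ref].add(contig)

    labels = {}
    for contig in all_contigs:
        refs = contig_to_refs.get(contig, set())
        if not refs:
            labels[contig] = "no hit"
            continue
        several_refs = len(refs) > 1
        shared = any(len(ref_to_contigs[ref]) > 1 for ref in refs)
        if several_refs:
            labels[contig] = "m→n" if shared else "1→n"
        else:
            labels[contig] = "m→1" if shared else "1→1"
    return labels


# Toy hit table (identifiers are made up).
toy_hits = [
    ("c1", "g1"),
    ("c2", "g2"), ("c3", "g2"),
    ("c4", "g3"), ("c4", "g4"),
    ("c5", "g5"), ("c5", "g2"),
]
toy_labels = classify_contigs(toy_hits, ["c1", "c2", "c3", "c4", "c5", "c6"])
print(toy_labels)
```

A stricter variant would label whole connected groups of contigs and references rather than single contigs; which of the two the benchmark used is not recoverable from the slides.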
20. Improving assemblies by filtering according to length + coverage
[Figure: percentage of correct contigs (y-axis; ticks at 60% and 80%) against number of contigs (x-axis; ticks at 4000, 8000, 12000)]
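A length + coverage filter of this kind can be sketched as follows. The thresholds, the `Contig` record and the toy values are placeholders chosen for illustration; the slides do not report the cut-offs actually used.

```python
from typing import NamedTuple


class Contig(NamedTuple):
    name: str
    length: int      # contig length in bp
    coverage: float  # mean read depth over the contig


def filter_contigs(contigs, min_length, min_coverage):
    """Keep contigs that pass both the length and the coverage threshold."""
    return [
        c for c in contigs
        if c.length >= min_length and c.coverage >= min_coverage
    ]


# Toy assembly (made-up values).
toy_contigs = [
    Contig("c1", 1200, 35.0),
    Contig("c2", 150, 80.0),   # too short
    Contig("c3", 900, 2.5),    # too shallow
    Contig("c4", 600, 12.0),
]

# Hypothetical thresholds: at least 500 bp and 10X mean coverage.
kept = filter_contigs(toy_contigs, min_length=500, min_coverage=10.0)
print([c.name for c in kept])
```

Tightening the thresholds keeps fewer contigs, which is the trade-off between assembly size and the proportion of correct contigs that the figure describes.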
21. de novo transcriptome assembly from NGS data: conclusions
- Illumina > 454
(454 still useful)
- existing programs differ substantially in performance
(in PopPhyl we retain Cap3 and Abyss)
- correct cDNA predictions are a minority in typical assemblies
- contig length + coverage is a reasonable quality criterion
- somewhat variable across species
22. Data analysis pipeline
[Flowchart: transcriptome reads (454, Solexa) → assembling → reference transcriptome → coding annot.; reads → mapping → SNP calling → SNPs and genotypes → allele frequencies, πN, πS, dN, dS]
29. Population genomics of a fast-evolver
focal species: Ciona intestinalis B (8 individuals)
outgroup: Ciona intestinalis A (reference sequence)
1602 contigs (>10X in >5 individuals), of average length 138 codons
                 M1                    M2
SNPs             30020                 29544
error rate       0.021 [0.012-0.038]   0.020 [0.011-0.035]
allelic bias     0                     [0.08-0.5]
stop codons      77 (0.26%)            117 (0.39%)
FIT              -0.017                -0.054
nb best model    70 (4.6%)             1532 (95.4%)
30. Population genomics of a fast-evolver
average πS: 0.057 per site (a highly polymorphic species)
average πN: 0.0026 per site
πN/πS: 0.046 (strong purifying selection)
dN/dS: 0.103 (high impact of adaptive evolution)
estimated proportion of adaptive non-synonymous substitutions: 54%
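The link between these numbers can be illustrated with the simple McDonald-Kreitman-style estimator α = 1 − (πN/πS)/(dN/dS). This is a sketch under the assumption that polymorphism is close to neutral; the slides do not say which estimator produced the 54% figure, so the simple formula is only expected to land in the same neighborhood. Function names are illustrative.

```python
def pi_ratio(pi_n: float, pi_s: float) -> float:
    """Ratio of non-synonymous to synonymous diversity."""
    return pi_n / pi_s


def alpha_simple(pi_n_over_pi_s: float, dn_over_ds: float) -> float:
    """Simple MK-style estimate of the proportion of adaptive
    non-synonymous substitutions: 1 - (piN/piS) / (dN/dS)."""
    return 1.0 - pi_n_over_pi_s / dn_over_ds


# Values reported on the slide.
ratio = pi_ratio(0.0026, 0.057)
alpha = alpha_simple(0.046, 0.103)

print(f"piN/piS = {ratio:.3f}")
print(f"alpha   = {alpha:.2f}")
```

With the slide's rounded inputs this gives α of about 0.55, close to the reported 54%. The reasoning is that dN/dS exceeds πN/πS, so more non-synonymous differences have been fixed between species than purifying selection on current polymorphism would predict.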
31. Why are tunicates fast-evolving, proteome-wise?
[Figure; labels: E, C, V, T; legend: adaptive, neutral, deleterious]
- higher mutation rate? YES
- more prevalent adaptive evolution? YES
- relaxed selective constraint on housekeeping genes? NO
→ large Ne, large µ (per year)
32. Conclusions
- de novo population genomics from NGS transcriptome data is doable
- transcriptome assembly is probably the trickiest step
- major population genomic descriptors are robust to error models
- life history traits apparently impact molecular evolution to some extent
- long-lived, small population-sized species are the best choice for phylogenomics
34. Subprojects we have started
- selfers vs outcrossers in snails and nematodes
- long-lived vs short-lived in insects
- big vs small in amniotes
  (phylogeny of turtles)
- fast protein evolution in tunicates and nematodes
- extreme longevity
35. Thanks to:
Philippe Gayral CNRS
Vincent Cahais
Georgia Tsagkogeorga
Marion Ballenghien
Zef Melo Ferreira
Ylenia Chiari
Lucy Weinert
ISEM
Sylvain Glémin
Nico Bierne
Khalid Belkhir
Fred Delsuc
Vincent Ranwez
Guillaume Dugas
Sébastien Harispe ERC
Caroline Benoist