This document describes a rice resequencing project run by The Genome Analysis Centre (TGAC) with researchers in Vietnam. The project sequences rice varieties with interesting phenotypes to support rice breeding in Vietnam under a changing climate. Researchers from Vietnam received bioinformatics training at TGAC. Eighteen lanes of sequencing generated 1.3 terabytes of sequence data, covering the varieties at 30x on average. SNPs were identified by aligning reads to reference genomes and filtering the calls, with an average of 0.14% heterozygous SNPs. The results will be made available on TGAC's online browser to support trait identification and rice improvement in Vietnam.
3. The Genome Analysis Centre
Rice Resequencing Project
Aim: To bring genomics capability to rice breeding in Vietnam in light of changing climates
Approach: Sequence varieties with interesting phenotypes and provide training in bioinformatics
4. The Genome Analysis Centre
Training at TGAC
Two scientists from AGI visited TGAC in September 2012 for bioinformatics training
Training topics:
• NGS assembly and alignment tools
• Phylogenetics
• Browser training
• Variant calling
5. The Genome Analysis Centre
Varieties
Stress: Varieties (indica, japonica and javanica)
Bacterial Blight Resistance: Hom rau; Khau dien lu; Nep meo nuong; Tep Thai Binh; Toc lun
Blast Resistance: Ble te lo; Khau mac Buoc; Chiem nho Bac Ninh 2; Nep lun; OM 6377
Brown Planthopper Resistance: Chan thom; Coi ba dat; Khau giang; OM5629; Xuong ga
Drought tolerance: Ba cho K’te; Blao sinh sai; Tan ngan; Nang quot bien; Nep bo hong Hai Duong
Quality potential: Nang thom cho Dao; Tam xoan Bac Ninh; Tam xoan Hai Hau; Te Nuong; OM 3536; Thom lai
Salt tolerance: Lua Ngoi; Mot bui do; Chiem do; Nang co do 2; Nep man
Unclassified: Nep ong tao; Khau Lien; Lua goc do; Chiem da; IS1.2
6. The Genome Analysis Centre
Sequencing
Illumina HiSeq 2000
• DNA sheared into fragments 300-500 bp in length
• 100 bp sequenced from both ends of fragments
• 18 lanes of sequencing (2.25 flowcells)
• 1.3 Tb of sequence data
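As a rough check on how read counts relate to depth, the sketch below applies the usual estimate, coverage = sequenced bases / genome size, to 100 bp paired-end reads. The 430 Mb genome size is the value assumed on the sequencing depth slide. The function names and the 30x target in the usage note are illustrative choices, not part of the project pipeline.

```python
import math


def coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth for paired-end data (two reads per pair)."""
    return 2 * read_pairs * read_length / genome_size


def pairs_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Smallest number of read pairs whose expected depth reaches `target`."""
    bases_needed = target * genome_size
    bases_per_pair = 2 * read_length
    return math.ceil(bases_needed / bases_per_pair)


if __name__ == "__main__":
    genome_size = 430_000_000  # 430 Mb, as assumed on the sequencing depth slide
    pairs = pairs_for_coverage(30, 100, genome_size)
    print(pairs, coverage(pairs, 100, genome_size))
```

Under these assumptions, one variety at 30x needs about 64.5 million read pairs.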
7. The Genome Analysis Centre
Sequencing depth
[Figure: coverage per variety, assuming a 430 Mb genome size. x-axis: Varieties (36); y-axis: Coverage, 0 to 70x. Legend: detect homozygous base, p(0.9975); detect heterozygous base, p(0.9975).]
Average coverage: 30x
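The slides do not say how the p(0.9975) detection thresholds were derived. One common way to reason about such thresholds is a Poisson model of read depth, sketched below as an assumption rather than the method used here. A homozygous base is treated as detected when at least `min_reads` reads cover it. A heterozygous base needs at least `min_reads` reads from each allele, with each allele receiving half the mean coverage. All function names and the `min_reads` parameter are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean); returns 0.0 when k < 0."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))


def p_detect_hom(coverage: float, min_reads: int = 1) -> float:
    """Probability that a site is covered by at least `min_reads` reads."""
    return 1.0 - poisson_cdf(min_reads - 1, coverage)


def p_detect_het(coverage: float, min_reads: int = 1) -> float:
    """Probability that both alleles are each seen at least `min_reads` times."""
    per_allele = 1.0 - poisson_cdf(min_reads - 1, coverage / 2.0)
    return per_allele**2


def min_coverage(p_detect, target: float = 0.9975, min_reads: int = 1,
                 limit: int = 200) -> int:
    """Smallest integer mean coverage at which p_detect reaches `target`."""
    for c in range(1, limit + 1):
        if p_detect(c, min_reads) >= target:
            return c
    raise ValueError("target not reached within limit")


if __name__ == "__main__":
    print(min_coverage(p_detect_hom), min_coverage(p_detect_het))
```

Whatever value of `min_reads` is chosen, the heterozygous threshold sits above the homozygous one, because each allele only receives about half of the reads.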
8. The Genome Analysis Centre
Reference genomes
Indica: 93-11
Yu et al., Science 2002
Temperate japonica: Nipponbare
Goff et al., Science 2002
9. The Genome Analysis Centre
Align reads to references (BWA)
[Figure: stacked bar chart of read alignment per variety. x-axis: Varieties, grouped as indicas, japonicas and javanica; y-axis: Percentage of reads, 0% to 100%. Categories: unaligned; Japonica-specific; Indica-specific; aligned to both.]
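The four read categories in this chart can be derived by aligning each read set to both references and comparing which reads map where. The sketch below illustrates that bookkeeping on sets of read names. It assumes the mapped read names have already been extracted from each alignment, and the function names are hypothetical.

```python
from collections import Counter

CATEGORIES = ("aligned to both", "indica-specific", "japonica-specific", "unaligned")


def classify_read(maps_to_indica: bool, maps_to_japonica: bool) -> str:
    """Assign one read to one of the four chart categories."""
    if maps_to_indica and maps_to_japonica:
        return "aligned to both"
    if maps_to_indica:
        return "indica-specific"
    if maps_to_japonica:
        return "japonica-specific"
    return "unaligned"


def category_percentages(all_reads: set, indica_mapped: set, japonica_mapped: set) -> dict:
    """Percentage of reads in each category for one variety."""
    if not all_reads:
        raise ValueError("no reads supplied")
    counts = Counter(
        classify_read(read in indica_mapped, read in japonica_mapped)
        for read in all_reads
    )
    return {cat: 100.0 * counts[cat] / len(all_reads) for cat in CATEGORIES}


if __name__ == "__main__":
    reads = {"r1", "r2", "r3", "r4"}  # toy read names, not project data
    print(category_percentages(reads, {"r1", "r2"}, {"r2", "r3"}))
```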
10. The Genome Analysis Centre
SNP discovery
Align reads to references using BWA v0.6.1
Detect variants using GATK v1.6
Insertion/deletion realignment
SNP calling and filtering
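The steps above can be sketched as a list of shell commands for one sample. This is an illustration under stated assumptions, not the project's actual pipeline. The slides name only BWA v0.6.1 and GATK v1.6, so the specific GATK walkers (RealignerTargetCreator, IndelRealigner, UnifiedGenotyper, VariantFiltration) are the standard GATK 1.x tools for these steps and are assumed here. File names are hypothetical, and the filter expression is a placeholder because the slides give no thresholds. Reference indexing and the conversion from SAM to a sorted, indexed BAM with read groups are omitted.

```python
from typing import List


def snp_pipeline_commands(sample: str, ref: str, fq1: str, fq2: str,
                          gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Return shell command strings for one sample; nothing is executed."""
    # Assumed to exist after SAM-to-BAM conversion, sorting and indexing.
    bam = f"{sample}.sorted.bam"
    gatk = f"java -jar {gatk_jar} -R {ref}"
    return [
        # BWA 0.6.x paired-end workflow: align each end, then pair them.
        f"bwa aln {ref} {fq1} > {sample}_1.sai",
        f"bwa aln {ref} {fq2} > {sample}_2.sai",
        f"bwa sampe {ref} {sample}_1.sai {sample}_2.sai {fq1} {fq2} > {sample}.sam",
        # Insertion/deletion realignment.
        f"{gatk} -T RealignerTargetCreator -I {bam} -o {sample}.intervals",
        f"{gatk} -T IndelRealigner -I {bam} "
        f"-targetIntervals {sample}.intervals -o {sample}.realigned.bam",
        # SNP calling.
        f"{gatk} -T UnifiedGenotyper -I {sample}.realigned.bam -o {sample}.raw.vcf",
        # Filtering: the expression below is a placeholder, not the project's filter.
        f'{gatk} -T VariantFiltration --variant {sample}.raw.vcf '
        f'--filterExpression "QD < 2.0" --filterName LowQD -o {sample}.filtered.vcf',
    ]


if __name__ == "__main__":
    for cmd in snp_pipeline_commands("Chiemda", "Nipponbare.fa",
                                     "Chiemda_1.fq", "Chiemda_2.fq"):
        print(cmd)
```

Running the same commands once per reference (93-11 and Nipponbare) would give the two SNP sets per variety that the later slides compare.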
11. The Genome Analysis Centre
SNPs by reference
[Figure: scatter plot of SNP counts per variety. x-axis: Number of SNPs on indica reference, 0 to 1,600,000; y-axis: Number of SNPs on japonica reference, 0 to 1,800,000. Series: Indicas; Japonicas; javanica.]
12. The Genome Analysis Centre
Heterozygosity
Average % heterozygous SNPs: 0.14%
Range % heterozygous SNPs: 0.05-0.38%
Average Ts/Tv: 2.39
Range Ts/Tv: 1.95-2.55
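For readers unfamiliar with these statistics, the sketch below shows how a transition/transversion (Ts/Tv) ratio and a percentage of heterozygous calls can be computed from a list of SNPs. Transitions are purine-to-purine (A/G) or pyrimidine-to-pyrimidine (C/T) changes, and all other substitutions are transversions. The heterozygosity function assumes the percentage is taken over SNP genotype calls, which is one possible reading of the slide. The function names and toy inputs are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps) -> float:
    """Ts/Tv over (ref, alt) pairs; pairs with ref == alt are skipped."""
    ts = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio undefined")
    return ts / tv


def percent_heterozygous(genotypes) -> float:
    """Percentage of (allele1, allele2) genotype calls whose alleles differ."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    het = sum(1 for a, b in genotypes if a != b)
    return 100.0 * het / len(genotypes)


if __name__ == "__main__":
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
    print(percent_heterozygous([("A", "A"), ("A", "G"), ("T", "T"), ("C", "C")]))
```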
13. The Genome Analysis Centre
Grouped SNPs
Trait        Number of shared SNPs  Unique to set  Shared with 1  Shared with 2  Shared with 3  Shared with 4  Shared with 5  Shared with 6
Blast        43,755                 -              -              -              -              2              6              6
Blight       22,065                 -              -              1              20             9              15             32
Drought      27,109                 -              -              -              -              1              2              3
Planthopper  43,238                 -              -              -              1              -              7              20
Quality      55,256                 -              -              -              1              8              14             30
Salt         51,472                 -              -              3              9              18             56             149
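The slide does not define how the columns were computed. One possible reading is that the SNPs shared by every variety in a trait group are found first, and each shared SNP is then counted by how many other sets also carry it. The sketch below implements that reading with Python sets. The function names, the representation of a SNP as a hashable key, and the toy data are all assumptions.

```python
from collections import Counter


def shared_snps(group: dict) -> set:
    """SNPs carried by every variety in the group (name -> set of SNP keys)."""
    sets = list(group.values())
    return set.intersection(*sets) if sets else set()


def sharing_profile(group: dict, others: dict) -> Counter:
    """Count the group's shared SNPs by how many other sets also carry them.

    profile[0] is the number of SNPs unique to the group; profile[n] is the
    number also present in exactly n of the sets in `others`.
    """
    profile = Counter()
    for snp in shared_snps(group):
        n_other = sum(1 for snps in others.values() if snp in snps)
        profile[n_other] += 1
    return profile


if __name__ == "__main__":
    # Toy data: integers stand in for SNP keys such as (chromosome, position, allele).
    group = {"variety_a": {1, 2, 3}, "variety_b": {2, 3, 4}}
    others = {"set_c": {3}, "set_d": {3, 5}}
    print(sorted(shared_snps(group)), dict(sharing_profile(group, others)))
```

Each entry in `others` can be a single variety or the shared-SNP set of another trait group, depending on which comparison is wanted.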
14. The Genome Analysis Centre
tgac-browser.tgac.ac.uk
Username: viet_rice
Password: v1et_r1ce
15. The Genome Analysis Centre
Acknowledgements
Mario Caccamo
Sarah Ayling
Melanie Febrer
Anil Thanki
Xingdong Bian
Prof Ham
Khuat Huu Trung
Khoa Nguyen Truong
and colleagues
Giles Oldroyd
Christian Rogers