4. ILLUMINA INPUT DATA: 20BN BASE PAIR,
180MN READS, 148X COVERAGE
Read   Read Type               Read Len (Mean)  Insert Size  Total bases     Total (read pairs)  Total (reads)  Estimated Coverage (X)
Hiseq  Paired End              100              400          3,168,468,000   15,842,340          31,684,680     22
Hiseq  Paired End              100              400          3,320,637,800   16,603,189          33,206,378     23
Hiseq  Paired End              100              400          3,238,613,000   16,193,065          32,386,130     23
Hiseq  Short Overlap merged    178              335          2,211,006,384   -                   12,401,099     16
Hiseq  Short Overlap merged    178              335          2,209,098,067   -                   12,391,800     16
Hiseq  Short Overlap merged    178              335          2,178,151,877   -                   12,215,609     15
Hiseq  Short Overlap unmerged  100              335          1,550,423,600   -                   15,504,236     11
Hiseq  Short Overlap unmerged  100              335          1,551,440,000   -                   15,514,400     11
Hiseq  Short Overlap unmerged  100              335          1,519,735,600   -                   15,197,356     11
Total  -                       -                -            20,947,574,328  48,638,594          180,501,688    148
CLC-Bio Input                  -                -            20,947,574,328  -                   180,501,688    -
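The coverage column appears to be total bases divided by the 142 Mb flow-cytometry genome size quoted on slide 5. That divisor is inferred from the numbers rather than stated on the slide, so the following Python sketch should be read under that assumption:

```python
# Assumption: coverage = total bases / genome size, using the 142 Mb
# flow-cytometry estimate as the genome size (inferred, not stated).
GENOME_SIZE_BP = 142_000_000


def estimated_coverage(total_bases: int, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Return fold coverage for a given number of sequenced bases."""
    return total_bases / genome_size_bp


first_paired_end_row = estimated_coverage(3_168_468_000)
all_illumina_reads = estimated_coverage(20_947_574_328)
print(round(first_paired_end_row), round(all_illumina_reads))
```

Under this assumption the first paired-end row and the Total row round to the 22X and 148X shown in the table.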
5/18/2012
4
IonTorrentDataAnalysis@MonsantoCo
5. SEDUM ILLUMINA REFERENCE
FLOW CYTOMETRY GENOME SIZE: 142MB; ESTIMATED GENOME SIZE: 180MB; CURRENT
GENOME SIZE: 255MB; N50 (SCAFFOLD): 2.8KB; N50 (CONTIG): 1.6KB
Scaffolding Stats
Number of scaffolds 219,455
Total size of scaffolds 267,197,078
Total scaffold length as percentage of assumed genome size 2
Longest scaffold 124,757
Shortest scaffold 200
Number of scaffolds > 1K nt 63,451 28.90%
Number of scaffolds > 10K nt 2,498 1.10%
Number of scaffolds > 100K nt 1 0.00%
Mean scaffold size 1,218
Median scaffold size 464
N50 scaffold length 2,848
Percentage of assembly in scaffolded contigs 52.40%
Percentage of assembly in unscaffolded contigs 47.60%
Average number of contigs per scaffold 1.3
Average length of break (>25 Ns) between contigs in scaffold 162
Contigs Stats
Number of contigs 292,607
Number of contigs in scaffolds 113,984
Number of contigs not in scaffolds 178,623
Total size of contigs 255,346,080
Longest contig 56,108
Shortest contig 176
Number of contigs > 1K nt 67,774 23.20%
Number of contigs > 10K nt 641 0.20%
Mean contig size 873
Median contig size 412
N50 contig length 1,615
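For readers unfamiliar with the N50 figures above, here is a minimal Python sketch of the usual definition: the length L such that sequences of length L or longer contain at least half of all assembled bases. The function name is illustrative, and the assembler that produced these slides may handle edge cases differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the
        # assembly is the N50.
        if running >= half_total:
            return length
    raise ValueError("need at least one sequence length")


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 is weighted by bases rather than by sequence count, it sits well above the median when most sequences are short, as in the tables above.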
6. ION TORRENT 400 BP CHIP
READ ANALYSIS
7. ION TORRENT INPUT DATA: 5BN BASE
PAIR, 19MN READS, 37X COVERAGE
Read         Read Type   Read Len (Mean)  Total bases    Total reads  Estimated Coverage (X)
Ion Torrent  400bp chip  286              897,163,323    3,130,643    6
Ion Torrent  400bp chip  241              931,376,271    3,850,295    7
Ion Torrent  400bp chip  269              1,113,089,592  4,126,822    8
Ion Torrent  400bp chip  252              1,098,412,220  4,350,400    8
Ion Torrent  400bp chip  274              1,207,920,840  4,408,077    9
Total        -           -                5,247,962,246  19,866,237   37
8. ALIGNERS USED: BWASW AND TMAP
Both aligners were run with their default parameters, which are the same for both:
Mismatch penalty: 3
Gap open penalty: 5
Gap extension penalty: 2
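To illustrate how these penalties combine, the Python sketch below scores an already-aligned pair of gapped sequences. Two things are assumptions, since the slide does not state them: the match score of 1, and the convention that a gap of length k costs the open penalty plus k times the extension penalty.

```python
# Penalties from the slide. MATCH_SCORE is an assumption (not on the slide).
MATCH_SCORE = 1
MISMATCH_PENALTY = 3
GAP_OPEN_PENALTY = 5
GAP_EXTENSION_PENALTY = 2


def alignment_score(read_row: str, ref_row: str) -> int:
    """Score a gapped pairwise alignment; '-' marks a gap in either row.

    Assumed convention: a gap of length k costs
    GAP_OPEN_PENALTY + k * GAP_EXTENSION_PENALTY.
    """
    if len(read_row) != len(ref_row):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_row = None  # which row the currently open gap is in, if any
    for r, g in zip(read_row, ref_row):
        if r == "-" or g == "-":
            current = "read" if r == "-" else "ref"
            if gap_row != current:
                score -= GAP_OPEN_PENALTY  # charged once per gap
                gap_row = current
            score -= GAP_EXTENSION_PENALTY  # charged per gap column
        else:
            gap_row = None
            score += MATCH_SCORE if r == g else -MISMATCH_PENALTY
    return score


print(alignment_score("ACGT-A", "ACCTTA"))
```

With these values a single one-base gap costs 7, more than two mismatches, which is one reason indel-rich reads are penalised heavily under default scoring.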
9. MERGED BWA RESULTS: 25% INSERTION
RATE; 33% DELETION RATE; 85% MISMATCH
Mapping Results
reads 23,277,245
mapped reads 21,124,134
mapped bases 3,622,712,040
perfectly mapped 3,143,328
len max 433
len mean 171
len stdev 82
mapq mean 95
mapq stdev 87
snp rate 4%
ins rate 25%
del rate 33%
pct mismatch 85%
base qual mean 22
base qual stdev 9
10. MERGED BWA RESULTS: 91% READS MAPPED
Total Number of Reads: 23.3M
Number of Reads Mapped: 21.1M
Percentage of Reads Mapped: 91%
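The percentage follows directly from the read counts in the mapping results table on the previous slide. A small Python check, with a hypothetical helper name:

```python
def percent_mapped(mapped_reads: int, total_reads: int) -> float:
    """Return mapped reads as a percentage of all reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


# Counts taken from the merged BWA mapping results table.
print(round(percent_mapped(21_124_134, 23_277_245)))
```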
11. MERGED BWA RESULTS: BASE QUALITY
DECREASES FROM 100 BP
Mean Base Quality
Quality keeps dropping after 100 bp.
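A per-position mean like the one plotted on this slide could be computed from FASTQ quality strings roughly as follows. This is a sketch only: the slides do not say which tool produced the plot, and the Phred+33 encoding is an assumption.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Mean Phred score at each read position across quality strings.

    `offset` is the ASCII offset of the encoding (assumed Phred+33).
    """
    totals, counts = [], []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach pos
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [total / count for total, count in zip(totals, counts)]


# 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
print(mean_quality_by_position(["II5", "I5"]))
```

Reads of different lengths are handled by averaging each position over only the reads that reach it, so the tail of the curve rests on fewer reads.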
12. MERGED BWA RESULTS: BASE QUALITY
DECREASES FROM 100 BP
Per Base Quality
13. MERGED BWA RESULTS: LOW ERRORS AT THE
START; HIGH ERRORS AT THE END
Error Profiles:
The profiles show that mismatch, insertion, and deletion
rates are high overall. They are lowest at the start of the
read and increase gradually with position along the read.
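A position-wise error profile of this kind can be tallied from gapped pairwise alignments. The Python sketch below assumes each alignment is given as two equal-length rows with '-' for gaps. That input format is an illustration, not the format the aligners emit, and real pipelines would parse SAM records instead.

```python
from collections import Counter


def error_profile(alignments):
    """Count mismatches, insertions and deletions by 0-based read position.

    `alignments` is an iterable of (read_row, ref_row) gapped strings.
    '-' in ref_row is an insertion in the read; '-' in read_row is a
    deletion, attributed to the position of the next read base.
    """
    profile = {"mismatch": Counter(), "insertion": Counter(), "deletion": Counter()}
    for read_row, ref_row in alignments:
        pos = 0  # position in the ungapped read
        for r, g in zip(read_row, ref_row):
            if r == "-":
                profile["deletion"][pos] += 1
                continue  # a deletion consumes no read base
            if g == "-":
                profile["insertion"][pos] += 1
            elif r != g:
                profile["mismatch"][pos] += 1
            pos += 1
    return profile


profile = error_profile([("ACGTA", "AC-TT"), ("A-GT", "ACGA")])
print(dict(profile["mismatch"]), dict(profile["insertion"]), dict(profile["deletion"]))
```

Dividing each count by the number of reads that reach that position would turn the tallies into the rates that such profiles usually plot.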
14. MERGED BWA RESULTS: HIGH MISMATCH,
HIGH INSERTION; HIGH DELETION
Error Profiles
15. MERGED BWA RESULTS: HIGH MISMATCH,
HIGH INSERTION; HIGH DELETION
Error Profiles
16. MERGED BWA RESULTS: OVER REPRESENTATION
BETWEEN 150-450 BP
K-mer Profile
K-mers are over-represented from position 150 to
position 450.
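One way to look for positional k-mer over-representation is to count k-mers by their start position in the read and see how dominant the most common k-mer is at each position. The sketch below is illustrative only: the value of k, the function names, and the dominance measure are all assumptions, and the slides do not say how the profile was generated.

```python
from collections import Counter, defaultdict


def kmer_counts_by_position(reads, k=5):
    """Map each start position to a Counter of the k-mers starting there."""
    by_position = defaultdict(Counter)
    for read in reads:
        for start in range(len(read) - k + 1):
            by_position[start][read[start:start + k]] += 1
    return by_position


def top_kmer_fraction(by_position):
    """Fraction of k-mers at each position taken by the most common one."""
    return {
        pos: counts.most_common(1)[0][1] / sum(counts.values())
        for pos, counts in sorted(by_position.items())
    }


counts = kmer_counts_by_position(["ACGTAC", "ACGTTT", "TTGTAC"], k=3)
print(top_kmer_fraction(counts))
```

A fraction that climbs at later positions would be consistent with the kind of enrichment the slide reports, although low read counts at long positions can inflate it.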
17. MERGED TMAP RESULTS: 28% INSERTION;
34% DELETION; 88% MISMATCH
Mapping Results
reads 19,866,237
mapped reads 17,795,383
mapped bases 3,381,672,736
perfectly mapped 2,053,578
len max 433
len mean 190
len stdev 79
mapq mean 14
mapq stdev 10
snp rate 5%
ins rate 28%
del rate 34%
pct mismatch 88%
base qual mean 22
base qual stdev 9
18. MERGED TMAP RESULTS: 90% READS MAPPED
Total Number of Reads: 19.9M
Number of Reads Mapped: 17.8M
Percentage of Reads Mapped: 90%
19. MERGED TMAP RESULTS: BASE QUALITY
DECREASES FROM 100 BP
Mean Base Quality
Quality keeps dropping after 100 bp.
20. MERGED TMAP RESULTS: BASE QUALITY
DECREASES FROM 100 BP
Per Base Quality
21. MERGED TMAP RESULTS: HIGH MISMATCH,
HIGH INSERTION; HIGH DELETION
Error Profiles
22. MERGED TMAP RESULTS: HIGH MISMATCH,
HIGH INSERTION; HIGH DELETION
Error Profiles
23. MERGED TMAP RESULTS: OVER
REPRESENTATION BETWEEN 150-450 BP
K-mer Profile
K-mers are over-represented from position 150 to
position 450.
25. ION TORRENT INPUT DATA: 13BN BASE
PAIR, 72MN READS, 95X COVERAGE
Read                   Read Type                 Read Len (Mean)  Total bases     Total reads  Estimated Coverage (X)
Ion Torrent ORG        100bp, 200bp, 400bp chip  187              13,521,610,812  72,058,773   95
Ion Torrent Corrected  100bp, 200bp, 400bp chip  187              13,479,341,388  72,058,773   95
26. ORG BWA RESULTS: 21% INSERTION; 27%
DELETION; 81% MISMATCH
CORRECTED BWA RESULTS: 10% INSERTION; 15%
DELETION; 70% MISMATCH
Corrected BWA Mapping Results
reads 79,986,695
mapped reads 75,639,986
mapped bases 14,695,848,107
perfectly mapped 23,006,719
len max 678
len mean 194
len stdev 83
mapq mean 100
mapq stdev 88
snp rate 2%
ins rate 10%
del rate 15%
pct mismatch 70%
base qual mean 20
base qual stdev 6
ORG BWA Mapping Results
reads 80,098,562
mapped reads 71,611,630
mapped bases 10,456,280,566
perfectly mapped 13,729,260
len max 433
len mean 146
len stdev 66
mapq mean 97
mapq stdev 86
snp rate 3.2%
ins rate 21%
del rate 27%
pct mismatch 81%
base qual mean 21
base qual stdev 6
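The effect of error correction is easiest to read as a percentage-point drop between the two tables on this slide. A short Python sketch, with the rates copied from the ORG and corrected BWA tables and a hypothetical helper name:

```python
# Rates in percent, copied from the ORG and corrected BWA tables.
org = {"snp": 3.2, "ins": 21, "del": 27, "mismatch": 81}
corrected = {"snp": 2, "ins": 10, "del": 15, "mismatch": 70}


def point_drop(before, after):
    """Percentage-point reduction for each rate present in `before`."""
    return {key: round(before[key] - after[key], 1) for key in before}


print(point_drop(org, corrected))
```

Insertion, deletion, and mismatch rates each fall by roughly 11 to 12 percentage points, while the SNP rate falls by about one point.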
27. ORG BWA RESULTS: 89% READS MAPPED
CORRECTED BWA RESULTS: 95% READS MAPPED
ORG BWA Mapping Results
Total Number of Reads 80.1M
Number of Reads Mapped 71.6M
Percentage of Reads Mapped 89%
Corrected BWA Mapping Results
Total Number of Reads 80.0M
Number of Reads Mapped 75.6M
Percentage of Reads Mapped 95%
28. CORRECTED BWA RESULTS: BASE QUALITY
DECREASES FROM 100 BP
Quality keeps dropping after 100 bp.
29. CORRECTED BWA RESULTS : BASE QUALITY
DECREASES FROM 100 BP
Per Base Quality
30. CORRECTED BWA RESULTS: HIGH MISMATCH; HIGH INSERTION;
HIGH DELETION; BUT ABOUT 10 PERCENTAGE POINTS LOWER THAN ORG READS
Error Profiles
31. CORRECTED BWA RESULTS: HIGH MISMATCH; HIGH INSERTION;
HIGH DELETION; BUT ABOUT 10 PERCENTAGE POINTS LOWER THAN ORG READS
Error Profiles
32. CORRECTED BWA RESULTS: OVER
REPRESENTATION BETWEEN 150-450 BP
K-mer Profile
K-mers are over-represented from position 250 to
position 450.
33. ORG TMAP RESULTS: 20% INSERTION; 23%
DELETION; 84% MISMATCH
CORRECTED TMAP RESULTS: 13% INSERTION;
18% DELETION; 74% MISMATCH
Corrected TMAP Mapping Results
reads 72,058,773
mapped reads 68,116,303
mapped bases 12,763,573,084
perfectly mapped 18,029,367
len max 678
len mean 187
len stdev 81
mapq mean 13
mapq stdev 10
snp rate 3%
ins rate 13%
del rate 18%
pct mismatch 74%
base qual mean 20
base qual stdev 6
ORG TMAP Mapping Results
reads 72,058,773
mapped reads 65,224,903
mapped bases 12,211,168,843
perfectly mapped 10,436,368
len max 638
len mean 187
len stdev 81
mapq mean 14
mapq stdev 10
snp rate 3%
ins rate 20%
del rate 23%
pct mismatch 84%
base qual mean 20
base qual stdev 6
34. ORG TMAP RESULTS: 91% READS MAPPED
CORRECTED TMAP RESULTS: 95% READS
MAPPED
ORG TMAP Mapping Results
Total Number of Reads 72.1M
Number of Reads Mapped 65.2M
Percentage of Reads Mapped 91%
Corrected TMAP Mapping Results
Total Number of Reads 72.1M
Number of Reads Mapped 68.1M
Percentage of Reads Mapped 95%
35. CORRECTED TMAP RESULTS: BASE QUALITY
DECREASES FROM 100 BP
Quality keeps dropping after 200 bp.
36. CORRECTED TMAP RESULTS : BASE
QUALITY DECREASES FROM 100 BP
Per Base Quality
37. CORRECTED TMAP RESULTS: HIGH MISMATCH; HIGH INSERTION;
HIGH DELETION; BUT UP TO 10 PERCENTAGE POINTS LOWER THAN ORG READS
Error Profiles
38. CORRECTED TMAP RESULTS: HIGH MISMATCH; HIGH INSERTION;
HIGH DELETION; BUT UP TO 10 PERCENTAGE POINTS LOWER THAN ORG READS
Error Profiles
39. CORRECTED TMAP RESULTS : OVER
REPRESENTATION BETWEEN 150-450 BP
K-mer Profile
K-mers are over-represented from position 150 to
position 450.
41. 400 BP READS: N50 421BP; MAX CONTIG 4.6KB;
TOTAL BASES 201MB
400 Bp Reads Assembly Stats
Number of contigs 517,835
Total size of contigs 201,990,292
Longest contig 4,684
Shortest contig 23
Number of contigs > 1K nt 11,939 2.30%
Number of contigs > 10K nt 0 0.00%
Mean contig size 390
Median contig size 329
N50 contig length 421
42. 400 BP READS: N50 426BP; MAX CONTIG 4.2KB;
TOTAL BASES 201MB
400 Bp Reads clipped at length 450 Assembly Stats
Number of contigs 509,308
Total size of contigs 201,527,141
Longest contig 4,272
Shortest contig 23
Number of contigs > 1K nt 13,781 2.70%
Number of contigs > 10K nt 0 0.00%
Mean contig size 396
Median contig size 331
N50 contig length 426
• Reads Clipped at length 450
43. 400 BP READS: N50 430BP; MAX CONTIG 5.4KB;
TOTAL BASES 192MB
400 Bp Reads clipped at length 450 qual 15 Assembly Stats
Number of contigs 478,037
Total size of contigs 192,109,210
Longest contig 5,378
Shortest contig 23
Number of contigs > 1K nt 16,737 3.50%
Number of contigs > 10K nt 0 0.00%
Mean contig size 402
Median contig size 324
N50 contig length 430
• Reads Clipped at length 450 with minimum quality of 15
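The slides do not say how the clipping was implemented. One plausible reading, sketched here in Python as an assumption, is to truncate each read to 450 bases and then cut it at the first base whose quality falls below 15. Other interpretations, such as a sliding-window trim or a mean-quality filter, are equally possible.

```python
def clip_read(seq, quals, max_len=450, min_qual=15):
    """Truncate a read to max_len, then cut at the first low-quality base.

    `quals` is a list of integer Phred scores, one per base. The
    cut-at-first-low-base rule is an assumed interpretation.
    """
    seq, quals = seq[:max_len], quals[:max_len]
    for i, q in enumerate(quals):
        if q < min_qual:
            return seq[:i], quals[:i]
    return seq, quals


print(clip_read("ACGTAC", [30, 30, 20, 10, 30, 30], max_len=5, min_qual=15))
```

Under this rule a single low-quality base early in a read discards everything after it, which would shorten reads and could help explain the lower total assembled bases on the quality-clipped slide.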
44. ORG READS: N50 397BP; MAX CONTIG 5.9KB;
TOTAL BASES 185MB
Org Reads Assembly Stats
Number of contigs 486,255
Total size of contigs 185,584,458
Longest contig 5,878
Shortest contig 24
Number of contigs > 1K nt 15,386 3.20%
Number of contigs > 10K nt 0 0.00%
Mean contig size 382
Median contig size 299
N50 contig length 397
45. ERROR CORRECTED READS: N50 550BP;
MAX CONTIG 28KB; TOTAL BASES 203MB
Error Corrected Reads Assembly Stats
Number of contigs 424,264
Total size of contigs 203,921,151
Longest contig 28,009
Shortest contig 24
Number of contigs > 1K nt 33,025 7.80%
Number of contigs > 10K nt 43 0.01%
Mean contig size 481
Median contig size 328
N50 contig length 550