The first near-complete assembly of
the hexaploid bread wheat genome,
Triticum aestivum
Daniela Puiu
Aleksey Zimin, Richard Hall, Sarah Kingan, Bernardo Clavijo, Steven Salzberg
ICG-12
Oct 27 2017
ICG-12: The Wheat Genome
Sequencing and Assembly of the Ancestral and Common Wheat
Aegilops tauschii ssp. strangulata, accession AL8/78
Chinese Spring variety (CS42, accession Dv418)
2013-2017
History of Wheat
~8,000 years ago: spontaneous hybridization
Emmer wheat + goat grass = bread wheat (the world's third-largest cereal crop)
Triticum turgidum + Aegilops tauschii = Triticum aestivum
AABB + DD = AABBDD
Whole genome => assisted breeding => improved yield
The Wheat Genome
One of the most complex genomes!
1) Genome size: over 15 billion bases
2) Allohexaploid: six copies of each chromosome (two in each of the A, B, and D subgenomes)
3) >90% repeats
Multiple past assembly attempts => assemblies shorter than the estimated genome size.
New vs Previous Assemblies
[Figure: previous wheat assemblies compared with Triticum 3.1, contig N50 = 232K]
Data Reduction
Original reads   Number   Total bases   Coverage   Accuracy
Illumina         7.06G    1 Tb          65x        99.5%
PacBio           55.5M    545 Gb        36x        87.5%

Processed seq.   Number   Total bases   Coverage   Accuracy
super-reads      95.7M    31 Gb         2x         99.95%
mega-reads       57M      278 Gb        18x        99.65%

Processing: MaSuRCA mega-reads hybrid correction
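The Coverage column is total bases divided by genome size. A minimal sketch that roughly reproduces the table, assuming the genome size equals the 15,344,693,583 bp length of the final Triticum 3.1 assembly (the slides do not state which size estimate was used; `GENOME_SIZE_BP`, `TOTAL_BASES` and `coverage` are illustrative names):

```python
# Assumed genome size: the total length of the Triticum 3.1 assembly.
GENOME_SIZE_BP = 15_344_693_583

# Total bases per data set, taken from the Data Reduction table.
TOTAL_BASES = {
    "Illumina": 1e12,       # 1 Tb
    "PacBio": 545e9,        # 545 Gb
    "super-reads": 31e9,    # 31 Gb
    "mega-reads": 278e9,    # 278 Gb
}


def coverage(total_bases: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Average depth of coverage (x) = total sequenced bases / genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    for name, bases in TOTAL_BASES.items():
        print(f"{name:12s} {coverage(bases):6.1f}x")
```

Rounded to whole numbers, these values match the 65x, 36x, 2x and 18x in the table, which suggests a genome size close to the final assembly length was used.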
MaSuRCA mega-reads Correction
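A toy illustration of the general idea behind building long, accurate reads from short-read-derived sequence. It is a simplification, not MaSuRCA's implementation. Assume accurate super-reads have already been placed at intervals along one noisy PacBio read. A mega-read can then be viewed as the highest-scoring chain of placements in which each placement overlaps the next. The `Hit` type, the score (for example, matched k-mers) and the minimum-overlap rule are all assumptions:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """A super-read placed on a long read (hypothetical representation)."""
    name: str
    start: int   # 0-based start on the long read
    end: int     # exclusive end on the long read
    score: int   # assumed score, e.g. number of matching k-mers


def best_chain(hits: List[Hit], min_overlap: int) -> List[Hit]:
    """Highest-scoring chain of hits where each consecutive pair overlaps
    by at least `min_overlap` bases on the long read (O(n^2) DP)."""
    if not hits:
        return []
    hits = sorted(hits, key=lambda h: (h.start, h.end))
    n = len(hits)
    best = [h.score for h in hits]   # best chain score ending at hit i
    prev = [-1] * n                  # back-pointers for reconstruction
    for i in range(n):
        for j in range(i):
            a, b = hits[j], hits[i]
            overlap = a.end - b.start
            # b must start and end after a, and share >= min_overlap bases with it
            if b.start > a.start and b.end > a.end and overlap >= min_overlap:
                if best[j] + b.score > best[i]:
                    best[i] = best[j] + b.score
                    prev[i] = j
    # Walk back from the best-scoring chain end.
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(hits[i])
        i = prev[i]
    return chain[::-1]
```

This sketch checks overlaps only by coordinates. A real corrector would also need the overlapping super-read sequences to agree.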
Assembly Pipeline
Hybrid assembly: Illumina + PacBio reads => MaSuRCA correction => mega-reads => Celera WGS Assembler => Triticum 1.0 => remove duplicates => Triticum 2.0
PacBio-only assembly: PacBio reads => FALCON correction => pReads => FALCON Assembler => FALCON Trit 0.5 => Arrow polishing => FALCON Trit 1.0
Merge: k-mer analysis, then Triticum 2.0 + FALCON Trit 1.0 => Triticum 3.1
k-mer Analysis
[Figure: 31-mer frequency plot (y-axis 0 to 50M), highlighting k-mers missing from the PacBio assembly only]
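A minimal sketch of this kind of comparison. It assumes that k-mers are counted from the reads, that canonical k-mers (the smaller of a k-mer and its reverse complement) are used, and that "missing from the PacBio assembly only" means present in the hybrid assembly but absent from the PacBio assembly. The function names and the `min_count` threshold are illustrative:

```python
from collections import Counter
from typing import Iterable, Set

_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield canonical k-mers of seq, skipping any containing non-ACGT bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)


def count_kmers(seqs: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers across all sequences (e.g. reads)."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(kmers(s, k))
    return counts


def missing_from_only(read_counts: Counter, in_asm: Set[str],
                      not_in_asm: Set[str], min_count: int = 2) -> Counter:
    """Read k-mers seen >= min_count times that occur in one assembly
    (`in_asm`) but not the other (`not_in_asm`), with read frequencies."""
    return Counter({km: c for km, c in read_counts.items()
                    if c >= min_count and km in in_asm and km not in not_in_asm})


def frequency_histogram(kmer_counts: Counter) -> Counter:
    """Map read frequency -> number of distinct k-mers with that frequency."""
    return Counter(kmer_counts.values())
```

For the slide, k would be 31. At wheat scale, in-memory Python sets are impractical, and a dedicated k-mer counter such as Jellyfish would be needed.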
Assembly Merge
Merging of the hybrid and PacBio assemblies
[Figure: a Triticum 2.0 contig aligned to FALCON contig A and FALCON contig B, with segments labeled >5 kb, yielding a merged Triticum 3.1 contig]
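A toy sketch of joining two contigs through a bridging contig. It assumes the ">5 kb" labels are minimum overlap lengths at each junction, and it uses exact suffix/prefix matches. A real merge would work from error-tolerant alignments. `suffix_prefix_overlap` and `bridge` are hypothetical names:

```python
from typing import Optional


def suffix_prefix_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`,
    or 0 if no such overlap is at least `min_overlap` long."""
    max_len = min(len(left), len(right))
    for olen in range(max_len, min_overlap - 1, -1):
        if olen > 0 and left.endswith(right[:olen]):
            return olen
    return 0


def bridge(contig_a: str, bridge_contig: str, contig_b: str,
           min_overlap: int = 5000) -> Optional[str]:
    """Join contig A and contig B through a bridging contig that overlaps
    the end of A and the start of B by >= min_overlap bases each."""
    ov1 = suffix_prefix_overlap(contig_a, bridge_contig, min_overlap)
    ov2 = suffix_prefix_overlap(bridge_contig, contig_b, min_overlap)
    if ov1 == 0 or ov2 == 0:
        return None  # bridge does not span the gap with enough overlap
    return contig_a + bridge_contig[ov1:] + contig_b[ov2:]
```

With tiny sequences and a 4-base minimum, `bridge("AAAACCCC", "CCCCGGGG", "GGGGTTTT", 4)` joins the three pieces into one sequence. Raising the minimum to 5 rejects the join.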
Assembly Statistics
Assembly          Number of contigs   Total size (bp)    N50 size (bp)
Triticum 2.0      375,328             14,395,027,822     75,599
FALCON Trit 1.0   97,809              12,939,100,857     215,314
Triticum 3.1      279,439             15,344,693,583     232,659
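The three columns can be computed from a list of contig lengths. A minimal sketch using the standard N50 definition: the length L such that contigs of length at least L together contain at least half of the assembly. The function names are illustrative:

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L together
    contain at least half of the total assembly size."""
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Number of contigs, total size (bp) and N50 (bp), as in the table."""
    lengths = list(lengths)
    return {"number": len(lengths), "total_bp": sum(lengths), "n50_bp": n50(lengths)}
```

For example, for lengths 2 through 10 (total 54), the three longest contigs (10 + 9 + 8 = 27) reach half the total, so N50 is 8.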
Run Time: 100 CPU years
Main steps     CPU hours   Wall time (months)
MaSuRCA        100K        1.5
Celera WGS     470K        5
FALCON         150K        0.75
Arrow          160K        0.75
Total          880K        9
100K CPU hours ≈ 11.4 CPU years
880K CPU hours ≈ 100 CPU years
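The CPU-year figures follow from dividing CPU hours by the hours in a year. A short check, assuming 365-day years (8,760 hours) and using the per-step values from the table:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap days ignored (assumption)

# CPU hours per main step, from the Run Time table.
CPU_HOURS = {
    "MaSuRCA": 100_000,
    "Celera WGS": 470_000,
    "FALCON": 150_000,
    "Arrow": 160_000,
}


def cpu_years(cpu_hours: float) -> float:
    """Convert CPU hours to CPU years."""
    return cpu_hours / HOURS_PER_YEAR


total = sum(CPU_HOURS.values())

if __name__ == "__main__":
    for step, hours in CPU_HOURS.items():
        print(f"{step:11s} {cpu_years(hours):5.1f} CPU years")
    print(f"{'Total':11s} {cpu_years(total):5.1f} CPU years")
```

The per-step CPU hours sum to 880K, or roughly 100 CPU years.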
Genome Repetitiveness
[Figure: k-mer uniqueness ratios for wheat, fly, cow, rice, pine, and Ae. tauschii]
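The slide does not define the k-mer uniqueness ratio. One plausible definition, assumed here, is the fraction of k-mer positions in a genome sequence whose k-mer occurs exactly once. Under this definition, a more repetitive genome scores lower at a given k, and the ratio rises as k grows. A minimal sketch (forward strand only; `uniqueness_ratio` is a hypothetical name):

```python
from collections import Counter


def uniqueness_ratio(seq: str, k: int) -> float:
    """Fraction of the len(seq)-k+1 k-mer positions whose k-mer occurs
    exactly once in seq (forward strand only, for simplicity)."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    # A k-mer with count 1 occupies exactly one position.
    unique_positions = sum(1 for c in counts.values() if c == 1)
    return unique_positions / n
```

For "ACGTACGT", the ratio is 0.6 at k = 4 (ACGT occurs twice) and 1.0 at k = 5. A genome-scale comparison would also count reverse-complement occurrences.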
Publication
Conclusions
The most challenging genome we have assembled!
Learning experience!
Assembly quality vs. computational resources?
Share your data!
Acknowledgements
Johns Hopkins University: Steven Salzberg, Aleksey Zimin
UC Davis Plant Sciences: Jan Dvorak, Mingcheng Luo
Earlham Institute: Bernardo Clavijo