Validating and improving the
D. melanogaster reference genome sequence
using PacBio de novo assemblies
Casey M. Bergman
@bergmanlab
@caseybergman
University of Liverpool
Centre for Genomic Research
PacBio Symposium
4 April 2014
Credits
• Danny Miller (Stowers Institute)
• Jane Landolin, Kristi Kim, Jason Chin & Edwin Hauw
(Pacific Biosciences)
• Sue Celniker & Roger Hoskins (Berkeley Drosophila
Genome Project)
• Sergey Koren & Adam Phillippy (National Biodefense
Analysis and Countermeasures Center)
• Raquel Linheiro (University of Manchester)
Bridges (1916) PMID: 17245850
“The” Drosophila genome circa 1910
“The” Drosophila genome circa 1925
Morgan et al. (1925) The Genetics of Drosophila
Painter (1933) PMID: 17801695
“The” Drosophila genome circa 1940
The strategy we have used is called chromosomal walking and jumping; it is
shown diagrammatically in Figure 1. The chromosomal origin of any non-repeated
segment of D. melanogaster DNA (Dm segment) can be determined by in situ
hybridization of that DNA to polytene chromosomes. When the sites of
hybridization are visualized by tritium autoradiography, the position is usually
confined to one or a few bands, which is similar to the precision of the cytological
localizations of rearrangement breakpoints or the localizations of well-mapped
genes. If a DNA sequence is found within a few bands of a gene of interest, that
sequence can be used as the starting point for a chromosomal walk to the gene. A
"step" in the walking procedure involves screening a recombinant DNA library of
random large Dm segments to collect those that overlap the starting point. The
[Figure 1 diagram; recoverable labels: START HERE, LEFT FUSION FRAGMENT, RIGHT FUSION FRAGMENT, INVERSION BREAK (x2)]
Fig. 1. The strategy for walking and jumping. The upper chromosome represents a portion of the
right arm of the third chromosome with normal cytology (drawn from the map of Bridges, 1941), and
the lower chromosome has an inversion of the region from 87E to 89E. A few steps of a chromosomal
walk are shown diagrammatically below the 87E region (not to scale with the chromosome). When the
walk reached the site of the inversion breakpoint, the DNA from that position could be used to
identify the two fusion fragments isolated from the inversion chromosome. The foreign DNA in the
fusion fragments (tandem circles) was homologous to normal chromosomal DNA at the right or distal
inversion breakpoint, and thus it served as the origin of a chromosomal walk in 89E.
e.g. Bender et al. (1983) PMID: 6410077
“The” Drosophila genome circa 1990
“The” Drosophila genome circa 2000
Adams et al. (2000) PMID: 10731132
Accuracy of whole genome shotgun (WGS)
assembly vs. BAC-based physical map
Myers et al. (2000) PMID: 10731133
peaks - discrepancies
green - gaps
purple - TEs
Myers et al. (2000) PMID: 10731133
Accuracy of WGS vs. BAC-based sequencing
“The” Drosophila genome since 2000
~ 120 Mb of euchromatin
~ 60-100 Mb heterochromatin
Release  Date      Total size of scaffolds  Total size of contigs  Contigs  Contig N50
1        Mar 2000  116,117,226              114,201,085            1427     220,490
2        Oct 2000  116,109,070              114,448,849            1103     318,193
3        Dec 2002  116,781,562              116,739,493            50       14,289,516
4        Apr 2004  118,357,599              118,348,386            28       18,203,742
5        Mar 2006  120,381,546              120,290,946            14       21,485,538
Euchromatic genome assemblies
Several gaps persist in euchromatic arms
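The Contig N50 column above is the contig length L at which contigs of length L or longer account for at least half of the assembled bases. A minimal sketch of that calculation follows; the contig lengths are made-up toy values, not taken from any release.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted lengths, a release that merges many small contigs into a few long ones raises N50 sharply even when the total assembled size barely changes, which is the pattern between Releases 2 and 3.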
~ 120 Mb of euchromatin
~ 60-100 Mb heterochromatin
“The” Drosophila genome since 2000
Hoskins et al. (2007) PMID: 17569867
Heterochromatic genome assemblies
~350 Kb
in Rel5
Release  Scaffolds            Total Size of Scaffolds  Contigs  Total Size of Contigs
1        0                    0                        0        0
2        1 (U)                7,513,406                1000     5,530,718
3        2604                 20,941,991               3810     17,150,417
4        0                    0                        0        0
5        8 (U + armHet + mt)  19,350,335               3044     16,535,110
Majority of heterochromatin unassembled
Heterochromatic genome assemblies
Low coverage pilot experiment with Hawley Lab
http://bergmanlab.smith.man.ac.uk/?p=1971
High coverage experiment with PacBio & BDGP
http://blog.pacificbiosciences.com/2014/01/data-release-preliminary-de-novo.html
http://www.ncbi.nlm.nih.gov/sra/?term=SRP040522
Metric Value
Library Size (Kb) 15
Chemistry P5-C3
# SMRT cells 42
Run time (days) 6
# bases (nt) 15,208,567,933
# reads 1,514,730
avg length (nt) 10,040
N50 (nt) 14,214
Max (nt) 44,766
High coverage PacBio dataset for
D. melanogaster BDGP reference strain
http://blog.pacificbiosciences.com/2014/01/data-release-preliminary-de-novo.html
http://www.ncbi.nlm.nih.gov/sra/?term=SRP040522
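The dataset figures can be cross-checked with simple arithmetic: mean read length is total bases divided by read count, and raw depth is total bases divided by reference size. The sketch below assumes, purely for illustration, that the reference size is the sum of the Release 5 euchromatic and heterochromatic scaffold totals from the assembly tables; that choice of denominator is not stated in the slides.

```python
# Values copied from the dataset table above.
TOTAL_BASES = 15_208_567_933
TOTAL_READS = 1_514_730

# Assumption: Release 5 scaffold totals (euchromatic + heterochromatic)
# stand in for the reference size. The true genome is larger.
REL5_EUCHROMATIN = 120_381_546
REL5_HETEROCHROMATIN = 19_350_335


def mean_read_length(total_bases, total_reads):
    """Average read length in nucleotides."""
    return total_bases / total_reads


def raw_depth(total_bases, reference_size):
    """Sequenced bases per reference base, before any mapping losses."""
    return total_bases / reference_size


reference_size = REL5_EUCHROMATIN + REL5_HETEROCHROMATIN
print(f"mean read length: {mean_read_length(TOTAL_BASES, TOTAL_READS):,.0f} nt")
print(f"raw depth: {raw_depth(TOTAL_BASES, reference_size):.1f}x")
```

Raw depth computed this way is an upper bound on mapped coverage, since some reads will not align to the assembled reference.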
Reference-based long-read mapping with BLASR
http://bergmanlab.smith.man.ac.uk/?p=2176
>90x coverage based on reference mapping
http://bergmanlab.smith.man.ac.uk/?p=2176
PacBio-only assemblies of the
D. melanogaster genome
Assembly     Read Set     Pre-assembly  Assembler  Quivered
CA25x        25x longest  PBcR          CA 8.1     n
CA25x-Q      25x longest  PBcR          CA 8.1     y
CA50x        50x longest  PBcR          CA 8.1     n
FALCON-Q     25x longest  FALCON        FALCON     y
FALCON-PBcR  70x          PBcR          FALCON     n
FALCON-AWS   all          FALCON        FALCON     n
Koren & Phillippy (unpublished)
Chin & Bergman (unpublished)
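One plausible reading of a read set such as "25x longest" is: sort reads by length and keep the longest ones until their summed length reaches 25 times the genome size. The sketch below illustrates that idea only; the function name and toy numbers are hypothetical, and the slides do not say how PBcR or FALCON actually select reads.

```python
def longest_reads_for_coverage(read_lengths, genome_size, target_coverage):
    """Keep the longest reads until their total length reaches the target depth."""
    budget = genome_size * target_coverage
    selected = []
    total = 0
    for length in sorted(read_lengths, reverse=True):
        if total >= budget:
            break
        selected.append(length)
        total += length
    return selected


# Toy example: a 10 nt "genome" and a 2x target (20 nt of reads).
toy_reads = [1, 7, 10, 1, 9, 8]
print(longest_reads_for_coverage(toy_reads, genome_size=10, target_coverage=2))
```

Selecting by length favours reads that span repeats, at the cost of discarding the short-read fraction of the data.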
Assembly     Contigs  Contig N50 (nt)  Max Contig (nt)
CA25x        128      15,297,019       24,622,056
CA25x-Q      128      15,305,620       24,648,237
CA50x        131      4,105,199        24,577,947
FALCON-Q     434      5,001,041        21,336,512
FALCON-PBcR  1774     7,499,810        25,727,813
FALCON-AWS   955      7,882,002        21,631,108
PacBio-only assemblies of the
D. melanogaster genome
Long-range contiguity of CA25x assembly
Koren & Phillippy (unpublished)
http://cbcb.umd.edu/software/PBcR/dmel.html
[Figure labels: chromosome arms X, 3R, 3L, 2L, 4, 2R]
Chin (unpublished)
https://github.com/PacificBiosciences/falcon
Long-range contiguity of FALCON-Q assembly
[Figure labels: chromosome arms X, 3R, 2R, 3L, 2L]
Base level accuracy of PacBio
D. melanogaster assemblies vs Release 5
[Plot: mismatches/100kb and indels/100kb for CA25x, CA25x-Q, CA50x, FALCON-Q, FALCON-PBcR and FALCON-AWS, with Rel1 and Rel3 levels marked on each axis]
Towards a $1000 genome assembly
using FALCON, StarCluster & AWS
Assembly    Pre-assembly (CPU hours)  Assembly (CPU hours)
CA25x       621,000                   8,000
FALCON-AWS  1,500                     48
Expert Novice
https://github.com/PacificBiosciences/FALCON/blob/v0.1.1/examples/Dmel_asm.md
https://github.com/cbergman/FALCON/blob/v0.1.1/examples/Dmel_asm.md
Euchromatic gap closure with PacBio contigs
Celniker (unpublished)
Gap at 64C
Gap at 57B
Identification of Y-chromosome contigs in
PacBio assemblies by female/male depth ratio
[Plot: log10 frequency of female/male depth ratio from bwa-mapped short-read DNA-seq, by chromosome arm: chr2L, chr2R, chr3L, chr3R, chr4, chrX, chrYHet]
Linheiro & Bergman (unpublished)
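The logic behind the depth-ratio approach: females carry two X chromosomes and no Y, males carry one of each, so after normalizing for sequencing effort the female/male depth ratio should sit near 1 for autosomal contigs, near 2 for X-linked contigs and near 0 for Y-linked contigs. A minimal sketch of such a classifier follows; the function name and the threshold values are illustrative assumptions, not the cutoffs used in the analysis shown.

```python
def classify_by_depth_ratio(female_depth, male_depth, y_max=0.25, x_min=1.5):
    """Label a contig from normalized female and male read depths.

    Depths are assumed to be already normalized for library size.
    y_max and x_min are hypothetical thresholds chosen for illustration.
    """
    if male_depth <= 0:
        return "undetermined"  # no male coverage, ratio is undefined
    ratio = female_depth / male_depth
    if ratio <= y_max:
        return "Y-like"
    if ratio >= x_min:
        return "X-like"
    return "autosome-like"


# Toy depths (female, male) for three hypothetical contigs.
for name, depths in {"ctgA": (30.0, 30.0), "ctgB": (40.0, 20.0), "ctgC": (0.5, 20.0)}.items():
    print(name, classify_by_depth_ratio(*depths))
```

In practice the ratio is noisy in repeat-rich sequence, so windowed ratios along each contig are more informative than a single per-contig value.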
[Plot: female/male depth ratio along contig 0052_00|quiver|quiver in windows with 10 Kb step (x-axis: location x10000; y-axis: ratio); legend: A ratio, X ratio, Y ratio, X log 100 count, Y log 100 count]
Identification of Y-chromosome contigs in
PacBio Assemblies by female/male depth ratio
Linheiro & Bergman (unpublished)
Improvement of the Y-chromosome
assembly & gene models
Celniker (unpublished)
Take Home (I)
• View of D. melanogaster genome has been changing for >100 years & is still not complete
• Frontier of D. melanogaster genome assembly is in heterochromatic regions (model for repeat-rich plant genomes)
• PacBio long reads can be used to generate long-range de novo assemblies that can close euchromatic gaps & generate large heterochromatic contigs
• Bioinformatic challenges: better pre-assembly algorithms, better polishing algorithms, *.h5 data archiving
Take Home (II)
• Early, open release of genomic data by small labs can stimulate big returns & new collaborations
• PacBio has right corporate philosophy of engaging/collaborating with the genomics community (open data, open source)
