The best of both worlds
Combining PacBio with short read technology
  for improved de novo genome assembly

          Lex Nederbragt, NSC and CEES
           lex.nederbragt@bio.uio.no
This talk
Why does everybody want longer reads?


        … for genome assemblies
What is a genome assembly


    Hierarchical structure

reads

 contigs

   scaffolds
Sequence data

                           Reads

original DNA

 fragments

                  Sequenced ends


               http://www.cbcb.umd.edu/research/assembly_primer.shtml
Contigs

                          Building contigs


                 ACGCGATTCAGGTTACCACG
                   GCGATTCAGGTTACCACGCG
                     GATTCAGGTTACCACGCGTA
                       TTCAGGTTACCACGCGTAGC
                         CAGGTTACCACGCGTAGCGC
  Aligned reads            GGTTACCACGCGTAGCGCAT
                             TTACCACGCGTAGCGCATTA
                               ACCACGCGTAGCGCATTACA
                                 CACGCGTAGCGCATTACACA
                                   CGCGTAGCGCATTACACAGA
                                     CGTAGCGCATTACACAGATT
                                       TAGCGCATTACACAGATTAG
Consensus contig ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG
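
The slide above can be reproduced with a few lines of Python. This is a minimal sketch, assuming error-free reads that are already sorted left to right; the function names `overlap` and `merge_in_order` are illustrative and not taken from any assembler. Real assemblers work on overlap or de Bruijn graphs and have to tolerate sequencing errors.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def merge_in_order(reads, min_len: int = 3) -> str:
    """Extend a contig read by read; assumes reads are error-free and sorted by position."""
    contig = reads[0]
    for read in reads[1:]:
        n = overlap(contig, read, min_len)
        if n == 0:
            raise ValueError(f"read {read} does not overlap the contig")
        contig += read[n:]  # append only the part of the read that is new
    return contig


# The twelve reads shown on the slide.
READS = [
    "ACGCGATTCAGGTTACCACG",
    "GCGATTCAGGTTACCACGCG",
    "GATTCAGGTTACCACGCGTA",
    "TTCAGGTTACCACGCGTAGC",
    "CAGGTTACCACGCGTAGCGC",
    "GGTTACCACGCGTAGCGCAT",
    "TTACCACGCGTAGCGCATTA",
    "ACCACGCGTAGCGCATTACA",
    "CACGCGTAGCGCATTACACA",
    "CGCGTAGCGCATTACACAGA",
    "CGTAGCGCATTACACAGATT",
    "TAGCGCATTACACAGATTAG",
]

contig = merge_in_order(READS)
print(contig)
```

Each read overlaps the growing contig by 18 bases and adds 2 new ones, which is why twelve 20 bp reads give a 42 bp contig.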
Contigs

                          Building contigs




     Repeat copy 1                                    Repeat copy 2




                                                          Contig orientation?
                                                            Contig order?




Collapsed repeat
   consensus
                     http://www.cbcb.umd.edu/research/assembly_primer.shtml
Mate pairs

                          Other read type




     Repeat copy 1                      Repeat copy 2




(much) longer fragments
                                            mate pair reads
Scaffolds

                 Ordered, oriented contigs




    mate pairs
contigs



                           gap size estimate
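
How a gap size estimate can follow from mate pairs is sketched below. It assumes the library's fragment (insert) size is known and that both mates of each pair are already mapped, one on each contig; the function name `estimate_gap` and all numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import median


def estimate_gap(insert_size: int, pairs) -> float:
    """Estimate the gap between two contigs from mate pairs that link them.

    Each pair is (left, right):
      left  = bases from the outer end of mate 1 to the end of the left contig
      right = bases from the start of the right contig to the outer end of mate 2
    Whatever part of the fragment is not accounted for must lie in the gap.
    """
    return median(insert_size - left - right for left, right in pairs)


# Hypothetical example: a 3 kbp mate pair library, three pairs linking two contigs.
pairs = [(1200, 1500), (900, 1750), (2000, 720)]
gap = estimate_gap(3000, pairs)
print(gap)  # per-pair estimates are 300, 350 and 280
```

The median is used here because real insert sizes vary around their nominal value, so single pairs give noisy estimates.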
What is a genome assembly


    Hierarchical structure

reads
                 ACGCGATTCAGGTTACCACG
                   GCGATTCAGGTTACCACGCG
                     GATTCAGGTTACCACGCGTA
                       TTCAGGTTACCACGCGTAGC
                         CAGGTTACCACGCGTAGCGC
  Aligned reads            GGTTACCACGCGTAGCGCAT
                             TTACCACGCGTAGCGCATTA
                               ACCACGCGTAGCGCATTACA
                                 CACGCGTAGCGCATTACACA
                                   CGCGTAGCGCATTACACAGA
                                     CGTAGCGCATTACACAGATT
                                       TAGCGCATTACACAGATTAG

 contigs
Consensus contig ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG

   scaffolds
Genome assembly




So, what’s so hard about it?
1) Repeats





     Repeat copy 1                                    Repeat copy 2




                                         Repeats break up contigs


Collapsed repeat
   consensus
                     http://www.cbcb.umd.edu/research/assembly_primer.shtml
2) Heterozygosity



                                                               Differences
                                                              between homologous
                                                              chromosomes




http://commons.wikimedia.org/wiki/File:Chromosome_1.svg
2) Heterozygosity




             Polymorphic contig 2

Contig 1                            Contig 4
             Polymorphic contig 3
2) Heterozygosity




http://www.astraean.com/borderwars/wp-content/uploads/2012/04/heterozygoats.jpg
and many other sites
3) Many programs to choose from




Zhang et al. (2011) doi:10.1371/journal.pone.0017915.g001
Assembly: challenges
         Repeat copy 1                               Repeat copy 2




                         Knowing how to use the programs



Heterozygosity
                              Polymorphic contig 2

          Contig 1                                            Contig 4
                              Polymorphic contig 3
So, why does everybody want longer reads?




http://www.autobizz.com.my/forum/forum/General-Chat/944-The-worlds-longest-car.html
Longer reads?
Repeat copy 1                                 Repeat copy 2




    Long reads can span repeats and heterozygous regions




                       Polymorphic contig 2

 Contig 1                                              Contig 4
                       Polymorphic contig 3
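
What "spanning" means can be made concrete with a small check. This is a sketch with made-up coordinates; the function name `spans` and the 50 bp anchor requirement are assumptions for illustration. A read resolves a repeat (or a heterozygous region) only if it covers the whole region and also reaches into unique sequence on both sides.

```python
def spans(read_start: int, read_len: int,
          region_start: int, region_len: int, anchor: int = 50) -> bool:
    """True if the read covers the whole region plus `anchor` unique bases on each side."""
    read_end = read_start + read_len
    region_end = region_start + region_len
    return read_start <= region_start - anchor and read_end >= region_end + anchor


# Hypothetical 2 kbp repeat copy starting at position 1000.
short_read_spans = spans(read_start=1500, read_len=100, region_start=1000, region_len=2000)
long_read_spans = spans(read_start=500, read_len=5000, region_start=1000, region_len=2000)
print(short_read_spans, long_read_spans)
```

The 100 bp read lies entirely inside the repeat and cannot say which copy it came from, whereas the 5 kbp read is anchored on both flanks.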
PacBio to the rescue?
High-throughput sequencing

                           Library preparation

SMRTbell template

Standard Sequencing
     Large insert sizes
     Generates one pass on each molecule sequenced (single pass)

Circular Consensus Sequencing
     Small insert sizes
     Generates multiple passes on each molecule sequenced
     Continued generations of reads
High-throughput sequencing

      Raw read length
High-throughput sequencing

                           Raw reads and subreads

SMRTbell template

Standard Sequencing
     Large insert sizes
     Generates one pass on each molecule sequenced (single pass)

                                           ‘Subreads’

Circular Consensus Sequencing
     Small insert sizes
     Generates multiple passes on each molecule sequenced
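
The relation between a raw read and its subreads can be sketched as follows. This is a toy model: the adapter sequence `AATTCCGG` is made up, and adapters are found by exact string matching, which would not work on real raw reads with their high error rate (real pipelines locate adapters by alignment). The polymerase goes around the circular template, so successive passes read the insert on alternating strands, separated by adapter sequence.

```python
ADAPTER = "AATTCCGG"  # made-up adapter sequence, for illustration only


def subreads(raw_read: str, adapter: str = ADAPTER):
    """Split a raw read into subreads at every exact adapter occurrence."""
    return [piece for piece in raw_read.split(adapter) if piece]


# Toy raw read: one full pass, the reverse-complement pass, then a partial third pass.
raw = "ACGTAC" + ADAPTER + "GTACGT" + ADAPTER + "ACG"
pieces = subreads(raw)
longest = max(pieces, key=len)
print(pieces, longest)
```

With a large insert the polymerase rarely completes more than one pass, so the longest subread is the useful long read. With a small insert many passes accumulate and can be combined into a consensus.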
PacBio: uses

                           Long reads, low quality

SMRTbell template

Standard Sequencing
     Large insert sizes
     Generates one pass on each molecule sequenced (single pass)
     85-87% accuracy

                             Useful for assembly?

Circular Consensus Sequencing
     Small insert sizes
     Generates multiple passes on each molecule sequenced
Solutions for assembly
Solutions for assembly (1)




   Designed by Pacific Biosciences




http://www.clker.com/clipart-4245.html
Solutions for assembly (2)
   Broad Institute




Need a special recipe
  for sequencing
Solutions for assembly (3)

                 PacBioToCA
        Error correct with short reads




Celera assembler


   http://schatzlab.cshl.edu/presentations/2012-01-17.PAG.SMRTassembly.pdf
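
The idea of correcting a long read with short reads can be illustrated with a toy majority vote. This is not PacBioToCA's actual algorithm: the sketch assumes the accurate short reads have already been placed on the long read at known offsets and that all errors are substitutions, whereas real PacBio errors are mostly insertions and deletions and need proper alignment. All sequences and the function name `correct` are made up.

```python
from collections import Counter


def correct(long_read: str, placed_short_reads) -> str:
    """Replace each base of the long read by the majority base of the short reads covering it.

    placed_short_reads: iterable of (offset, sequence); positions without coverage are kept as-is.
    """
    votes = [Counter() for _ in long_read]
    for offset, read in placed_short_reads:
        for i, base in enumerate(read):
            if 0 <= offset + i < len(long_read):
                votes[offset + i][base] += 1
    return "".join(
        v.most_common(1)[0][0] if v else original
        for v, original in zip(votes, long_read)
    )


# Toy data: the true sequence is ACGTTGCAAC; the long read has errors at positions 3 and 7.
long_read = "ACGATGCTAC"
short_reads = [(0, "ACGTT"), (2, "GTTGC"), (3, "TTGCA"), (4, "TGCAA"), (5, "GCAAC")]
corrected = correct(long_read, short_reads)
print(corrected)
```

The long read contributes length and the short reads contribute accuracy, which is the combination the talk title refers to.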
PacBioToCA




             Koren et al., 2012
Shameless self-promotion

flxlexblog.wordpress.com
Shameless self-promotion




            @lexnederbragt
The Atlantic cod genome project
First draft




Fragmented assembly
    - short contigs
    - many gap bases
                                http://en.wikipedia.org
First draft



6467 scaffolds




                   35% gap bases
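
A gap-base percentage like the one above can be computed directly from the scaffold sequences. The sketch assumes gaps are written as runs of `N`, the usual convention in scaffold FASTA files; the two toy scaffolds and the function name `gap_fraction` are made up.

```python
def gap_fraction(scaffolds) -> float:
    """Fraction of all scaffold bases that are gap bases (N or n)."""
    total = sum(len(seq) for seq in scaffolds)
    gaps = sum(seq.upper().count("N") for seq in scaffolds)
    return gaps / total if total else 0.0


# Two toy scaffolds, each with one gap between two contigs.
toy_scaffolds = ["ACGTNNNNACGT", "GGCCNNNGGC"]
fraction = gap_fraction(toy_scaffolds)
print(f"{fraction:.1%} gap bases")
```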
The causes




Short Tandem Repeats (>20% of gaps)
The causes


           Heterozygosity?



            Polymorphic contig 2

Contig 1                           Contig 4
            Polymorphic contig 3
The goal



 23 pseudochromosomes




       Longer contigs




                        Below 5% gap bases



PacBio to the rescue?
The approach

         Libraries

SMRTbell template

Standard Sequencing
     Large insert sizes
     Generates one pass on each molecule sequenced

Aim for looooong insert sizes

Circular Consensus Sequencing
     Small insert sizes
     Generates multiple passes on each molecule sequenced
The approach

                                  Sequencing

SMRTbell template

Standard Sequencing
     Large insert sizes
     Generates one pass on each molecule sequenced (single pass)

    Sequence with 90 minute movies

Circular Consensus Sequencing
     Small insert sizes
     Generates multiple passes on each molecule sequenced

10 x coverage in reads of at least 3000 bp

                No, we don’t throw this away…
The approach

Error-correction
PacBio results

Plot: Relative throughput at different minimum length cutoffs
  y-axis: Percentage of total sequence (fraction of bases at minimum length), 0 to 100
  x-axis: Length cutoff, longest subread (0 kbp to 15 kbp)
  series: 10kb lib 1, 10kb lib 2, 4kb lib

                                               Large library insert size important!
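
The quantity plotted on this slide, the fraction of bases found in subreads at or above a length cutoff, is simple to compute from a list of read lengths. The lengths below are invented for illustration and the function name is hypothetical.

```python
def fraction_of_bases_at_least(lengths, cutoff: int) -> float:
    """Fraction of all sequenced bases that sit in reads of length >= cutoff."""
    total = sum(lengths)
    kept = sum(n for n in lengths if n >= cutoff)
    return kept / total if total else 0.0


# Invented longest-subread lengths (bp) for one library.
lengths = [1000, 2000, 3000, 4000, 10000]
curve = {cutoff: fraction_of_bases_at_least(lengths, cutoff)
         for cutoff in (0, 3000, 5000, 10000, 15000)}
print(curve)
```

Note that the curve is weighted by bases, not by read count: one 10 kbp read here carries as many bases as the other four reads together.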

                                        PacBio results

                 64 SMRT Cells
                 3.2 Gigabases in raw reads of at least 3 kb
                 3.8 x coverage

      2.2 Gigabases in longest subreads
                   Largest 15 kbp
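
The numbers on this slide can be cross-checked with a little arithmetic, reading the 3.2 and 2.2 figures as sequence yield in bases. Coverage is yield divided by genome size, so the slide's own figures imply a genome size; nothing here comes from outside the slides, and the variable names are illustrative.

```python
raw_bases = 3.2e9        # raw reads of at least 3 kb (from the slide)
raw_coverage = 3.8       # coverage stated for those reads (from the slide)
subread_bases = 2.2e9    # longest subreads (from the slide)
target_coverage = 10     # goal stated on the approach slide

# coverage = bases / genome size, so genome size = bases / coverage
implied_genome_size = raw_bases / raw_coverage
subread_coverage = subread_bases / implied_genome_size
bases_needed_for_target = target_coverage * implied_genome_size

print(f"implied genome size: {implied_genome_size / 1e6:.0f} Mbp")
print(f"coverage in longest subreads: {subread_coverage:.1f} x")
print(f"bases needed for {target_coverage} x: {bases_needed_for_target / 1e9:.1f} Gbp")
```

Under these figures the 64 SMRT Cells give roughly a quarter of the 10 x target in longest subreads.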
PacBio results

Mapping to the cod genome
      11.4 kbp subread




       10.6 kbp subread




      10.9 kbp subread
Example 1


ACACAC repeat




232 bp Gap




TGTGTG repeat
Example 1
Scaffold               ...ACACAC     TGTGTG...

PacBio reads
               Unplaced contig
Example 2


TGTGTG repeat




     344 bp Gap
Example 2

Scaffold       ...TGTGTG

PacBio reads

                     Heterozygosity?
Example 3

Scaffold


   PacBio reads
                  300 bp misassembly?
Error-correction




                          Work In Progress
http://openclipart.org/
Outlook




  Will PacBio solve our problems?
Outlook




  Or
Outlook



                Polymorphic contig 2

Contig 1                               Contig 4
                Polymorphic contig 3




   Will we find the heterozygous regions?
Outlook




 http://www.pasteur.fr/recherche/unites/Bbi/
 en.wikipedia.org
 and Martin Malmstrøm

Más contenido relacionado

La actualidad más candente

Genome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome explorationGenome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome explorationKiranKm11
 
Gene expression introduction
Gene expression introductionGene expression introduction
Gene expression introductionSetia Pramana
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisUniversity of California, Davis
 
Role of Pangenomics for crop Improvement
Role of Pangenomics for crop ImprovementRole of Pangenomics for crop Improvement
Role of Pangenomics for crop ImprovementPatelSupriya
 
Comparative Genomics and Visualisation - Part 1
Comparative Genomics and Visualisation - Part 1Comparative Genomics and Visualisation - Part 1
Comparative Genomics and Visualisation - Part 1Leighton Pritchard
 
RNA Secondary Structure Prediction
RNA Secondary Structure PredictionRNA Secondary Structure Prediction
RNA Secondary Structure PredictionSumin Byeon
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence AlignmentRavi Gandham
 
Plant genome sequencing and crop improvement
Plant genome sequencing and crop improvementPlant genome sequencing and crop improvement
Plant genome sequencing and crop improvementRagavendran Abbai
 
Genome wide association studies seminar
Genome wide association studies seminarGenome wide association studies seminar
Genome wide association studies seminarVarsha Gayatonde
 
Genome Editing Tool ZFNs and TALEs
Genome Editing Tool  ZFNs and TALEs Genome Editing Tool  ZFNs and TALEs
Genome Editing Tool ZFNs and TALEs Manita Paneri
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment Parinda Rajapaksha
 
Crispr cas9 technology
Crispr cas9 technology Crispr cas9 technology
Crispr cas9 technology AshrafAlhamod
 

La actualidad más candente (20)

Association mapping
Association mappingAssociation mapping
Association mapping
 
Genome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome explorationGenome to pangenome : A doorway into crops genome exploration
Genome to pangenome : A doorway into crops genome exploration
 
Gene expression introduction
Gene expression introductionGene expression introduction
Gene expression introduction
 
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression AnalysisSo you want to do a: RNAseq experiment, Differential Gene Expression Analysis
So you want to do a: RNAseq experiment, Differential Gene Expression Analysis
 
Genome assembly
Genome assemblyGenome assembly
Genome assembly
 
Role of Pangenomics for crop Improvement
Role of Pangenomics for crop ImprovementRole of Pangenomics for crop Improvement
Role of Pangenomics for crop Improvement
 
Comparative Genomics and Visualisation - Part 1
Comparative Genomics and Visualisation - Part 1Comparative Genomics and Visualisation - Part 1
Comparative Genomics and Visualisation - Part 1
 
RNA Secondary Structure Prediction
RNA Secondary Structure PredictionRNA Secondary Structure Prediction
RNA Secondary Structure Prediction
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Msa
MsaMsa
Msa
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Plant genome sequencing and crop improvement
Plant genome sequencing and crop improvementPlant genome sequencing and crop improvement
Plant genome sequencing and crop improvement
 
Genome wide association studies seminar
Genome wide association studies seminarGenome wide association studies seminar
Genome wide association studies seminar
 
Genome Editing Tool ZFNs and TALEs
Genome Editing Tool  ZFNs and TALEs Genome Editing Tool  ZFNs and TALEs
Genome Editing Tool ZFNs and TALEs
 
Express sequence tags
Express sequence tagsExpress sequence tags
Express sequence tags
 
Ch06 rna
Ch06 rnaCh06 rna
Ch06 rna
 
Basics of association_mapping
Basics of association_mappingBasics of association_mapping
Basics of association_mapping
 
NGS: Mapping and de novo assembly
NGS: Mapping and de novo assemblyNGS: Mapping and de novo assembly
NGS: Mapping and de novo assembly
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment
 
Crispr cas9 technology
Crispr cas9 technology Crispr cas9 technology
Crispr cas9 technology
 

Destacado

IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataIonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataAdrian Baez-Ortega
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applicationsAGRF_Ltd
 
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015Torsten Seemann
 
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...Lex Nederbragt
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryJan Aerts
 
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...Anne Deslattes Mays
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_coursehansjansen9999
 
Improving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBioImproving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBioLex Nederbragt
 
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015Torsten Seemann
 
Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Keith Bradnam
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
Next-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotNext-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotLi Shen
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomicsMads Albertsen
 
Semiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant SciencesSemiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant SciencesThermo Fisher Scientific
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesScott Edmunds
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...QIAGEN
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
 

Destacado (20)

IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataIonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
 
Jan2016 pac bio giab
Jan2016 pac bio giabJan2016 pac bio giab
Jan2016 pac bio giab
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
 
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015Long read sequencing -  WEHI  bioinformatics seminar - tue 16 june 2015
Long read sequencing - WEHI bioinformatics seminar - tue 16 june 2015
 
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
Updated: New High Throughput Sequencing technologies at the Norwegian Sequenc...
 
Next-generation sequencing - variation discovery
Next-generation sequencing - variation discoveryNext-generation sequencing - variation discovery
Next-generation sequencing - variation discovery
 
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
2014 June 17 PacBio User Group Meeting Presentation "How Looking for a Needle...
 
20150601 bio sb_assembly_course
20150601 bio sb_assembly_course20150601 bio sb_assembly_course
20150601 bio sb_assembly_course
 
Improving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBioImproving and validating the Atlantic Cod genome assembly using PacBio
Improving and validating the Atlantic Cod genome assembly using PacBio
 
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
A peek inside the bioinformatics black box - DCAMG Symposium - mon 20 july 2015
 
Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...Genome assembly: the art of trying to make one big thing from millions of ver...
Genome assembly: the art of trying to make one big thing from millions of ver...
 
20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop20140711 3 t_clark_ercc2.0_workshop
20140711 3 t_clark_ercc2.0_workshop
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
Next-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotNext-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plot
 
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics[2013.10.29] albertsen genomics metagenomics
[2013.10.29] albertsen genomics metagenomics
 
Semiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant SciencesSemiconductor Sequencing Applications for Plant Sciences
Semiconductor Sequencing Applications for Plant Sciences
 
Ngs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challengesNgs de novo assembly progresses and challenges
Ngs de novo assembly progresses and challenges
 
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
NGS Targeted Enrichment Technology in Cancer Research: NGS Tech Overview Webi...
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
 

Similar a Combining PacBio with short read technology for improved de novo genome assembly

2013 pag-equine-workshop
2013 pag-equine-workshop2013 pag-equine-workshop
2013 pag-equine-workshopc.titus.brown
 
How to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeHow to sequence a large eukaryotic genome
How to sequence a large eukaryotic genomeLex Nederbragt
 
Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishingNikolay Vyahhi
 
20110524zurichngs 1st pub
20110524zurichngs 1st pub20110524zurichngs 1st pub
20110524zurichngs 1st pubsesejun
 
Genome Assembly copy
Genome Assembly   copyGenome Assembly   copy
Genome Assembly copyPradeep Kumar
 
The Genome Assembly Problem
The Genome Assembly ProblemThe Genome Assembly Problem
The Genome Assembly ProblemMark Chang
 

Combining PacBio with short read technology for improved de novo genome assembly

  • 1. The best of both worlds Combining PacBio with short read technology for improved de novo genome assembly Lex Nederbragt, NSC and CEES lex.nederbragt@bio.uio.no
  • 3. Why does everybody want longer reads? … for genome assemblies
  • 4. What is a genome assembly Hierarchical structure reads contigs scaffolds
  • 5. Sequence data Reads reads contigs scaffolds original DNA fragments original DNA fragments Sequenced ends http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 6. Contigs Building contigs reads contigs scaffolds ACGCGATTCAGGTTACCACG GCGATTCAGGTTACCACGCG GATTCAGGTTACCACGCGTA TTCAGGTTACCACGCGTAGC CAGGTTACCACGCGTAGCGC Aligned reads GGTTACCACGCGTAGCGCAT TTACCACGCGTAGCGCATTA ACCACGCGTAGCGCATTACA CACGCGTAGCGCATTACACA CGCGTAGCGCATTACACAGA CGTAGCGCATTACACAGATT TAGCGCATTACACAGATTAG Consensus contig ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG
  • 7. Contigs Building contigs reads contigs scaffolds Repeat copy 1 Repeat copy 2 Contig orientation? Contig order? Collapsed repeat consensus http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 8. Mate pairs Other read type reads contigs scaffolds Repeat copy 1 Repeat copy 2 (much) longer fragments mate pair reads
  • 9. Scaffolds Ordered, oriented contigs reads contigs scaffolds mate pairs contigs gap size estimate
  • 10. What is a genome assembly Hierarchical structure reads ACGCGATTCAGGTTACCACG GCGATTCAGGTTACCACGCG GATTCAGGTTACCACGCGTA TTCAGGTTACCACGCGTAGC CAGGTTACCACGCGTAGCGC Aligned reads GGTTACCACGCGTAGCGCAT TTACCACGCGTAGCGCATTA ACCACGCGTAGCGCATTACA CACGCGTAGCGCATTACACA CGCGTAGCGCATTACACAGA contigs CGTAGCGCATTACACAGATT TAGCGCATTACACAGATTAG Consensus contig ACGCGATTCAGGTTACCACGCGTAGCGCATTACACAGATTAG scaffolds
  • 11. Genome assembly So, what’s so hard about it?
  • 12. 1) Repeats reads contigs scaffolds Repeat copy 1 Repeat copy 2 Repeats break up contigs Collapsed repeat consensus http://www.cbcb.umd.edu/research/assembly_primer.shtml
  • 13. 2) Heterozygosity Differences between sister chromosomes http://commons.wikimedia.org/wiki/File:Chromosome_1.svg
  • 14. 2) Heterozygosity Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 16. 3) Many programs to choose from Zhang et al. (2011) doi:10.1371/journal.pone.0017915.g001
  • 17. Assembly: challenges Repeat copy 1 Repeat copy 2 Knowing how to use the programs Heterozygosity Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 18. So, why does everybody want longer reads? http://www.autobizz.com.my/forum/forum/General-Chat/944-The-worlds-longest-car.html
  • 19. Longer reads? Repeat copy 1 Repeat copy 2 Long reads can span repeats and heterozygous regions Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 20. PacBio to the rescue?
  • 21. High-throughput sequencing Library preparation SMRTbell template. Standard Sequencing: large insert sizes, single pass; generates one pass on each molecule sequenced. Circular Consensus Sequencing: small insert sizes, multiple passes; generates multiple passes on each molecule sequenced. Continued generations of reads
  • 22. High-throughput sequencing Raw read length
  • 23. High-throughput sequencing Raw reads and subreads SMRTbell template. Standard Sequencing: large insert sizes, single pass; generates one pass on each molecule sequenced. ‘Subreads’ Circular Consensus Sequencing: small insert sizes, multiple passes; generates multiple passes on each molecule sequenced.
  • 24. PacBio: uses Long reads, low quality 85-87% accuracy Useful for assembly? SMRTbell template. Standard Sequencing: large insert sizes, single pass; generates one pass on each molecule sequenced. Circular Consensus Sequencing: small insert sizes; generates multiple passes on each molecule sequenced.
  • 26. Solutions for assembly (1) Designed by Pacific Biosciences http://www.clker.com/clipart-4245.html
  • 27. Solutions for assembly (2) Broad Institute Need a special recipe for sequencing
  • 28. Solutions for assembly (3) PacBioToCA Error correct with short reads Celera assembler http://schatzlab.cshl.edu/presentations/2012-01-17.PAG.SMRTassembly.pdf
  • 29. PacBioToCA Koren et al, 2012
  • 31. Shameless self-promotion @lexnederbragt
  • 32. The Atlantic cod genome project
  • 33. First draft Fragmented assembly - short contigs - many gap bases http://en.wikipedia.org
  • 34. First draft 6467 scaffolds 35% gap bases
  • 35. The causes Short Tandem Repeats (>20% of gaps)
  • 36. The causes Heterozygosity? Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3
  • 37. The goal 23 pseudochromosomes Longer contigs Below 5% gap bases PacBio to the rescue?
  • 38. The approach Libraries Aim for looooong insert sizes SMRTbell template. Standard Sequencing: large insert sizes; generates one pass on each molecule sequenced. Circular Consensus Sequencing: small insert sizes; generates multiple passes on each molecule sequenced.
  • 39. The approach Sequencing Sequence with 90 minute movies 10 x coverage in reads of at least 3000 bp SMRTbell template. Standard Sequencing: large insert sizes, single pass; generates one pass on each molecule sequenced. Circular Consensus Sequencing: small insert sizes; generates multiple passes on each molecule sequenced. No, we don’t throw this away…
  • 41. PacBio results [Plot: Relative throughput at different minimum length cutoffs (fraction of bases at minimum length). x-axis: Length cutoff longest subread, 0kbp to 15kbp. y-axis: Percentage of total sequence, 0 to 100. Series: 10kb lib 1, 10kb lib 2, 4kb lib] Large library insert size important!
  • 42. PacBio results 64 SMRT Cells 3.2 Gigabytes in raw reads at least 3kb 3.8 x coverage 2.2 Gigabytes in longest subreads Largest 15 kbp SMRTbell template. Standard Sequencing: large insert sizes; generates one pass on each molecule sequenced. Circular Consensus Sequencing: small insert sizes; generates multiple passes on each molecule.
  • 43. PacBio results Mapping to the cod genome 11.4 kbp subread 10.6 kbp subread 10.9 kbp subread
  • 44. Example 1 ACACAC repeat 232 bp Gap TGTGTG repeat
  • 47. Example 1 Scaffold ...ACACAC TGTGTG... PacBio reads Unplaced contig
  • 50. Example 2 Scaffold ...TGTGTG PacBio reads Heterozygosity?
  • 51. Example 3 Scaffold PacBio reads 300 bp misassembly?
  • 52. Error-correction Work In Progress http://openclipart.org/
  • 53. Outlook Will PacBio solve our problems?
  • 55. Outlook Polymorphic contig 2 Contig 1 Contig 4 Polymorphic contig 3 Will we find the heterozygous regions?
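
The "Building contigs" slide (6) shows twelve overlapping reads collapsing into one consensus contig. As a minimal sketch of that idea, the Python below merges the slide's reads left to right by their longest exact suffix/prefix overlap. The function names `overlap` and `merge_in_order` are hypothetical, and the sketch assumes error-free reads that are already sorted and on the same strand. Real assemblers build overlap or de Bruijn graphs and must cope with sequencing errors, reverse complements, repeats and heterozygosity, none of which is handled here.

```python
# Reads copied from slide 6 ("Aligned reads"), in the order shown there.
READS = [
    "ACGCGATTCAGGTTACCACG",
    "GCGATTCAGGTTACCACGCG",
    "GATTCAGGTTACCACGCGTA",
    "TTCAGGTTACCACGCGTAGC",
    "CAGGTTACCACGCGTAGCGC",
    "GGTTACCACGCGTAGCGCAT",
    "TTACCACGCGTAGCGCATTA",
    "ACCACGCGTAGCGCATTACA",
    "CACGCGTAGCGCATTACACA",
    "CGCGTAGCGCATTACACAGA",
    "CGTAGCGCATTACACAGATT",
    "TAGCGCATTACACAGATTAG",
]


def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is a prefix of `right`.

    Only exact matches of at least `min_len` bases count; 0 means no overlap.
    """
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_in_order(reads: list[str], min_len: int = 3) -> str:
    """Extend a contig read by read, appending only the non-overlapping tail.

    Assumes the reads are pre-sorted by position, which a real assembler
    would have to work out itself.
    """
    contig = reads[0]
    for read in reads[1:]:
        size = overlap(contig, read, min_len)
        if size == 0:
            raise ValueError(f"read {read!r} does not overlap the contig end")
        contig += read[size:]
    return contig


if __name__ == "__main__":
    print(merge_in_order(READS))
```

Because each read repeats all but the last two bases of its neighbour, every merge step adds two bases to the contig. Two copies of a repeat would produce identical overlaps at different genomic positions, which is the ambiguity that slides 7 and 12 describe and that longer reads are meant to resolve.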