Taxon diversity analysis for bulk insect samples using
               Illumina Hi-seq platform

                           Xin ZHOU, Shanlin LIU, Yiyuan LI,
                                    Qing YANG, and Xu SU

                                Department of Science and Technology
                              Environmental Genomics Research Group
                                                           BGI, China




                                     Adelaide, Australia, 3 December 2011
Problem                       Solutions?

          • Opt.1: ......zzzzZZZZZ

          • Opt.2: morph sorting -> indiv. ID -> … -> Opt.1

          • Opt.3: morph sorting -> indiv. barcoding -> … -> Opt.1

          • Opt.4: grinding up -> NGS -> CLUSTERING/BLAST -> DIVERSITY!

                              Zhou et al. 2011, 4th International Barcode of Life Conference
Environmental barcoding of bulk insects

• aquatic insects: mini-barcode (130 bp), 454
• bat diet (insects): COI fragment, 157 bp, 454
• Malaise trap (insects): COI fragment, ~400 bp, 454
  (Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring, Yu D.W. et al., in review)

Major NGS platforms applicable in environmental barcoding

NGS platform                          Read length        Data/run (GB)   Run time   Requirement of library construction
454 platform (GS FLX Titanium XL+)    ~400bp             0.7             23 hr.     Yes
Illumina platform (Hi-Seq 2000)       150bp PE reads     600             14 d.      Yes
Illumina platform (Mi-Seq)            150bp PE reads     2               27 hr.     Yes
Ion Torrent                           200bp              ~1              3.5 hr.    No

Illumina Hi-Seq:
  • higher throughput
  • less $ / bp
  • increasing read length
  • variety of bioinformatics tools available from genomic pipelines

Sequencing capacity at BGI

•    28   Illumina GAIIx
•   137   Illumina Hi-Seq2000
•    25   Life Tech SOLiD 4
•    16   ABI 3730XL
•   110   MegaBACEs
•     2   Illumina iScan
•     1   Roche 454
•     1   Ion Torrent
•     1   Illumina Mi-Seq

Data production:
•   100 Gb / day (2009)
•   >5 Tb / day (end of 2010)
•   >1500X human genome / day

What I am NOT going to talk about:
  • Primer optimization
  • Systematic comparisons of NGS platforms
  • Quantitative diversity analysis


What I AM going to talk about:
  • Can Illumina NGS be used in diversity analysis?




Can Illumina NGS be used in diversity analysis?

             • Sequencing error rate
             • Read length

Sequencing error rate

• No indel issue in homopolymers
• Sequencing quality keeps increasing
• Rare nucleotide errors can be easily corrected by:
     • increasing sequencing depth
     • paired-end (PE) sequencing
     • setting stringent matching criteria in the overlapping fragment by allowing only >99% identity

[Figure: Recent improvement in sequencing quality using Illumina’s V3 chemistry (even at 100 bp, only about 10% of the base calls have an error rate >1%)]
[Figure: two 150bp reads, insert size 250nt; PE sequencing enables forming sequence contigs]

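The paired-end correction idea on the slide above can be sketched in a few lines. This is a minimal illustration only, not the pipeline used in the talk: the function names `reverse_complement` and `merge_pair` are hypothetical, the insert size is assumed to be known and fixed (250 nt for two 150 bp reads, giving a 50 bp overlap), and the bases of read 1 are simply kept in the overlap.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def merge_pair(read1, read2, insert_size, min_identity=0.99):
    """Merge a read pair into one contig, or return None if the overlap disagrees.

    read2 is given as sequenced (reverse strand). The overlap length is
    derived from the assumed, fixed insert size.
    """
    r2 = reverse_complement(read2)
    overlap = len(read1) + len(r2) - insert_size
    if overlap <= 0 or overlap > min(len(read1), len(r2)):
        return None
    matches = sum(a == b for a, b in zip(read1[-overlap:], r2[:overlap]))
    # "only >99% identity": strictly greater than the threshold
    if matches / overlap <= min_identity:
        return None
    return read1 + r2[overlap:]


# Simulated 250 nt fragment sequenced as 150PE (assumed example data)
rng = random.Random(0)
fragment = "".join(rng.choice("ACGT") for _ in range(250))
read1 = fragment[:150]
read2 = reverse_complement(fragment[100:])
contig = merge_pair(read1, read2, insert_size=250)
print(len(contig))  # 250
```

With a 50 bp overlap, a single mismatch already gives 98% identity, so under these assumptions the >99% criterion only accepts pairs whose overlap matches perfectly.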
Read length

• Read length keeps increasing
• Shotgun reads can be further assembled into longer fragments (“shotgun” assembly strategy used in genome sequencing projects)

[Figure: two 150bp reads, insert size 250nt; 150PE enables contig read of 250bp]
[Figure: option of scaffold assembly]

Illumina environmental barcoding

Illumina e-barcoding
  PCR based
    Lib1 (658bp, 150PE): full length COI barcode PE sequencing
    Lib2 (200bp, 150PE): COI amplicons shotgun PE sequencing
    -> Full length COI
  PCR free
    Mitochondrial shotgun PE sequencing
    -> Full length COI without PCR bias

Approach #1: PCR-based
Sample information

                              Mock            XSBN (provided by Yu et al.)
# Specimens                   23              292
# Haplotypes (2%)             12              230
Soup protocol                 DNA extracted individually and mixed for PCR
PCR primers                   LepF1/LepR1     Customized
Sequence length               658 bp          700 bp
Sequencing library details    Full length (658bp) + shotgun library (~200bp)
Sequencing protocol           150PE

Approach #1: PCR-based
Pre-analysis data filtering

Lib 1                             Mock          XSBN
Raw data                          1.67G         4.04G
Filtering adapter                 1.60G         1.28G
High quality (Q20)                0.35G         0.50G
# Reads (Primer removed)          1,081,997     1,150,477
# Unique reads (Abundance > 1)    36,618        45,444

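The last row of the table above (unique reads with abundance > 1) corresponds to a dereplication step. A minimal sketch follows, assuming reads are plain strings already trimmed of primers; the function name `dereplicate` and the toy reads are hypothetical and not taken from the talk.

```python
from collections import Counter


def dereplicate(reads, min_abundance=2):
    """Collapse identical reads and keep those seen at least min_abundance times.

    Returns (sequence, abundance) tuples, most abundant first.
    "Abundance > 1" on the slide corresponds to min_abundance=2.
    """
    counts = Counter(reads)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_abundance]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy input (assumed example data)
toy_reads = ["ACGT", "ACGT", "ACGA", "TTTT", "TTTT", "TTTT"]
print(dereplicate(toy_reads))  # [('TTTT', 3), ('ACGT', 2)]
```

Singletons such as `ACGA` are dropped here because a read seen only once cannot be distinguished from a sequencing error without further evidence.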
OTU filtering workflow

        Unique reads       OTU cluster   Alignment   Remove chimera   Compared to reads
        (abundance > 1)    (98%)                                      of Lib 2
Mock    36,618             784           490         119              44
XSBN    45,444             4,189         3,887       403              399

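The slide does not name the clustering tool, so the following is only a generic sketch of greedy clustering at a 98% identity threshold, not the method used in the talk. It assumes the unique reads are already aligned to the same length, and the names `identity` and `greedy_otus` are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(unique_reads, threshold=0.98):
    """Greedy centroid clustering of (sequence, abundance) pairs.

    Reads are visited from most to least abundant; each read joins the first
    OTU whose centroid it matches at >= threshold, otherwise it founds a new OTU.
    """
    otus = []
    for seq, abundance in sorted(unique_reads, key=lambda item: -item[1]):
        for otu in otus:
            if identity(seq, otu["centroid"]) >= threshold:
                otu["members"].append(seq)
                otu["size"] += abundance
                break
        else:
            otus.append({"centroid": seq, "members": [seq], "size": abundance})
    return otus


# Toy input (assumed example data): one close variant, one distant sequence
base = "A" * 100
variant = "A" * 99 + "C"        # 99% identical to base
distant = "A" * 90 + "C" * 10   # 90% identical to base
clusters = greedy_otus([(base, 50), (variant, 5), (distant, 20)])
print([(len(c["members"]), c["size"]) for c in clusters])  # [(2, 55), (1, 20)]
```

Visiting reads in order of abundance makes the most common sequence the centroid of each OTU, which is a common design choice when rare variants are more likely to carry errors.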
Sanger Reference vs. NGS OTUs: Results
Blast at 100% identity

[Venn diagrams; counts listed as Sanger reference only / shared / NGS OTUs only]

        Primers       Sanger only   Shared   NGS only
Mock    LepF1/R1      4             8        36
XSBN    Customized    32            198      197

Sanger Reference vs. NGS OTUs: Mock

[Venn diagram: 4 / 8 / 36]

• False negative: not found in raw data (likely due to primer failure)
• “False positive”?
     • 31 can be found in our total sample, from which our mock samples were assembled
     • 5 likely to be PCR errors

Sanger Reference vs. NGS OTUs: XSBN

[Venn diagram: 32 / 198 (group1) / 197 (group2)]
[Plot comparing group1 and group2: Mean + SE]

• 17 not found in raw data (primer failure)
• 15 were lost in data filtering
• Cross-sample contamination?

Sanger Reference vs. NGS OTUs: after removal of sequences with abundance <10

[Venn diagrams, before -> after]

Before:   32 / 198 / 197
After:    49 / 181 / 84

• Significantly fewer false positives
• Slight drop of true positives

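The trade-off on this slide can be put into numbers with simple arithmetic. The sketch below assumes the three counts in each Venn diagram read as reference only / shared / NGS only, which matches the slide's own wording about true and false positives; the dictionary keys and the function name `relative_drop` are hypothetical.

```python
# Counts from the slide, before and after removing sequences with abundance <10
before = {"reference_only": 32, "shared": 198, "ngs_only": 197}
after = {"reference_only": 49, "shared": 181, "ngs_only": 84}


def relative_drop(old, new):
    """Fraction by which a count decreased."""
    return (old - new) / old


false_positive_drop = relative_drop(before["ngs_only"], after["ngs_only"])
true_positive_drop = relative_drop(before["shared"], after["shared"])

print(f"{false_positive_drop:.1%}")  # 57.4%
print(f"{true_positive_drop:.1%}")   # 8.6%
```

The 17 OTUs lost from the shared set (198 to 181) reappear as reference-only entries (32 to 49), so the counts are internally consistent under this reading.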
Approach #1: PCR-based
What’s next?

• Obtaining full-length barcodes via shotgun read assembly
   (new program in development: “SOAPbarcode”)
• New algorithm to filter out false positive OTUs

Approach #2: PCR-free method

[Workflow diagram]
  Individual barcoding
  Total MT isolation & DNA extraction -> Shotgun sequencing
  Shotgun sequencing -> Reference based method
  Shotgun sequencing -> Reference independent method

Building reference library: individual barcoding


1. 89 individuals;
2. 84 reference barcodes;
3. 39 OTUs (2%);



     Taxon group # OTUs
     Lepidoptera   25
       Diptera     7
      Hemiptera    4
     Hymenoptera   2
      Psocoptera   1
         Total     39



Total MT isolation & DNA extraction

[Figure: Sample mixture -> Total MT isolation -> MT DNA extraction]

Shotgun sequencing
Insert size: 200bp;
Read length: 100bp PE;

                                        Percentage of base pairs
Q20 (Sequencing error rate < 1%)        96.2%
Q30 (Sequencing error rate < 0.1%)      92.9%
GC content                              38.0%

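The Q20 and Q30 rows above follow from the standard Phred definition, Q = -10 * log10(P), where P is the probability that a base call is wrong. A short sketch of the conversion in both directions (the function names are hypothetical):

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


print(f"{phred_to_error(20):.3f}")  # 0.010
print(f"{phred_to_error(30):.3f}")  # 0.001
```

So a base at Q20 has a 1% chance of being wrong and a base at Q30 a 0.1% chance, which is how the table labels the two thresholds.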
Pre-analysis

Data filtering:
1. Adaptor contamination removal;
2. Quality control:
     in each read, only allowing <10bp with seq. error rate >1%

Raw data                       2.45G
After filtering                2.20G
Ratio of high quality reads    89.91%

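The quality-control rule above (fewer than 10 bases per read with an error rate above 1%) can be sketched as a per-read check. An error rate above 1% corresponds to a Phred score below 20. The sketch assumes FASTQ quality strings in Phred+33 encoding, which is an assumption: the slides do not state the encoding, and some Illumina pipelines of that period used Phred+64. The function names are hypothetical.

```python
def low_quality_bases(quality_string, min_q=20, offset=33):
    """Count bases whose Phred score is below min_q (error rate > 1% for Q20)."""
    return sum(1 for ch in quality_string if ord(ch) - offset < min_q)


def passes_filter(quality_string, max_low=10, min_q=20, offset=33):
    """Keep a read only if it has fewer than max_low low-quality bases."""
    return low_quality_bases(quality_string, min_q, offset) < max_low


# Toy quality strings (assumed example data): 'I' is Q40, '#' is Q2 in Phred+33
print(passes_filter("I" * 91 + "#" * 9))   # True
print(passes_filter("I" * 90 + "#" * 10))  # False
```

For Phred+64 data the same check applies with `offset=64`.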
Approach #2: PCR-free method
                 Method 1: Reference based
Blast reads to reference barcodes,
confident identification is made only when:
1. Best BLAST hit >98% identity;
2. Reference coverage > 90%;

Taxon groups   # OTUs
Lepidoptera      20
Diptera           2
Hemiptera         3
Psocoptera        1
Total            26
Not found        13

[Figure: Reference 1, correct mapping, coverage: 100%; Reference 2, incorrect mapping, coverage: 30%]

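One possible reading of the two criteria above is that a reference barcode counts as detected when reads matching it at >98% identity together cover >90% of its length. The sketch below implements that reading only; it is not the authors' script. Hits are assumed to be already parsed into (percent identity, start, end) tuples with 1-based inclusive coordinates on the reference, and the function names and example hits are hypothetical.

```python
def covered_fraction(intervals, ref_length):
    """Fraction of the reference covered by the union of (start, end) intervals."""
    covered = 0
    last_end = 0
    for start, end in sorted(intervals):
        start = max(start, last_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / ref_length


def is_detected(hits, ref_length, min_identity=98.0, min_coverage=0.90):
    """Apply both criteria: hit identity > 98% and reference coverage > 90%."""
    good = [(start, end) for pid, start, end in hits if pid > min_identity]
    return covered_fraction(good, ref_length) > min_coverage


# Toy hits against a 658 bp reference (assumed example data)
full = [(99.0, 1, 100), (100.0, 90, 400), (98.5, 380, 658)]
partial = [(99.0, 1, 197)]  # about 30% of the reference
print(is_detected(full, 658))     # True
print(is_detected(partial, 658))  # False
```

The coverage criterion is what separates the two cases in the figure: a reference covered along its whole length versus one where reads pile up on only about 30% of it.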
Potential sources of failure in detecting taxa




          Taxon specific
                or
             Bio-mass
         (size & number)



Failures in taxon detection

Taxon bias?

Taxon groups   # Total OTUs   # OTUs missing (undetected)
Lepidoptera        25              5
Diptera             7              5
Hymenoptera         2              2
Hemiptera           4              1
Psocoptera          1              0
Total              39             13

Failures in taxon detection

OR bio-mass (body size, # individuals)?

         Readily detected               Missing
         Average length > 5 mm          Average length < 5 mm

Approach #2: PCR-free method
     Method 2: Reference independent
      (Will we be able to identify diversity without reference MT genomes
                                                  for the targeted species?)

Workflow:
1. Assembly of COI gene using genome
   assembly program (SOAPdenovo);

2. Annotation using ~240 MT genomes
   downloaded from Genbank;



PCR-Free reference-independent: results

       • 23/31 falling in standard COI barcode region (mostly >600 bp);

       • 1 of 23 is not in our reference barcodes (Insecta; Lepidoptera; Pyralidae);

       • Multiple genes obtained simultaneously;
       • 1 nearly complete mitochondrial genome (~15k bp);
       • 3 fragments >6000 bp;

Reference independent

               • 23/31 falling in standard COI barcode region (mostly >600 bp);
               • 1 of 23 was not present in our reference barcodes (Insecta; Lepidoptera; Pyralidae);

Number of individuals we collected:   89 individuals (5 individuals failed in Sanger sequencing)
Barcode references:                   39 OTUs (84 individuals)
Reference based:                      26 OTUs
Reference independent:                23 OTUs

3 OTUs not detected in reference independent method because:
(1) sequencing depth is too low (<10X) to allow for reliable assembly
(2) relatively small body-size

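The <10X depth figure above can be related to read counts with the usual mean-depth estimate, depth = (number of reads x read length) / target length. The sketch below is a back-of-envelope illustration only: the 15,000 bp target uses the ~15k bp mitochondrial genome size mentioned in the talk and the 100 bp read length from the shotgun library, while the read counts are hypothetical and not measurements from this study.

```python
import math


def mean_depth(n_reads, read_length, target_length):
    """Average sequencing depth over a target of the given length."""
    return n_reads * read_length / target_length


def reads_needed(depth, read_length, target_length):
    """Smallest number of reads giving at least the requested mean depth."""
    return math.ceil(depth * target_length / read_length)


# Hypothetical example: reads assigned to one taxon's ~15 kb mitochondrial genome
print(reads_needed(10, 100, 15000))  # 1500
print(mean_depth(600, 100, 15000))   # 4.0
```

Under these assumptions a small-bodied taxon contributing only a few hundred mitochondrial reads stays below 10X, which is consistent with the two reasons given on the slide being linked.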
PCR-free method

   • Multiple MT genes obtained simultaneously

Gene    Number
ATP6      29
ATP8      4
COX1      31
COX2      33
COX3      31
CYTB      31
ND1       35
ND2       34
ND3       24
ND4       30
ND4L      16
ND5       30
ND6       24
PCR-free method

            • 1 nearly complete mitochondrial genome (~15k bp);
            • 3 fragments longer than 6k bp;

[Figure label: Barcode region]
Approach #2: PCR-free method
What’s next?
Currently:
     • MT DNA 5-10% after isolation;
     • Non-target DNA affects MT assembly (e.g., bacteria & genomic DNA);
     • Taxonomic/biomass bias

Potential solutions:
    1. Wet-lab protocol optimization
        • Pre-sorting insects by body-size
        • Alternative MT isolation methods

    2. Increase sequencing depth

Conclusions
• Illumina Hi-Seq delivers performance comparable to other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost;
• Deep sequencing capacity enables a novel PCR-free approach, which may eventually solve biases caused by DNA amplification;
• It shares issues with other NGS platforms (non-quantitative, inflation of OTUs, etc.);
• Methodology optimization is much needed in many details of the pipeline;
• Collaborative and synergistic efforts made by the community would greatly advance the progress.

Acknowledgements

Funder:


Collaborators:   Douglas W. Yu
                 Kunming Institute of Zoology, Chinese Academy of Sciences

                 Mehrdad Hajibabaei, Shadi Shokralla
                 University of Guelph

                 Owain Edwards
                 CSIRO Ecosystem Sciences




                                                                           LU Jianliang
                                                                           WU Qiong
                                                                           AN Sainan
                                                                           ZHOU Yizhuang
                                                                           ZHAO Jing


Thanks for your
  attention!



IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 

Xin Zhou - Saturday Closing Plenary

  • 1. Taxon diversity analysis for bulk insect samples using Illumina Hi-seq platform Xin ZHOU, Shanlin LIU, Yiyuan LI, Qing YANG, and Xu SU Department of Science and Technology Environmental Genomics Research Group BGI, China Adelaide, Australia, 3 December 2011
  • 2. Problem / Solutions?
      Opt. 1: ......zzzzZZZZZ
      Opt. 2: morph sorting → indiv. ID → … → Opt. 1
      Opt. 3: morph sorting → indiv. barcoding → … → Opt. 1
      Opt. 4: grinding up → NGS → CLUSTERING/BLAST → DIVERSITY!
      Zhou et al. 2011, 4th International Barcode of Life Conference
  • 3. Environmental barcoding of bulk insects
      Aquatic insects: mini-barcode (130 bp); 454
      Bat diet (insects): COI fragment, 157 bp; 454
      Malaise trap (insects): COI fragment, ~400 bp; 454 ("Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring", Yu D.W. et al., in review)
  • 4. Major NGS platforms applicable in environmental barcoding
      NGS platform                          Read length   Data/run (GB)   Run time   Library construction required
      454 platform (GS FLX Titanium XL+)    ~400 bp       0.7             23 hr      Yes
      Illumina platform (Hi-Seq 2000)       150 bp PE     600             14 d       Yes
      Illumina platform (Mi-Seq)            150 bp PE     2               27 hr      Yes
      Ion Torrent                           200 bp        ~1              3.5 hr     No
      Illumina Hi-Seq: higher throughput; less $ / bp; increasing read length; variety of bioinformatics tools available from genomic pipelines
  • 5. Sequencing capacity at BGI
      Instruments: 28 Illumina GAIIx; 137 Illumina Hi-Seq2000; 25 Life Tech SOLiD 4; 16 ABI 3730XL; 110 MegaBACEs; 2 Illumina iScan; 1 Roche 454; 1 Ion Torrent; 1 Illumina Mi-Seq
      Data production: 100 Gb / day (2009); >5 Tb / day (end of 2010); >1500X human genome / day
  • 6. What I am NOT going to talk about: primer optimization; systematic comparisons of NGS platforms; quantitative diversity analysis. What I AM going to talk about: can Illumina NGS be used in diversity analysis?
  • 7. Can Illumina NGS be used in diversity analysis? Sequencing error rate; read length
  • 8. Sequencing error rate
      No indel issue in homopolymers
      Sequencing quality keeps increasing
      Rare nucleotide errors can be easily corrected by: increasing sequencing depth; paired-end (PE) sequencing; setting stringent matching criteria in the overlapping fragment by allowing only >99% identity
      Recent improvement in sequencing quality using Illumina's V3 chemistry (even at 100 bp, only about 10% of the base calls have an error rate >1%)
      [Figure: two 150 bp reads, insert size 250 nt] PE sequencing enables forming sequence contigs
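The paired-end idea on this slide can be illustrated with a small sketch. With two 150 bp reads and a 250 nt insert, the reads overlap by 150 + 150 - 250 = 50 bp, and the pair is joined only if the overlap exceeds the identity cutoff. The function names, the fixed-insert-size assumption, and the absence of quality-aware base calling are all simplifications made here for illustration; they are not taken from the talk.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(fwd, rev, insert_size, min_identity=0.99):
    """Join a read pair into one contig, or return None if the overlap fails.

    Assumption (hypothetical simplification): the insert size is known
    exactly, so the overlap length is len(fwd) + len(rev) - insert_size.
    """
    rev_rc = reverse_complement(rev)
    overlap = len(fwd) + len(rev_rc) - insert_size
    if overlap <= 0 or overlap > min(len(fwd), len(rev_rc)):
        return None
    left = fwd[-overlap:]
    right = rev_rc[:overlap]
    matches = sum(a == b for a, b in zip(left, right))
    # The slide allows only >99% identity in the overlapping fragment.
    if matches / overlap <= min_identity:
        return None
    return fwd + rev_rc[overlap:]
```

Note that with a 50 bp overlap, a strict >99% cutoff means a single mismatch (49/50 = 98%) already rejects the pair.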
  • 9. Read length
      [Figure: two 150 bp reads, insert size 250 nt]
      Read length keeps increasing: 150PE enables a contig read of 250 bp
      Shotgun reads can be further assembled into longer fragments ("shotgun" assembly strategy used in genome sequencing projects)
      Option of scaffold assembly
  • 10. Illumina environmental barcoding (Illumina e-barcoding)
      PCR based: Lib1 (658 bp, 150PE), full-length COI barcode PE sequencing; Lib2 (200 bp, 150PE), COI amplicons shotgun PE sequencing; outcome: full-length COI
      PCR free: mitochondrial shotgun PE sequencing; outcome: full-length COI without PCR bias
  • 11. Approach #1: PCR-based
      Sample information             Mock              XSBN (provided by Yu et al.)
      # Specimens                    23                292
      # Haplotypes (2%)              12                230
      PCR primers                    LepF1/LepR1       Customized
      Sequence length                658 bp            700 bp
      Soup protocol: DNA extracted individually and mixed for PCR
      Sequencing library details: full length (658 bp) + shotgun library (~200 bp)
      Sequencing protocol: 150PE
  • 12. Approach #1: PCR-based. Pre-analysis data filtering
      Lib 1                              Mock          XSBN
      Raw data                           1.67G         4.04G
      Filtering adapter                  1.60G         1.28G
      High quality (Q20)                 0.35G         0.50G
      # Reads (primer removed)           1,081,997     1,150,477
      # Unique reads (abundance > 1)     36,618        45,444
  • 13. OTU filtering workflow
      Step                               Mock          XSBN
      Unique reads (abundance > 1)       36,618        45,444
      OTU cluster (98%)                  784           4,189
      Alignment                          490           3,887
      Remove chimera                     119           403
      Compared to reads of Lib 2         44            399
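The first two steps of this workflow (keeping unique reads with abundance > 1, then clustering at 98%) might look roughly like the sketch below. The slides do not name the clustering program or algorithm, so a simple greedy centroid scheme is assumed here purely for illustration; `identity` and `greedy_otus` are hypothetical names, and sequences are assumed to be pre-aligned and of equal length so that identity is a plain position-by-position comparison.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions; 0.0 if lengths differ (assumed pre-aligned)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.98, min_abundance=2):
    """Dereplicate reads, drop rare uniques, and cluster greedily by identity.

    Unique sequences are visited from most to least abundant; each one joins
    the first existing centroid it matches at >= threshold, else it founds a
    new OTU. This is an assumed stand-in, not the pipeline used in the talk.
    """
    counts = Counter(reads)
    centroids = []
    for seq, n in counts.most_common():
        if n < min_abundance:
            continue  # "abundance > 1" filter from the slide
        for otu in centroids:
            if identity(seq, otu["seq"]) >= threshold:
                otu["size"] += n
                break
        else:
            centroids.append({"seq": seq, "size": n})
    return centroids
```

The later steps on the slide (alignment, chimera removal, comparison against Lib 2 reads) are not covered by this sketch.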
  • 14. Results: BLAST at 100% identity, Sanger reference vs. NGS OTUs
      Sample (primers)             Reference only   Shared   NGS OTUs only
      Mock (LepF1/R1 primers)      4                8        36
      XSBN (customized primers)    32               198      197
  • 15. Mock: Sanger reference vs. NGS OTUs (reference only 4; shared 8; NGS OTUs only 36)
      False negatives (4): not found in raw data (likely due to primer failure)
      "False positives"? (36): 31 can be found in our total sample, from which our mock samples were assembled; 5 likely to be PCR errors
  • 16. XSBN: Sanger reference vs. NGS OTUs (reference only 32; shared 198; NGS OTUs only 197)
      Reference only (32): 17 not found in raw data (primer failure); 15 were lost in data filtering
      NGS OTUs only (197): cross-sample contamination?
      [Plot: mean + SE, group 1 vs. group 2]
  • 17. Sanger reference vs. NGS OTUs after removal of sequences with abundance <10
      Before (reference only / shared / NGS OTUs only): 32 / 198 / 197
      After (reference only / shared / NGS OTUs only): 49 / 181 / 84
      Significantly fewer false positives; slight drop in true positives
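To put numbers on the trade-off, the three counts on this slide can be turned into two simple ratios. The reading of the counts as (reference only, shared, NGS OTUs only) is an interpretation of the Venn diagram, and `venn_summary` is a hypothetical helper written for this illustration.

```python
def venn_summary(reference_only, shared, ngs_only):
    """Summarize a two-set Venn diagram of reference barcodes vs. NGS OTUs."""
    reference_total = reference_only + shared
    otu_total = shared + ngs_only
    return {
        # Share of reference haplotypes recovered by NGS
        "sensitivity": shared / reference_total,
        # Share of NGS OTUs with no matching reference (candidate false positives)
        "unmatched_otu_fraction": ngs_only / otu_total,
    }


before = venn_summary(32, 198, 197)  # all OTUs
after = venn_summary(49, 181, 84)    # OTUs with abundance < 10 removed

for label, stats in (("before", before), ("after", after)):
    print(label, {k: round(v, 3) for k, v in stats.items()})
```

Under this reading, recovery of reference haplotypes falls from about 86% (198/230) to about 79% (181/230), while the unmatched share of OTUs falls from about 50% (197/395) to about 32% (84/265).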
  • 18. Approach #1: PCR-based. What's next? Obtaining full-length barcodes via shotgun read assembly (new program in development: "SOAPbarcode"); new algorithm to filter out false positive OTUs
  • 19. Approach #2: PCR-free method. Workflow steps: total MT isolation & DNA extraction; individual barcoding; shotgun sequencing; reference based method; reference independent method
  • 20. Building reference library: individual barcoding
      1. 89 individuals; 2. 84 reference barcodes; 3. 39 OTUs (2%)
      Taxon group      # OTUs
      Lepidoptera      25
      Diptera          7
      Hemiptera        4
      Hymenoptera      2
      Psocoptera       1
      Total            39
  • 21. Total MT isolation & DNA extraction: sample mixture → total MT isolation → MT DNA extraction
  • 22. Shotgun sequencing
      Insert size: 200 bp; read length: 100 bp PE
      Percentage of base pairs at Q20: 96.2% (sequencing error rate < 1%)
      Percentage of base pairs at Q30: 92.9% (sequencing error rate < 0.1%)
      GC content: 38.0%
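The Q20 and Q30 labels follow from the standard Phred relationship, error probability = 10^(-Q/10). A minimal sketch, with hypothetical function names, showing how the two thresholds map to 1% and 0.1% and how a "percentage of bases at Q20" figure can be computed from a list of per-base scores:

```python
def phred_to_error(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def fraction_at_least(quals, q_min):
    """Fraction of per-base quality scores that are >= q_min."""
    return sum(q >= q_min for q in quals) / len(quals)


print(phred_to_error(20), phred_to_error(30))
```

The scores are taken here as plain integers; decoding them from FASTQ quality strings depends on the encoding offset of the pipeline, which the slides do not state.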
  • 23. Pre-analysis data filtering
      1. Adaptor contamination removal
      2. Quality control: in each read, only allowing <10 bp with sequencing error rate >1%
      Raw data: 2.45G; after filtering: 2.20G; ratio of high quality reads: 89.91%
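The quality-control rule on this slide can be expressed as a per-read check. An error rate above 1% corresponds to a Phred score below 20, so a read passes if it has fewer than 10 such bases. The function names and the integer-score input are assumptions for this sketch; the actual filtering tool is not named in the talk.

```python
def passes_quality_control(quals, q_threshold=20, max_low_quality=10):
    """True if the read has fewer than max_low_quality bases below q_threshold.

    A base with Q < 20 has an estimated error rate > 1%; Q == 20 is exactly 1%
    and therefore does not count as low quality under the slide's rule.
    """
    low = sum(1 for q in quals if q < q_threshold)
    return low < max_low_quality


def high_quality_ratio(reads_quals):
    """Fraction of reads that pass the quality-control rule."""
    kept = sum(passes_quality_control(q) for q in reads_quals)
    return kept / len(reads_quals)
```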
  • 24. Approach #2: PCR-free method. Method 1: reference based
      BLAST reads to reference barcodes; confident identification is made only when: 1. best BLAST hit >98% identity; 2. reference coverage >90%
      Taxon group      # OTUs
      Lepidoptera      20
      Diptera          2
      Hemiptera        3
      Psocoptera       1
      Total            26
      Not found        13
      [Figure: Reference 1, coverage 100%: correct mapping; Reference 2, coverage 30%: incorrect mapping]
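The two acceptance criteria can be sketched as a coverage calculation over one reference barcode. The input format is an assumption: each hit is a `(start, end, percent_identity)` tuple in 1-based inclusive reference coordinates, and the list is assumed to already contain only each read's best hit. `covered_fraction` and `is_detected` are hypothetical names, not part of any BLAST tool.

```python
def covered_fraction(ref_len, hits, min_identity=98.0):
    """Fraction of reference positions covered by hits above the identity cutoff."""
    covered = [False] * ref_len
    for start, end, pident in hits:
        if pident > min_identity:
            # Convert 1-based inclusive coordinates to a 0-based range
            for i in range(max(start, 1) - 1, min(end, ref_len)):
                covered[i] = True
    return sum(covered) / ref_len


def is_detected(ref_len, hits, min_identity=98.0, min_coverage=0.90):
    """Apply both slide criteria: identity > 98% and reference coverage > 90%."""
    return covered_fraction(ref_len, hits, min_identity) > min_coverage
```

A reference tiled end to end by high-identity reads passes, while one covered over only about 30% of its length (as in the second example on the slide) is rejected even if those reads match well.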
  • 25. Potential sources of failure in detecting taxa: taxon specific, or biomass (size & number)
  • 26. Failures in taxon detection: taxon bias?
      Taxon group      # Total OTUs   # OTUs undetected (missing)
      Lepidoptera      25             5
      Diptera          7              5
      Hymenoptera      2              2
      Hemiptera        4              1
      Psocoptera       1              0
      Total            39             13
  • 27. Failures in taxon detection: OR biomass (body size, # individuals)? Readily detected: average length > 5 mm. Missing: average length < 5 mm
  • 28. Approach #2: PCR-free method. Method 2: reference independent (Will we be able to identify diversity without reference MT genomes for the targeted species?) Workflow: 1. Assembly of COI gene using a genome assembly program (SOAPdenovo); 2. Annotation using ~240 MT genomes downloaded from GenBank
  • 29. PCR-free reference-independent: results. 23/31 fall in the standard COI barcode region (mostly >600 bp); 1 of 23 is not in our reference barcodes (Insecta; Lepidoptera; Pyralidae); multiple genes obtained simultaneously; 1 nearly complete mitochondrial genome (~15k bp); 3 fragments >6000 bp
  • 30. Reference independent
      23/31 fall in the standard COI barcode region (mostly >600 bp); 1 of 23 was not present in our reference barcodes (Insecta; Lepidoptera; Pyralidae)
      Number of individuals we collected: 89 individuals (5 individuals failed in Sanger sequencing)
      Barcode references: 39 OTUs (84 individuals)
      Reference based: 26 OTUs
      Reference independent: 23 OTUs
      3 OTUs not detected in the reference independent method because: (1) sequencing depth is too low (<10X) to allow for reliable assembly; (2) relatively small body size
  • 31. PCR-free method: multiple MT genes obtained simultaneously
      Gene     Number
      ATP6     29
      ATP8     4
      COX1     31
      COX2     33
      COX3     31
      CYTB     31
      ND1      35
      ND2      34
      ND3      24
      ND4      30
      ND4L     16
      ND5      30
      ND6      24
  • 32. PCR-free method: 1 nearly complete mitochondrial genome (~15k bp); 3 fragments longer than 6k bp [Figure: assembled fragments with the barcode region marked]
  • 33. Approach #2: PCR-free method. What's next?
      Currently: MT DNA 5-10% after isolation; non-target DNA affects MT assembly (e.g., bacterial & genomic DNA); taxonomic/biomass bias
      Potential solutions: 1. Wet-lab protocol optimization (pre-sorting insects by body size; alternative MT isolation methods); 2. Increase sequencing depth
  • 34. Conclusions
      Illumina Hi-Seq delivers performance comparable to other NGS platforms in analyzing bulk insect samples, with potential advantages in achieving higher sensitivity at lower cost;
      Deep sequencing capacity enables a novel PCR-free approach, which may eventually solve biases caused by DNA amplification;
      It shares issues with other NGS platforms (non-quantitative, inflation of OTUs, etc.);
      Methodology optimization is much needed in many details of the pipeline;
      Collaborative and synergistic efforts by the community would greatly advance progress.
  • 35. Acknowledgements
      Collaborators: Douglas W. Yu (Kunming Institute of Zoology, Chinese Academy of Sciences); Mehrdad Hajibabaei, Shadi Shokralla (University of Guelph); Owain Edwards (CSIRO Ecosystem Sciences)
      LU Jianliang, WU Qiong, AN Sainan, ZHOU Yizhuang, ZHAO Jing
  • 36. Thanks for your attention!