SSAHA_pileup:
A Genome Variation Detection Pipeline for
     Various Sequencing Platforms




             Photo Credit: saynine on flickr.com


          Ben Blackburne
     Wellcome Trust Sanger Institute
Acknowledgments


●Zemin Ning
●Yong Gu
●Antony Cox
●Adam Spargo
●Hannes Ponstingl
Introduction
●New sequencing technologies
  – More data
  – Different kinds of data
     ●Solexa, 454
     ●capillary, too
  – Diploid genomes
  – SNPs, indels, VNTRs




                              Photo Credit: mknowles on flickr.com
SSAHA_pileup
●Sequence Search and Alignment by Hashing
 Algorithm
●SSAHA_SNP
  – Global positioning with SSAHA algorithm
  – Fast Smith-Waterman implementation (from
    Cross_Match)
  – Identification of best match
●SSAHA_pileup
  – Determines SNPs from set of best alignments
●Works on Solexa, 454, and capillary reads
The Toolchain
  [Diagram: the Reference Genome and the Reads go into SSAHA_snp/SSAHA2, which produces Alignments; SSAHA_pileup turns the Alignments into variations, with a refinement step]
SSAHA_SNP
●Reference genome is “hashed”
  – table made of all k-mer words
  – overlapping or not, at user's option
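The hashing step can be sketched roughly as below. This is a minimal illustration of a k-mer table, not the SSAHA_SNP source; the function name `build_kmer_index` and its `overlapping` flag are made up for this example.

```python
from collections import defaultdict

def build_kmer_index(reference, k, overlapping=True):
    """Map each k-mer word of the reference to the positions where it starts.

    overlapping=True  -> words start at every base (step 1)
    overlapping=False -> words are taken end to end (step k)
    """
    step = 1 if overlapping else k
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)

# Non-overlapping 4-mers of a 12-base toy reference
print(build_kmer_index("GGTCCCACAGAG", k=4, overlapping=False))
```

Non-overlapping words give a table roughly k times smaller, at the cost of fewer seeds per read.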
SSAHA_SNP
●k-mer matches found for query in reference
  – [Diagram: k-mer hits for the query on chr n and chr m]
SSAHA_SNP
●Global Mapping
  – [Diagram: candidate locations for the read on chr n and chr m]
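One common way to turn k-mer hits into candidate locations is to let each hit vote for the reference position where the read would have to start. The sketch below assumes that approach; it is not taken from the SSAHA_SNP code, and `candidate_locations` and `min_hits` are hypothetical names.

```python
from collections import Counter, defaultdict

def build_kmer_index(references, k):
    """Index every overlapping k-mer of every reference sequence."""
    index = defaultdict(list)
    for name, seq in references.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index

def candidate_locations(read, index, k, min_hits=2):
    """Rank (sequence name, implied read start) pairs by number of k-mer hits."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for name, pos in index.get(read[offset:offset + k], ()):
            # A hit at reference position `pos` for the k-mer at read
            # offset `offset` implies the read starts at pos - offset.
            votes[(name, pos - offset)] += 1
    return [(loc, hits) for loc, hits in votes.most_common() if hits >= min_hits]

references = {"chr_n": "TTTTGGTCCCACAGAGCTGG", "chr_m": "ACGTACGTACGT"}
index = build_kmer_index(references, k=4)
print(candidate_locations("GGTCCCACAG", index, k=4))
```

Only the top-ranked locations would then be passed on to the slower local alignment.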
SSAHA_SNP
●Local Mapping (Smith-Waterman)
  – chr n: score 126
  – chr m: score 113
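For reference, a plain Smith-Waterman score can be computed as below. This is the textbook algorithm with a linear gap penalty, not the fast Cross_Match-derived implementation the slides mention, and the scoring values are illustrative assumptions.

```python
def smith_waterman_score(read, ref, match=1, mismatch=-2, gap=-3):
    """Best local alignment score between read and ref (linear gap penalty)."""
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# 18-base read with a single mismatch against the reference
print(smith_waterman_score("GGTCCCACGGAGCTGGAG", "GGTCCCACAGAGCTGGAGAAAG"))
```

Only two rows of the matrix are kept, so memory grows with the reference window length rather than with read length times window length.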
SSAHA_SNP
●Select best match
  – chr n: score 126
  – chr m: score 113
SSAHA_SNP
●Read pair information
  – currently possible with
    extra step using SSAHA2
  – being integrated into
    SSAHA_SNP
  – Removes incorrectly
    mapped pairs




                              Photo Credit: Matthew Fang on flickr.com
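A read-pair check could look something like the sketch below. The `Hit` record, the opposite-strand requirement and the 50% tolerance are all assumptions made for illustration; the slides do not say how SSAHA2 or SSAHA_SNP decide that a pair is incorrectly mapped.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

def is_consistent_pair(a, b, expected_insert=200, tolerance=0.5):
    """True if the two mates sit on the same sequence, on opposite strands,
    and span a distance close to the expected insert size."""
    if a.chrom != b.chrom or a.strand == b.strand:
        return False
    span = max(a.end, b.end) - min(a.start, b.start)
    return abs(span - expected_insert) <= tolerance * expected_insert

left = Hit("chrX", 1000, 1036, "+")
right = Hit("chrX", 1170, 1206, "-")
print(is_consistent_pair(left, right))
```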
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACGGAGCTGGAG
       CCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
                     Aligned reads
 Homozygous SNP
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACAGAGCTGGAG
       CCACAGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
                       Aligned reads
 Heterozygous SNP
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACAGAGCTGGAG
       CCACAGAGCTGGAGAAAGCCT
     TCCCACggagCTGGAGAAAGCCT
     TCCCACggagcTGGAGAAAGCCT
     TCCCacggagcTGGAGAAAGCCT
                             Aligned reads
 Heterozygous SNP?? (Probably not: the variant bases are lower-case, presumably low-quality calls)
SSAHA_pileup
                      Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCAC-----TGGAG
       CCAC-----TGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
                       Aligned reads
    Heterozygous indel
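The pileup examples above can be mimicked with a toy caller that counts bases per reference column. This is only a sketch: the thresholds (`min_depth`, `het_low`, `het_high`) are invented for illustration, base qualities and indels are ignored, and nothing here reflects SSAHA_pileup's actual calling rules.

```python
from collections import Counter

def pileup_columns(ref, reads):
    """reads: list of (start, sequence), placed on the reference without gaps."""
    columns = [Counter() for _ in ref]
    for start, seq in reads:
        for i, base in enumerate(seq):
            if 0 <= start + i < len(ref):
                columns[start + i][base] += 1
    return columns

def call_site(ref_base, counts, min_depth=3, het_low=0.2, het_high=0.8):
    """Classify one column from the fraction of reads showing a non-reference base."""
    depth = sum(counts.values())
    if depth < min_depth:
        return "no call"
    alt_count = max((n for b, n in counts.items() if b != ref_base), default=0)
    frac = alt_count / depth
    if frac >= het_high:
        return "homozygous SNP"
    if frac >= het_low:
        return "heterozygous SNP"
    return "reference"

ref = "GGTCCCACAGAGCTGGAGAAAG"
# Reads from the heterozygous example: two carry A, three carry G at column 8
reads = [(0, "GGTCCCACAGAGCTGGAG"), (4, "CCACAGAGCTGGAGAAAGCCT")] \
      + [(2, "TCCCACGGAGCTGGAGAAAGCCT")] * 3
columns = pileup_columns(ref, reads)
print(call_site(ref[8], columns[8]))
```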
How well does it work?
Datasets
●Venter: ABI capillary reads
  – Celera: 19,397,599     55% in pairs
  – JCVI: 12,541,352       98% in pairs
  – Total: 31,938,951    72% in pairs (90% mapped)
●Watson: 454 GS FLX reads
  – Baylor & Roche 74,198,831 (90.5% mapped)
  – single-end reads, 150 to 280 bp long
●Chromosome X Illumina reads
  – 278,557,156 reads (71.6% mapped)
  – paired, with an insert size of 200 bp
How conservative should we be?
Or....
How liberal should we be?
How do we even know if we are winning?
dbSNP
(but not ideal)
Filtering
●Processes that cause bogus SNPs
  – Incorrect global mapping
  – Incorrect local alignment
  – Poor quality reads
  – Sequence amplification errors
Global Mapping Problems
●Reads from unmapped regions of the genome
  – Lead to absurdly high apparent coverage
  – [Diagram: reads shown between chr n and chr m stack up at one spot on chr n, producing spurious SNPs]
Solution: Filter out SNPs called from abnormally high read depths
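A depth filter along these lines might look like the sketch below. The cutoff rule (a multiple of the median depth across the candidate SNPs) and the name `filter_by_depth` are assumptions; the slides do not state how "abnormally high" is defined, and a real pipeline would more likely compare against genome-wide coverage.

```python
from statistics import median

def filter_by_depth(snps, max_ratio=2.0):
    """snps: list of (position, read_depth).
    Keep SNPs whose depth is at most max_ratio times the median depth."""
    if not snps:
        return []
    cutoff = max_ratio * median(depth for _, depth in snps)
    return [(pos, depth) for pos, depth in snps if depth <= cutoff]

candidates = [(100, 8), (250, 10), (400, 9), (900, 120)]
print(filter_by_depth(candidates))
```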
Global Mapping Problems
●Incorrectly aligned reads
  – [Diagram: one read with nearly equal scores at two locations: 132 on chr n, 136 on chr m]
Solution: Filter out SNPs where the 2nd-best score is too close
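In code, the test for "too close" could be as simple as the sketch below. The margin `min_gap=10` is a made-up value, and the slides do not say whether the comparison is an absolute or a relative difference.

```python
def is_uniquely_placed(scores, min_gap=10):
    """True if the best alignment score beats the second best by at least min_gap."""
    ranked = sorted(scores, reverse=True)
    if not ranked:
        return False
    if len(ranked) == 1:
        return True
    return ranked[0] - ranked[1] >= min_gap

print(is_uniquely_placed([132, 136]))  # scores from the problem slide
print(is_uniquely_placed([126, 113]))  # scores from the local mapping slide
```

SNPs supported only by reads that fail this test would then be dropped.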
Local Alignment Problems
●Misalignment
  – Uncaught incorrect global alignment
  – Variations in short repeats
Local Misalignment
                      Reference
...GGTCCCACAGAGCTGGAGAAAA...
   GGTCCCACT---CTAGTG
       CCACT---CTAGTGAAAA
     TCCCACT---CTAGTGAAAA


                       Aligned reads
 Real SNPs?
Local Misalignment
                      Reference
..TAATAATAATAATAATAATAAGAAG..
   AATAATAAGAAGAAGAAGAAGAAG
   AATAATAAGAAGAAGAAGAAGAAG
   AATAATAAGAAGAAGAAGAAGAAG


                       Aligned reads
 Real SNPs?
Solution: Filter out short blocks of many SNPs
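A block filter of this kind can be sketched with a sliding window over sorted SNP positions. The window size and the allowed number of SNPs per window are illustrative assumptions, as is the function name.

```python
def filter_snp_clusters(positions, window=10, max_snps=2):
    """Drop every SNP that lies in a stretch of fewer than `window` bases
    containing more than `max_snps` SNPs."""
    positions = sorted(positions)
    flagged = set()
    left = 0
    for right, pos in enumerate(positions):
        # Shrink the window until it covers fewer than `window` bases.
        while pos - positions[left] >= window:
            left += 1
        if right - left + 1 > max_snps:
            flagged.update(positions[left:right + 1])
    return [p for p in positions if p not in flagged]

# Four mismatches 3 bp apart (as in the repeat example) plus two isolated SNPs
print(filter_snp_clusters([11, 14, 17, 20, 500, 900]))
```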
Venter SNP Calling (Capillary)

                   count      fraction in dbSNP
Homozygous SNPs    1 347 806  97.1%
Heterozygous SNPs  1 857 167  90.9%
Total SNPs         3 204 973  93.5%
Watson SNP Calling (454)

                   count      fraction in dbSNP
Homozygous SNPs    1 298 309  93.0%
Heterozygous SNPs  1 767 951  63.9%
Total SNPs         3 066 260  76.3%
X Chromosome SNPs (Solexa)

                   count      fraction in dbSNP
Homozygous SNPs    27 708     92.8%
Heterozygous SNPs  63 197     81.8%
Total SNPs         90 905     85.1%
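The "fraction in dbSNP" column is a set-membership ratio, which can be sketched as below. The tuple layout (chromosome, position, allele) and the toy data are assumptions for illustration; no real dbSNP records are used.

```python
def fraction_in_known(calls, known):
    """Fraction of called SNPs that also appear in a set of known SNPs."""
    calls = set(calls)
    if not calls:
        return 0.0
    return len(calls & set(known)) / len(calls)

# Toy data: (chromosome, position, variant allele)
called = [("chrX", 100, "G"), ("chrX", 250, "T"), ("chrX", 400, "A"), ("chrX", 900, "C")]
known = [("chrX", 100, "G"), ("chrX", 250, "T"), ("chrX", 400, "A"), ("chrX", 555, "G")]
print(f"{fraction_in_known(called, known):.1%}")
```

A low fraction does not by itself prove a call wrong, since a known-variant catalogue is incomplete, which is presumably why the slides call dbSNP "not ideal".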
Venter-Watson Overlap

Venter only        1 593 791
Venter and Watson  1 611 182
Watson only        1 455 078
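X Chromosome Overlap

Solexa X reads only         40 625
Solexa and Venter only      19 978
Solexa and Watson only      12 590
Solexa, Venter and Watson   17 712
Venter only                 26 502
Venter and Watson only       6 588
Watson only                 22 872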
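Overlap counts like these can be derived from the call sets with a few lines of set logic. The sketch below uses small integer stand-ins for SNPs; the function name and data are hypothetical.

```python
def venn_regions(sets):
    """Return {frozenset of set names: number of items found in exactly those sets}."""
    regions = {}
    for item in set().union(*sets.values()):
        members = frozenset(name for name, s in sets.items() if item in s)
        regions[members] = regions.get(members, 0) + 1
    return regions

calls = {
    "Solexa": {1, 2, 3, 4},
    "Venter": {3, 4, 5},
    "Watson": {4, 6},
}
for members, count in sorted(venn_regions(calls).items(), key=lambda kv: sorted(kv[0])):
    print(" + ".join(sorted(members)), count)
```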
Conclusions
●SSAHA_pileup is effective across both new and
 old sequencing technologies
●Questions
  – When is a SNP not a SNP?
  – Homozygous/Heterozygous SNPs
●Length matters...?
  – But it's what you do with it that counts
Obtaining SSAHA_pileup
●SSAHA_pileup: ftp://ftp.sanger.ac.uk/pub/zn1/ssaha_pileup/
●SSAHA2: http://www.sanger.ac.uk/Software/analysis/SSAHA2/
●These slides: http://slideshare.net/bpb/
