Snippy
Torsten Seemann
Balti & Bioinformatics - Birmingham, UK - Tue 5 May 2015
Rapid bacterial variant calling
& core genome alignments
Background
(Far) south east England
Phyloflagomics
UK / Birmingham, Australia / Victoria, Canada / British Columbia
A new home
Centre for Applied Microbial Genomics
Microbiological Diagnostic Unit
∷ Oldest public health lab in Australia
: established 1897 in Melbourne
: large historical isolate collection back to 1950s
∷ National reference laboratory
: Salmonella, Listeria, EHEC
∷ WHO regional reference lab
: vaccine preventable invasive bacterial pathogens
New director
∷ Professor Ben Howden
: clinician, microbiologist, pathologist
: early adopter of genomics and bioinformatics
: long-term collaborator on MRSA/VRE with Tim Stinear
∷ Mandate
: modernise service delivery
: enhance research output and collaboration
: nationally lead the conversion to WGS
Hardware
∷ Sequencers
: NextSeq 500
: 3 x MiSeq
: PacBio RS II (arriving 22 May)
∷ Robots
: Perkin Elmer (does not have a Twitter account)
: Colony picker
∷ Compute
: 240 TB, 10 GigE, 3 x 72-core boxes
Variant calling
Variant calling
∷ Find DNA differences between genomes
: variants to explain phenotype
: validate your complemented mutant
∷ Two approaches
: reference based (read alignment)
: reference-free (de novo assembly / k-mer based)
Types of variants
∷ Substitutions
: single nucleotide polymorphism (snp) A➝C
: multiple nucleotide polymorphism (mnp) AG➝TC
∷ Indels
: insertion (ins) A➝AC
: deletion (del) ACCG➝AG
∷ Complex
: compound events AC➝T
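The variant classes above can be told apart from the REF and ALT alleles alone. Below is a minimal Python sketch of one way to do that, assuming VCF-style alleles: it trims bases shared at the start and end, then labels what remains. These trimming rules are an illustrative assumption, not necessarily how freebayes or Snippy assign their TYPE labels, and `classify_variant` is a hypothetical helper.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a REF->ALT change as snp, mnp, ins, del or complex (illustrative rules)."""
    # Trim bases shared at the start of both alleles.
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    r, a = ref[i:], alt[i:]
    # Trim bases shared at the end of what is left.
    j = 0
    while j < min(len(r), len(a)) and r[-1 - j] == a[-1 - j]:
        j += 1
    r, a = r[: len(r) - j], a[: len(a) - j]

    if not r and not a:
        raise ValueError("REF and ALT are identical")
    if not r:
        return "ins"  # only new bases remain
    if not a:
        return "del"  # only removed bases remain
    if len(r) == len(a):
        if len(r) == 1:
            return "snp"
        if all(x != y for x, y in zip(r, a)):
            return "mnp"  # a run of adjacent substitutions
    return "complex"  # mixed lengths, or substitutions separated by matches


if __name__ == "__main__":
    for ref, alt in [("A", "C"), ("AG", "TC"), ("A", "AC"), ("ACCG", "AG"), ("AC", "T")]:
        print(ref, alt, classify_variant(ref, alt))
```

With these rules, the `GATC➝AATA` change in the TAB example below also comes out as `complex`, because its two substitutions are separated by matching bases.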
My solution
Snippy
∷ Fast → snappy
∷ Finds variants → SNPs
∷ Australian → Skippy the bush kangaroo
Input
∷ FASTQ files
: paired end, interleaved, or single-end
∷ Reference
: FASTA or Genbank
∷ Output folder
: self contained bundle of results
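An interleaved FASTQ file stores each read pair as two consecutive records, R1 then R2. The Python sketch below shows one way to read such a file back into pairs. It assumes strict 4-line records with no wrapped sequence lines. `read_fastq` and `deinterleave` are hypothetical helpers, and the slide does not say how Snippy itself handles this input.

```python
import io
from typing import Iterator, TextIO

Record = tuple[str, str, str]  # (name, sequence, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from strict 4-line FASTQ (no wrapped sequence lines)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record")
        yield header[1:].rstrip("\n"), seq, qual


def deinterleave(handle: TextIO) -> Iterator[tuple[Record, Record]]:
    """Pair consecutive records: 1st with 2nd, 3rd with 4th, and so on."""
    records = read_fastq(handle)
    for r1 in records:
        r2 = next(records, None)
        if r2 is None:
            raise ValueError("odd number of records in interleaved FASTQ")
        yield r1, r2


if __name__ == "__main__":
    demo = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGG\n+\nIIII\n")
    for left, right in deinterleave(demo):
        print(left[0], right[0])
```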
Inside the black box
∷ bwa mem - soft-clips reads, so no pre-clipping needed
∷ samtools - sorted, filtered BAM
∷ freebayes - split / GNU parallel / merge
∷ vcflib/vcftools - VCF filtering
∷ perl - glue
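One way to picture the "split / GNU parallel / merge" step is to cut the reference into fixed-size regions and run one freebayes job per region. The sketch below assumes freebayes's `-f` (reference FASTA) and `-r` (region) options, with region strings written as `chrom:start-end` using a 0-based start and an exclusive end; freebayes's usage text describes that convention, but check it against your version. The chunk size and the helper names are hypothetical. The code only builds the command lines and does not run them. In practice they would go to GNU parallel, and the per-region VCFs would then be merged.

```python
def split_regions(contig_lengths: dict[str, int], chunk: int) -> list[str]:
    """Cut each contig into consecutive regions of at most `chunk` bp."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    regions = []
    for name, length in contig_lengths.items():
        for start in range(0, length, chunk):
            end = min(start + chunk, length)  # last region may be shorter
            regions.append(f"{name}:{start}-{end}")
    return regions


def freebayes_commands(ref_fasta: str, bam: str, regions: list[str]) -> list[list[str]]:
    """One freebayes argument list per region (built here, not executed)."""
    return [["freebayes", "-f", ref_fasta, "-r", region, bam] for region in regions]


if __name__ == "__main__":
    regions = split_regions({"chr": 250, "plas": 100}, chunk=100)
    for cmd in freebayes_commands("ref.fa", "snps.bam", regions):
        print(" ".join(cmd))
```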
Outputs
∷ Read alignments
: .bam / .bai
∷ Variants
: .vcf / .vcf.gz / .vcf.gz.tbi / .gff / .bed / .tab / .csv / .html
∷ Consensus
: reference with all variants applied to it
∷ Genome alignment
: reference with “-” where there is no coverage and “N” where depth is too low
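Because the genome alignment keeps reference coordinates, each isolate's row can be built position by position. The Python sketch below assumes a per-base depth list, a `min_depth` threshold (a hypothetical parameter name), and substitutions only. Insertions and deletions are left out because they would change the row length and break column correspondence across isolates. The slide does not say whether Snippy applies any variants in this file.

```python
def aligned_sequence(ref: str, depth: list[int], subs: dict[int, str], min_depth: int) -> str:
    """Reference-length row: '-' at zero depth, 'N' below min_depth,
    otherwise the reference base with any substitution (0-based pos -> base) applied."""
    if len(depth) != len(ref):
        raise ValueError("need one depth value per reference base")
    out = []
    for i, base in enumerate(ref):
        if depth[i] == 0:
            out.append("-")  # no reads at all: missing
        elif depth[i] < min_depth:
            out.append("N")  # some reads, but too few to trust a call
        else:
            out.append(subs.get(i, base))  # called base (ALT if substituted)
    return "".join(out)


if __name__ == "__main__":
    print(aligned_sequence("ACGTACGT", [0, 3, 10, 10, 10, 10, 2, 10], {2: "T", 5: "G"}, 5))
```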
TAB output
CHROM POS     TYPE     REF   ALT   EVIDENCE        FTYPE  STRAND  NT_POS  AA_POS  LOCUS_TAG  GENE  PRODUCT
chr   5958    snp      A     G     G:44 A:0        CDS    +       41/600  13/200  ECO_0001   dnaA  replication protein DnaA
chr   35524   snp      G     T     T:73 G:1 C:1    tRNA   -
chr   45722   ins      ATT   ATTT  ATTT:43 ATT:1   CDS    -                       ECO_0045   gyrA  DNA gyrase
chr   100541  del      CAAA  CAA   CAA:38 CAAA:1   CDS    +                       ECO_0179         hypothetical protein
plas  619     complex  GATC  AATA  GATC:28 AATA:0
plas  3221    mnp      GA    CT    CT:39 CT:0      CDS    +                       ECO_p012   rep   hypothetical protein
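The .tab file is plain tab-separated text with a header row, so standard tools can load it. Below is a minimal Python sketch. It assumes the column names shown above and assumes that rows without annotation simply omit trailing fields, as the tRNA and unannotated rows do. The helper names are hypothetical.

```python
import csv
import io
from collections import Counter


def read_tab(text: str) -> list[dict[str, str]]:
    """Parse tab-separated variant rows; missing trailing fields become ''."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t", restval=""))


def count_types(rows: list[dict[str, str]]) -> Counter:
    """Tally variants by the TYPE column (snp, mnp, ins, del, complex)."""
    return Counter(row["TYPE"] for row in rows)


def genes_hit(rows: list[dict[str, str]]) -> list[str]:
    """Distinct non-empty GENE names, sorted."""
    return sorted({row["GENE"] for row in rows if row.get("GENE")})


if __name__ == "__main__":
    with open("snps.tab") as fh:  # hypothetical path to a Snippy .tab file
        rows = read_tab(fh.read())
    print(count_types(rows), genes_hit(rows))
```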
Phylogenomics
Phylogenetics 101
∷ Choose some genes
∷ Sequence each gene from each isolate
∷ Align the protein sequences of each gene
∷ Back-align to nucleotide space
∷ Concatenate all the alignments
∷ Construct a distance matrix (many ways)
∷ Draw a tree (many ways)
∷ Make wild inferences from little data
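Once a protein alignment exists, the back-align and concatenate steps are mechanical. Below is a minimal Python sketch. It assumes each CDS has had its stop codon removed and that every isolate appears in every gene alignment. `back_align` and `concatenate` are hypothetical helper names; dedicated tools such as PAL2NAL exist for the back-alignment step.

```python
def back_align(protein_row: str, cds: str) -> str:
    """Replace each aligned residue with its codon and each '-' with '---'."""
    residues = sum(1 for aa in protein_row if aa != "-")
    if len(cds) != 3 * residues:
        raise ValueError("CDS length does not match the number of residues")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out, k = [], 0
    for aa in protein_row:
        if aa == "-":
            out.append("---")  # protein gap becomes a codon-sized gap
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


def concatenate(gene_alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-gene alignments isolate by isolate."""
    isolates = list(gene_alignments[0])
    return {name: "".join(aln[name] for aln in gene_alignments) for name in isolates}


if __name__ == "__main__":
    print(back_align("MK-L", "ATGAAACTG"))  # Met, Lys, gap, Leu
```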
Phylogenomics 101
∷ Assemble each genome
∷ Perform whole genome alignment
: in nucleotide space, as we don’t know what is coding
: very computationally expensive
: can’t be split up and parallelized the way individual genes can
∷ Continue as for phylogenetics
bug1 GATTACCAGCATTAAGG-TTCTCCAATC
bug2 GAT---CTGCATTATGGATTCRNCATTC
bug3 G-TTACCAGCACTAA-------CCAGTC
∷ Ideally, feed this directly to a tree builder
∷ Properly model gaps, codons and ambiguity
∷ Hard!
Whole genome alignment
Core genome SNPs
bug1 GATTACCAGCATTAAGG-TTCTCCAATC
bug2 GAT---CTGCATTATGGATTCRNCATTC
bug3 G-TTACCAGCACTAA-------CCAGTC
core | |   |||||||||       ||||||
Core sites are present in all genomes.
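The core definition can be checked against the toy alignment. The Python sketch below treats a column as core when no genome has a gap (`-`) there, so under this definition an ambiguity code such as `N` still counts as present, which is consistent with the core marks on the slide. The helper names are hypothetical.

```python
# The three-genome toy alignment from the slides.
ALIGNMENT = {
    "bug1": "GATTACCAGCATTAAGG-TTCTCCAATC",
    "bug2": "GAT---CTGCATTATGGATTCRNCATTC",
    "bug3": "G-TTACCAGCACTAA-------CCAGTC",
}


def core_columns(aln: dict[str, str]) -> list[int]:
    """0-based columns where no genome has a gap ('-')."""
    seqs = list(aln.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all rows of an alignment must have the same length")
    return [i for i in range(length) if all(s[i] != "-" for s in seqs)]


def marker_line(columns: list[int], length: int) -> str:
    """Draw '|' under each selected column, as on the slides."""
    selected = set(columns)
    return "".join("|" if i in selected else " " for i in range(length)).rstrip()


if __name__ == "__main__":
    core = core_columns(ALIGNMENT)
    print(len(core), "core sites")
    print("core", marker_line(core, 28))
```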
Core genome
bug1 GATTACCAGCATTAAGG-TTCTCCAATC
bug2 GAT---CTGCATTATGGATTCRNCATTC
bug3 G-TTACCAGCACTAA-------CCAGTC
core | |   |||||||||       ||||||
SNPs        |   |  |       |  |
Core SNPs = polymorphic sites in the core genome
Core SNPs
bug1 GATTACCAGCATTAAGG-TTCTCCAATC
bug2 GAT---CTGCATTATGGATTCRNCATTC
bug3 G-TTACCAGCACTAA-------CCAGTC
core | |   |||||||||       ||||||
SNPs        |   |  |       |  |
SNPs’       |   |  |          |
Unambiguous core SNPs
bug1 GATTACCAGCATTAAGG-TTCTCCAATC
bug2 GAT---CTGCATTATGGATTCRNCATTC
bug3 G-TTACCAGCACTAA-------CCAGTC
SNPs’       |   |  |          |
ata ttc ata atg
1 2 3 4
Allele sites
>bug1
ATAA
>bug2
TTTT
>bug3
ACAG
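The filters above (core, polymorphic, unambiguous) and the final column extraction can be reproduced on the toy alignment. In the Python sketch below, any symbol outside A/C/G/T counts as ambiguous; that rule is an assumption, and the helper names are hypothetical.

```python
ALIGNMENT = {
    "bug1": "GATTACCAGCATTAAGG-TTCTCCAATC",
    "bug2": "GAT---CTGCATTATGGATTCRNCATTC",
    "bug3": "G-TTACCAGCACTAA-------CCAGTC",
}
UNAMBIGUOUS = set("ACGT")


def core_snp_columns(aln: dict[str, str], unambiguous_only: bool = False) -> list[int]:
    """Columns that are core (no '-') and polymorphic (more than one symbol)."""
    seqs = list(aln.values())
    cols = []
    for i in range(len(seqs[0])):
        column = {s[i] for s in seqs}
        if "-" in column:
            continue  # absent from at least one genome: not core
        if len(column) < 2:
            continue  # core, but identical in every genome
        if unambiguous_only and not column <= UNAMBIGUOUS:
            continue  # contains an ambiguity code such as N or R
        cols.append(i)
    return cols


def allele_sites(aln: dict[str, str], columns: list[int]) -> dict[str, str]:
    """Keep only the chosen columns for each genome."""
    return {name: "".join(seq[i] for i in columns) for name, seq in aln.items()}


def to_fasta(records: dict[str, str]) -> str:
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


if __name__ == "__main__":
    sites = core_snp_columns(ALIGNMENT, unambiguous_only=True)
    print(to_fasta(allele_sites(ALIGNMENT, sites)), end="")
```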
Alignment ⇢ Tree
   +------ bug3
   |
---+--- bug1
   |
   +--------- bug2
--- 1 SNP
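Among the many ways to construct a distance matrix, one of the simplest is to count the allele sites at which each pair of isolates differs. The Python sketch below does this on the allele-site sequences from the earlier slide. Building the tree itself, for example with neighbour-joining, is left out, and the helper names are hypothetical.

```python
from itertools import combinations

ALLELES = {"bug1": "ATAA", "bug2": "TTTT", "bug3": "ACAG"}


def snp_distance(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances for every unordered pair of isolates."""
    return {(p, q): snp_distance(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}


if __name__ == "__main__":
    for pair, d in distance_matrix(ALLELES).items():
        print(*pair, d)
```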
The N±1 problem
Aligning to reference
∷ Why is whole genome alignment not used?
: involves genome (mis)assembly
: computationally difficult
: expensive to add or remove isolates
∷ Short-cut
: choose a single reference
: align each isolate’s reads to the reference
: core, by definition, must include the reference
Read mapping considerations
∷ Choice of reference
∷ Too divergent?
: reads may not align well
: will get too many core genome SNPs
∷ One solution
: Assemble one isolate and use it as the reference
SNPs         |   |  |       |  |
core  | |   |||||||||       ||||||
bug1  GATTACCAGCATTAAGG-TTCTCCAATC
bug2  GAT---CTGCATTATGGATTCRNCATTC
bug3  G-TTACCAGCACTAA-------CCAGTC
core1 |||   ||||||||||| ||||||||||
SNPs1        |      |      ||  |
Remove taxon, different core (1)
SNPs         |   |  |       |  |
core  | |   |||||||||       ||||||
bug1  GATTACCAGCATTAAGG-TTCTCCAATC
bug2  GAT---CTGCATTATGGATTCRNCATTC
bug3  G-TTACCAGCACTAA-------CCAGTC
core2 | |   |||||||||       ||||||
SNPs2        |   |  |       |  |
Remove taxon, different core (2)
SNPs         |   |  |       |  |
core  | |   |||||||||       ||||||
bug1  GATTACCAGCATTAAGG-TTCTCCAATC
bug2  GAT---CTGCATTATGGATTCRNCATTC
bug3  G-TTACCAGCACTAA-------CCAGTC
core3 | |||||||||||||       ||||||
SNPs3            |             |
Remove taxon, different core (3)
Core genome alignments
∷ Core SNP alignments
: can shift dramatically with the set of taxa included
: we are only using globally conserved sites
: remember variation still exists outside “core”
∷ Snippy will keep the full alignments
: quickly derive subsets on the fly
: adding isolates can be done quickly too
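If the full alignment is kept, the core for any subset of isolates is just a column filter. That is why the remove-taxon comparisons above are cheap to recompute. The Python sketch below applies this to the toy alignment. It is only an illustration of the idea, not Snippy's own code, and the helper names are hypothetical.

```python
ALIGNMENT = {
    "bug1": "GATTACCAGCATTAAGG-TTCTCCAATC",
    "bug2": "GAT---CTGCATTATGGATTCRNCATTC",
    "bug3": "G-TTACCAGCACTAA-------CCAGTC",
}


def core_columns(aln: dict[str, str], taxa: list[str]) -> list[int]:
    """Columns with no gap in any of the chosen taxa."""
    length = len(aln[taxa[0]])
    return [i for i in range(length) if all(aln[t][i] != "-" for t in taxa)]


def snp_columns(aln: dict[str, str], taxa: list[str]) -> list[int]:
    """Core columns (for these taxa) showing more than one symbol."""
    return [i for i in core_columns(aln, taxa) if len({aln[t][i] for t in taxa}) > 1]


if __name__ == "__main__":
    for dropped in ALIGNMENT:
        kept = [t for t in ALIGNMENT if t != dropped]
        core = core_columns(ALIGNMENT, kept)
        snps = snp_columns(ALIGNMENT, kept)
        print(f"without {dropped}: {len(core)} core sites, {len(snps)} SNPs")
```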
Conclusion
Snippy summary
∷ The good
: Fast, scales to 100 cores
: Simple, clean interface and output
∷ The bad
: Doesn’t yet annotate full variant consequences with snpEff
∷ The ugly?
: Written in Perl
Contact
∷ tseemann.github.io
∷ github.com/tseemann/snippy
∷ @torstenseemann

Let’s Say Someone Did Drop the Bomb. Then What?
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Bioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptxBioteknologi kelas 10 kumer smapsa .pptx
Bioteknologi kelas 10 kumer smapsa .pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 

Snippy - Rapid bacterial variant calling - UK - Tue 5 May 2015

  • 1. Snippy Torsten Seemann Balti & Bioinformatics - Birmingham, UK - Tue 5 May 2015 Rapid bacterial variant calling & core genome alignments
  • 4. Phyloflagomics UK / Birmingham Australia / Victoria Canada / British Columbia
  • 5. A new home Centre for Applied Microbial Genomics
  • 6. Microbiological Diagnostic Unit ∷ Oldest public health lab in Australia : established 1897 in Melbourne : large historical isolate collection going back to the 1950s ∷ National reference laboratory : Salmonella, Listeria, EHEC ∷ WHO regional reference lab : vaccine-preventable invasive bacterial pathogens
  • 7. New director ∷ Professor Ben Howden : clinician, microbiologist, pathologist : early adopter of genomics and bioinformatics : long-term collaborator on MRSA/VRE with Tim Stinear ∷ Mandate : modernise service delivery : enhance research output and collaboration : nationally lead the conversion to WGS
  • 8. Hardware ∷ Sequencers : NextSeq 500 : 3 x MiSeq : PacBio RS II (arriving 22 May) ∷ Robots : Perkin Elmer (does not have a Twitter account) : Colony picker ∷ Compute : 240 TB, 10 GigE, 3 x 72 core boxes
  • 10. Variant calling ∷ Find DNA differences between genomes : variants to explain phenotype : validate your complemented mutant ∷ Two approaches : reference based (read alignment) : reference-free (de novo assembly / k-mer based)
  • 11. Types of variants ∷ Substitutions : single nucleotide polymorphism (snp) A➝C : multiple nucleotide polymorphism (mnp) AG➝TC ∷ Indels : insertion (ins) A➝AC : deletion (del) ACCG➝AG ∷ Complex : compound events AC➝T
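The five variant types above can be told apart by trimming the bases REF and ALT share at each end and then looking at what remains. The Python sketch below assumes that approach. `classify_variant` is a hypothetical helper, and its rules were chosen to reproduce the slide's examples and the `complex` GATC➝AATA call in the TAB output. It is not the logic freebayes or Snippy actually use.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Return 'snp', 'mnp', 'ins', 'del' or 'complex' for a REF/ALT pair.

    Hypothetical helper: a simplified rule set that reproduces the slide's
    examples, not the logic freebayes or Snippy actually use.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical, so there is no variant")
    # Trim shared leading bases (the anchor A in A->AC).
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    r, a = ref[i:], alt[i:]
    # Trim shared trailing bases (the final G in ACCG->AG).
    while r and a and r[-1] == a[-1]:
        r, a = r[:-1], a[:-1]
    if not r:
        return "ins"  # only ALT has extra bases
    if not a:
        return "del"  # only REF has extra bases
    if len(r) == len(a):
        if len(r) == 1:
            return "snp"
        if all(x != y for x, y in zip(r, a)):
            return "mnp"  # every base in the block is substituted
    # Length change mixed with substitution, or substitutions separated by matches.
    return "complex"


for ref, alt in [("A", "C"), ("AG", "TC"), ("A", "AC"), ("ACCG", "AG"), ("AC", "T")]:
    print(f"{ref} -> {alt}: {classify_variant(ref, alt)}")
```

For example, ACCG➝AG trims to CC versus nothing, which makes it a deletion. AC➝T leaves AC versus T, which has neither a pure length change nor a one-to-one substitution, so it falls through to `complex`.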
  • 13. Snippy ∷ Fast → snappy ∷ Finds variants → SNPs ∷ Australian → Skippy the bush kangaroo
  • 14. Input ∷ FASTQ files : paired-end, interleaved, or single-end ∷ Reference : FASTA or GenBank ∷ Output folder : self-contained bundle of results
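To show how the three FASTQ layouts map onto a run, the hypothetical helper below assembles a Snippy command line without running it. The flag names (`--ref`, `--outdir`, `--cpus`, `--R1`/`--R2`, `--se`, `--peil`) are assumed from later (4.x) Snippy documentation and may differ in the 2015 release. Check them with `snippy --help` before relying on them.

```python
def snippy_command(ref, outdir, r1, r2=None, interleaved=False, cpus=8):
    """Build (but do not run) a Snippy command line as a list of arguments.

    Hypothetical helper; flag names are assumed from Snippy 4.x docs and
    should be checked against `snippy --help` for the installed version.
    """
    cmd = ["snippy", "--cpus", str(cpus), "--outdir", str(outdir), "--ref", str(ref)]
    if r2 is not None:
        # Paired-end reads in two files.
        cmd += ["--R1", str(r1), "--R2", str(r2)]
    elif interleaved:
        # Paired-end reads interleaved in a single file.
        cmd += ["--peil", str(r1)]
    else:
        # Single-end reads.
        cmd += ["--se", str(r1)]
    return cmd


print(" ".join(snippy_command("ref.gbk", "isolate1", "isolate1_R1.fq.gz", "isolate1_R2.fq.gz")))
```

Returning a list rather than a string keeps file names with spaces safe. The list can be passed straight to `subprocess.run(cmd, check=True)`.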
  • 15. Inside the black box ∷ bwa mem - no clipping needed ∷ samtools - sorted, filtered BAM ∷ freebayes - split / GNU parallel / merge ∷ vcflib/vcftools - VCF filtering ∷ perl - glue
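The "split / GNU parallel / merge" step for freebayes follows a common pattern. The reference is cut into fixed-size regions, each region is called independently, and the per-region results are merged back into reference order. The Python sketch below shows that pattern under several assumptions. `call_region` is a hypothetical placeholder that returns fake records instead of running freebayes. A thread pool stands in for GNU parallel. The 0-based, end-exclusive `name:start-end` region format is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def make_regions(contig_lengths, chunk_size):
    """Cut each contig into 'name:start-end' regions of at most chunk_size bp.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    regions = []
    for name, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            end = min(start + chunk_size, length)
            regions.append(f"{name}:{start}-{end}")
    return regions


def call_region(region):
    """Hypothetical stand-in for a variant caller run on one region.

    Returns fake (contig, position) records: every position divisible by 1000.
    """
    name, span = region.rsplit(":", 1)
    start, end = (int(x) for x in span.split("-"))
    return [(name, pos) for pos in range(start, end) if pos % 1000 == 0]


def split_call_merge(contig_lengths, chunk_size, workers=4):
    """Split the reference, call regions concurrently, merge into reference order."""
    regions = make_regions(contig_lengths, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_region = pool.map(call_region, regions)
        merged = [rec for recs in per_region for rec in recs]
    # Sort by contig order in the reference, then by position.
    rank = {name: i for i, name in enumerate(contig_lengths)}
    return sorted(merged, key=lambda rec: (rank[rec[0]], rec[1]))


print(make_regions({"chr": 2500, "plas": 800}, 1000))
# ['chr:0-1000', 'chr:1000-2000', 'chr:2000-2500', 'plas:0-800']
print(split_call_merge({"chr": 2500, "plas": 800}, 1000))
# [('chr', 0), ('chr', 1000), ('chr', 2000), ('plas', 0)]
```

The final sort makes the output independent of which region finishes first. That independence is what lets the per-region calls run in any order across many cores.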
  • 16. Outputs ∷ Read alignments : .bam / .bai ∷ Variants : .vcf / .vcf.gz / .vcf.gz.tbi / .gff / .bed / .tab / .csv / .html ∷ Consensus : reference with all variants applied to it ∷ Genome alignment : reference with “-” (missing) and “N” (low depth)
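The genome-alignment output described above can be derived for one isolate in a few steps. Start from the reference, apply substitutions, write “-” where no reads align and “N” where depth falls below a threshold. The Python sketch below does this under stated assumptions. The function name, the `min_depth=10` default and the choice to apply only substitutions (so the result keeps the reference's length) are assumptions, not Snippy's code.

```python
def aligned_sequence(reference, depth, substitutions, min_depth=10):
    """Return a reference-length sequence for one isolate.

    reference     : reference bases (str)
    depth         : read depth at each reference position (list of int)
    substitutions : {0-based position: alternative base}
    min_depth     : assumed threshold; covered sites below it become 'N'
    Zero-depth sites become '-' (missing). Only substitutions are applied in
    this sketch, so the output keeps reference coordinates.
    """
    if len(depth) != len(reference):
        raise ValueError("need one depth value per reference position")
    out = []
    for pos, (base, dp) in enumerate(zip(reference, depth)):
        if dp == 0:
            out.append("-")
        elif dp < min_depth:
            out.append("N")
        else:
            out.append(substitutions.get(pos, base))
    return "".join(out)


print(aligned_sequence("ACGTACGT", [0, 5, 20, 20, 20, 20, 3, 0], {3: "A"}))
# -NGAACN-
```

Every isolate's result has the reference's length. The rows for several isolates can therefore be stacked column by column into a multiple alignment without any further alignment step.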
  • 17. TAB output
      CHROM  POS     TYPE     REF   ALT   EVIDENCE        FTYPE  STRAND  NT_POS  AA_POS  LOCUS_TAG  GENE  PRODUCT
      chr    5958    snp      A     G     G:44 A:0        CDS    +       41/600  13/200  ECO_0001   dnaA  replication protein DnaA
      chr    35524   snp      G     T     T:73 G:1 C:1    tRNA   -
      chr    45722   ins      ATT   ATTT  ATTT:43 ATT:1   CDS    -                       ECO_0045   gyrA  DNA gyrase
      chr    100541  del      CAAA  CAA   CAA:38 CAAA:1   CDS    +                       ECO_0179         hypothetical protein
      plas   619     complex  GATC  AATA  GATC:28 AATA:0
      plas   3221    mnp      GA    CT    CT:39 CT:0      CDS    +                       ECO_p012   rep   hypothetical protein
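A TAB report like the one above can be loaded with Python's standard `csv` module. The sketch assumes one header row, tab-separated fields and the `ALLELE:COUNT` evidence format shown on the slide. `read_snippy_tab` is a hypothetical name, and the two sample rows are copied from the table.

```python
import csv
import io


def read_snippy_tab(handle):
    """Parse a Snippy-style TAB report into a list of dicts.

    Assumes a header row and tab-separated columns as on the slide; the
    EVIDENCE field (e.g. "G:44 A:0") is turned into an {allele: count} dict.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        evidence = {}
        for item in row["EVIDENCE"].split():
            allele, count = item.rsplit(":", 1)
            evidence[allele] = int(count)
        row["EVIDENCE"] = evidence
        rows.append(row)
    return rows


header = ["CHROM", "POS", "TYPE", "REF", "ALT", "EVIDENCE", "FTYPE", "STRAND",
          "NT_POS", "AA_POS", "LOCUS_TAG", "GENE", "PRODUCT"]
lines = [
    header,
    ["chr", "5958", "snp", "A", "G", "G:44 A:0", "CDS", "+", "41/600", "13/200",
     "ECO_0001", "dnaA", "replication protein DnaA"],
    ["chr", "35524", "snp", "G", "T", "T:73 G:1 C:1", "tRNA", "-", "", "", "", "", ""],
]
text = "\n".join("\t".join(fields) for fields in lines) + "\n"
for rec in read_snippy_tab(io.StringIO(text)):
    print(rec["POS"], rec["TYPE"], rec["GENE"] or "-", rec["EVIDENCE"])
```

Filtering to coding changes is then a one-liner, for example `[r for r in rows if r["FTYPE"] == "CDS"]`.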
  • 20. Phylogenetics 101 ∷ Choose some genes ∷ Sequence each gene from each isolate ∷ Align the protein sequences of each gene ∷ Back-align to nucleotide space ∷ Concatenate all the alignments ∷ Construct a distance matrix (many ways) ∷ Draw a tree (many ways) ∷ Make wild inferences from little data
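The "concatenate" and "distance matrix" steps can be sketched in a few lines of Python. Uncorrected p-distance (the share of differing sites, skipping gapped columns) is just one of the "many ways". The function names and the toy gene alignments are hypothetical, and the toy sequences are assumed to be already back-aligned to nucleotides.

```python
from itertools import combinations


def concatenate(gene_alignments):
    """{gene: {isolate: aligned seq}} -> {isolate: concatenated seq}, genes in input order."""
    isolates = sorted(next(iter(gene_alignments.values())))
    return {iso: "".join(aln[iso] for aln in gene_alignments.values()) for iso in isolates}


def p_distance(a, b):
    """Share of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric pairwise p-distance matrix as a dict keyed by (name, name)."""
    names = list(seqs)
    dist = {(n, n): 0.0 for n in names}
    for m, n in combinations(names, 2):
        dist[m, n] = dist[n, m] = p_distance(seqs[m], seqs[n])
    return names, dist


genes = {  # hypothetical toy alignments
    "geneA": {"iso1": "ATGAAA", "iso2": "ATGAAG", "iso3": "ATG-AA"},
    "geneB": {"iso1": "CCGT", "iso2": "CCGT", "iso3": "CAGT"},
}
names, dist = distance_matrix(concatenate(genes))
for m in names:
    print(m, " ".join(f"{dist[m, n]:.3f}" for n in names))
```

The resulting matrix is what a distance-based tree method such as neighbour joining would take as input.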
  • 21. Phylogenomics 101 ∷ Assemble each genome ∷ Perform whole genome alignment : in nucleotide space, as we don’t know what is coding : very computationally expensive : can’t be parallelized the way individual genes can ∷ Continue as for phylogenetics
  • 22. Whole genome alignment
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
    ∷ Ideally, feed this directly to a tree builder ∷ Properly model gaps, codons and ambiguity ∷ Hard!
  • 24. Core genome
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
      core   | |   |||||||||       ||||||
    Core sites are present in all genomes.
  • 25. Core SNPs
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
      core   | |   |||||||||       ||||||
      SNPs          |   |  |       |  |
    Core SNPs = polymorphic sites in the core genome
  • 26. Unambiguous core SNPs
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
      core   | |   |||||||||       ||||||
      SNPs          |   |  |       |  |
      SNPs’         |   |  |          |
  • 27. Allele sites
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
      SNPs’         |   |  |          |
      alleles  ata  ttc  ata  atg
      site      1    2    3    4
    (each allele pattern is one SNPs’ column read top to bottom: bug1, bug2, bug3)
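The Python sketch below runs the three-sequence toy alignment from these slides through the same steps. It computes core sites, core SNPs, unambiguous core SNPs and the allele column at each site. The function names are hypothetical. Two assumptions happen to reproduce the counts drawn on the slides: any non-ACGT character (R, N) is treated as ambiguous, and such characters still count toward core SNPs.

```python
ALIGNMENT = {
    "bug1": "GATTACCAGCATTAAGG-TTCTCCAATC",
    "bug2": "GAT---CTGCATTATGGATTCRNCATTC",
    "bug3": "G-TTACCAGCACTAA-------CCAGTC",
}
UNAMBIGUOUS = set("ACGT")


def columns(aln):
    """Return {1-based position: column string, top to bottom} for an alignment."""
    seqs = list(aln.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    return {i: "".join(col) for i, col in enumerate(zip(*seqs), start=1)}


def core_sites(aln):
    """Core: columns with no gap in any genome."""
    return [i for i, col in columns(aln).items() if "-" not in col]


def core_snps(aln):
    """Core columns holding more than one character (ambiguity codes included)."""
    cols = columns(aln)
    return [i for i in core_sites(aln) if len(set(cols[i])) > 1]


def unambiguous_core_snps(aln):
    """Core SNPs where every base is A, C, G or T."""
    cols = columns(aln)
    return [i for i in core_snps(aln) if set(cols[i]) <= UNAMBIGUOUS]


def allele_columns(aln):
    """Bases at each unambiguous core SNP, read top to bottom (lower case)."""
    cols = columns(aln)
    return [cols[i].lower() for i in unambiguous_core_snps(aln)]


print(len(core_sites(ALIGNMENT)))         # 17
print(core_snps(ALIGNMENT))               # [8, 12, 15, 23, 26]
print(unambiguous_core_snps(ALIGNMENT))   # [8, 12, 15, 26]
print(allele_columns(ALIGNMENT))          # ['ata', 'ttc', 'ata', 'atg']
```

Position 23 (C/N/C) is the one site that is a core SNP but not an unambiguous one. This is why five SNPs shrink to four allele sites.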
  • 30. Aligning to reference ∷ Why is whole genome alignment not used? : involves genome (mis)assembly : computationally difficult : expensive to add or remove isolates ∷ Short-cut : choose a single reference : align each isolate’s reads to the reference : core, by definition, must include the reference
  • 31. Read mapping considerations ∷ Choice of reference ∷ Too divergent? : reads may not align well : will get too many core genome SNPs ∷ One solution : Assemble one isolate and use as the reference
  • 32. Remove taxon, different core (1)
      SNPs          |   |  |       |  |
      core   | |   |||||||||       ||||||
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
      core1  |||   ||||||||||| ||||||||||
      SNPs1         |      |      ||  |
    (core1 and SNPs1 are recomputed without bug3)
  • 33. Remove taxon, different core (2)
      SNPs          |   |  |       |  |
      core   | |   |||||||||       ||||||
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
      core2  | |   |||||||||       ||||||
      SNPs2         |   |  |       |  |
    (core2 and SNPs2 are recomputed without bug1)
  • 34. Remove taxon, different core (3)
      SNPs          |   |  |       |  |
      core   | |   |||||||||       ||||||
      bug1   GATTACCAGCATTAAGG-TTCTCCAATC
      bug2   GAT---CTGCATTATGGATTCRNCATTC
      bug3   G-TTACCAGCACTAA-------CCAGTC
      core3  | |||||||||||||       ||||||
      SNPs3             |             |
    (core3 and SNPs3 are recomputed without bug2)
  • 35. Core genome alignments ∷ Core SNP alignments : can shift dramatically with taxa content : we are only using globally conserved sites : remember variation still exists outside “core” ∷ Snippy will keep the full alignments : quickly derive subsets on the fly : adding isolates can be done quickly too
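Keeping the full alignments means the core for any subset of taxa can be recomputed directly. The Python sketch below does this for the toy alignment and reproduces the "remove taxon" slides: dropping bug3 grows the core from 17 to 24 sites, while dropping bug2 leaves only two SNPs. `core_and_snps` is a hypothetical name. As on the slides, ambiguity codes such as R or N count as differences.

```python
from itertools import combinations

ALIGNMENT = {
    "bug1": "GATTACCAGCATTAAGG-TTCTCCAATC",
    "bug2": "GAT---CTGCATTATGGATTCRNCATTC",
    "bug3": "G-TTACCAGCACTAA-------CCAGTC",
}


def core_and_snps(aln, taxa):
    """Core sites and core SNPs (1-based) for the chosen subset of taxa."""
    seqs = [aln[t] for t in taxa]
    core, snps = [], []
    for i, col in enumerate(zip(*seqs), start=1):
        if "-" in col:
            continue  # missing from at least one selected genome
        core.append(i)
        if len(set(col)) > 1:
            snps.append(i)  # ambiguity codes such as R or N count as differences
    return core, snps


core, snps = core_and_snps(ALIGNMENT, list(ALIGNMENT))
print(f"all three: {len(core)} core sites, SNPs at {snps}")
for kept in combinations(ALIGNMENT, 2):
    dropped = (set(ALIGNMENT) - set(kept)).pop()
    core, snps = core_and_snps(ALIGNMENT, kept)
    print(f"without {dropped}: {len(core)} core sites, SNPs at {snps}")
```

Each subset costs one pass over the columns, with no realignment. This is why adding or dropping isolates stays cheap when the full per-isolate alignments are retained.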
  • 37. Snippy summary ∷ The good : Fast, scales to 100 cores : Simple, clean interface and output ∷ The bad : Doesn’t yet do full consequence prediction using snpEff ∷ The ugly? : Written in Perl