Using Snippy to call variants in bacterial short-read datasets by aligning reads to a reference, and then using these alignments to produce core SNP alignments for phylogenomics.
6. Microbiological Diagnostic Unit
∷ Oldest public health lab in Australia
: established 1897 in Melbourne
: large historical isolate collection dating back to the 1950s
∷ National reference laboratory
: Salmonella, Listeria, EHEC
∷ WHO regional reference lab
: vaccine-preventable invasive bacterial pathogens
7. New director
∷ Professor Ben Howden
: clinician, microbiologist, pathologist
: early adopter of genomics and bioinformatics
: long-term collaborator on MRSA/VRE with Tim Stinear
∷ Mandate
: modernise service delivery
: enhance research output and collaboration
: nationally lead the conversion to WGS
8. Hardware
∷ Sequencers
: NextSeq 500
: 3 x MiSeq
: PacBio RS II (arriving 22 May)
∷ Robots
: PerkinElmer (does not have a Twitter account)
: Colony picker
∷ Compute
: 240 TB, 10 GigE, 3 x 72-core boxes
10. Variant calling
∷ Find DNA differences between genomes
: variants to explain phenotype
: validate your complemented mutant
∷ Two approaches
: reference based (read alignment)
: reference-free (de novo assembly / k-mer based)
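To make the reference-based approach concrete, here is a toy pileup caller. It is a minimal sketch, not how Snippy works internally: real pipelines, Snippy included, hand the job to a dedicated read aligner and variant caller. It assumes the reads have already been placed on the reference with no indels, each given as a hypothetical (0-based start, sequence) pair. The function name `call_snps` and the `min_depth` / `min_frac` thresholds are illustrative only.

```python
from collections import Counter


def call_snps(reference, placed_reads, min_depth=3, min_frac=0.9):
    """Toy reference-based SNP caller (illustration only).

    Assumes every read is already aligned to `reference` without indels and
    is given as (0-based start position, read sequence).
    Returns a list of (position, ref_base, alt_base, depth) tuples.
    """
    # Build a pileup: one base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little read evidence to call anything here
        base, n = counts.most_common(1)[0]
        # Call a SNP only if the majority base differs from the reference
        # and is supported by a large enough fraction of reads.
        if base != reference[pos] and n / depth >= min_frac:
            snps.append((pos, reference[pos], base, depth))
    return snps


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (3, "TTCGTAC"), (1, "CGTTC")]
    print(call_snps(ref, reads))  # [(4, 'A', 'T', 4)]
```

The depth and allele-fraction thresholds are what stop a single sequencing error from being reported as a variant; positions with too few reads are simply skipped rather than called as reference.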
20. Phylogenetics 101
∷ Choose some genes
∷ Sequence each gene from each isolate
∷ Align the protein sequences of each gene
∷ Back-align to nucleotide space
∷ Concatenate all the alignments
∷ Construct a distance matrix (many ways)
∷ Draw a tree (many ways)
∷ Make wild inferences from little data
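The middle steps of that recipe can be sketched as follows, assuming the protein alignments already exist. `back_align` threads each coding sequence onto its aligned protein (one codon per residue, `---` per gap), `concatenate` joins per-gene alignments end to end, and `distance_matrix` builds one of the "many ways" of distance matrix: raw pairwise differences, skipping gapped columns and applying no substitution-model correction. All three function names are hypothetical, and the code assumes every gene alignment contains the same isolates and each CDS has exactly three nucleotides per residue.

```python
from itertools import combinations


def back_align(protein_aln, cds):
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each amino acid becomes its codon and each gap '-' becomes '---', so the
    nucleotide alignment inherits the codon-aware protein alignment.
    Assumes `cds` has exactly 3 nt per residue (no stop codon, no frameshift).
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = [aa for aa in protein_aln if aa != "-"]
    if len(codons) != len(residues):
        raise ValueError("CDS length does not match protein length")
    out, k = [], 0
    for aa in protein_aln:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


def concatenate(gene_alignments):
    """Join per-gene alignments ({isolate: aligned_seq}) end to end per isolate."""
    isolates = sorted(gene_alignments[0])
    return {iso: "".join(aln[iso] for aln in gene_alignments) for iso in isolates}


def distance_matrix(alignment):
    """Count differing sites for every pair, ignoring columns with a gap in either."""
    names = sorted(alignment)
    dist = {(a, a): 0 for a in names}
    for a, b in combinations(names, 2):
        d = sum(
            1 for x, y in zip(alignment[a], alignment[b])
            if x != y and "-" not in (x, y)
        )
        dist[(a, b)] = dist[(b, a)] = d
    return dist
```

For example, `back_align("M-K", "ATGAAA")` gives `"ATG---AAA"`. Aligning in protein space first keeps gaps in whole codons, which is the reason for the back-alignment step rather than aligning nucleotides directly.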
21. Phylogenomics 101
∷ Assemble each genome
∷ Perform whole-genome alignment
: in nucleotide space, as we don’t know what is coding
: very computationally expensive
: can’t parallelise as with individual genes
∷ Continue as for phylogenetics
30. Aligning to reference
∷ Why not use whole-genome alignment?
: involves genome (mis)assembly
: computationally difficult
: expensive to add or remove isolates
∷ Short-cut
: choose a single reference
: align each isolate’s reads to the reference
: core, by definition, must include the reference
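Why the short-cut avoids whole-genome alignment: once reads are mapped, every isolate can be written out in reference coordinates, so the rows of a multiple alignment line up for free. The sketch below assumes each isolate is summarised as a hypothetical dict of SNPs (0-based position to alt base) plus a per-position read depth list the same length as the reference; low-coverage positions are masked with `-`. The names `pseudo_genome`, `stack` and the `min_depth=10` threshold are illustrative assumptions, not Snippy's own interface.

```python
def pseudo_genome(reference, snps, depth, min_depth=10):
    """Build one isolate's sequence in reference coordinates (illustration only).

    `snps` maps 0-based position -> alt base; `depth` gives read depth per
    reference position. Positions with too few reads are masked with '-'
    because we cannot say what the isolate has there.
    """
    seq = list(reference)
    for pos, alt in snps.items():
        seq[pos] = alt
    for pos, d in enumerate(depth):
        if d < min_depth:
            seq[pos] = "-"
    return "".join(seq)


def stack(reference, isolates):
    """Every pseudo-genome has the reference's length, so rows are already aligned."""
    aln = {"Reference": reference, **isolates}
    if len({len(s) for s in aln.values()}) != 1:
        raise ValueError("all rows must be in reference coordinates")
    return aln


if __name__ == "__main__":
    ref = "ACGTACGT"
    a = pseudo_genome(ref, {2: "T"}, [20] * 8)
    b = pseudo_genome(ref, {5: "A"}, [20] * 6 + [3, 3])
    for name, row in stack(ref, {"A": a, "B": b}).items():
        print(name, row)
```

Adding or removing an isolate is just adding or dropping one row; nothing already in the alignment has to be recomputed. The reference row is always present, which is why the core must include it.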
31. Read mapping considerations
∷ Choice of reference
∷ Too divergent?
: reads may not align well
: will get too many core genome SNPs
∷ One solution
: Assemble one isolate and use as the reference
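One quick way to ask "too divergent?" before mapping is to compare shared k-mers between each isolate and candidate references. The sketch below computes exact k-mer Jaccard similarity on forward-strand k-mers only; tools such as Mash estimate a similar quantity far faster using MinHash sketches and canonical k-mers, which this toy version does not attempt. The function names and the default `k=21` are assumptions for illustration.

```python
def kmers(seq, k=21):
    """Set of all k-mers in seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=21):
    """Fraction of shared k-mers; lower values suggest a more divergent pair."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 1.0  # two sequences too short to have k-mers: treat as identical
    return len(ka & kb) / len(ka | kb)


def closest_reference(isolate, candidates, k=21):
    """Pick the candidate name whose sequence shares the most k-mers with the isolate."""
    return max(candidates, key=lambda name: jaccard(isolate, candidates[name], k))
```

In practice the candidates could include an assembly of one of your own isolates, which by construction is closely related to the rest of the dataset.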
35. Core genome alignments
∷ Core SNP alignments
: can shift dramatically with taxa content
: we are only using globally conserved sites
: remember variation still exists outside “core”
∷ Snippy will keep the full alignments
: quickly derive subsets on the fly
: adding isolates can be done quickly too
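Deriving a core SNP alignment from a full reference-length alignment is a single pass over the columns, which is why keeping the full alignment makes subsets cheap. In this sketch a column counts as core only if every selected sequence has an unambiguous A/C/G/T there, and it is kept only if those bases are not all identical. The function name `core_snp_alignment` and the dict-of-strings input are hypothetical.

```python
def core_snp_alignment(full_aln, taxa=None):
    """Derive a core SNP alignment from a full (reference-length) alignment.

    A column is 'core' if every selected sequence has an unambiguous base
    (A/C/G/T) there, and it is kept only if those bases are not all identical.
    Passing a different `taxa` subset recomputes the core on the fly.
    """
    names = sorted(taxa if taxa is not None else full_aln)
    seqs = [full_aln[n].upper() for n in names]
    keep = []
    for col in range(len(seqs[0])):
        bases = {s[col] for s in seqs}
        if bases <= set("ACGT") and len(bases) > 1:
            keep.append(col)
    return {n: "".join(s[c] for c in keep) for n, s in zip(names, seqs)}


if __name__ == "__main__":
    full = {
        "Reference": "ACGTACGT",
        "A": "ACTTACGT",
        "B": "ACGTAA--",
        "C": "ACGTACCT",
    }
    print(core_snp_alignment(full))                          # all taxa
    print(core_snp_alignment(full, ["Reference", "A", "C"]))  # drop B
```

With all four sequences the core SNP columns are 2 and 5; dropping isolate B makes column 5 invariant and brings column 6 back into the core, so the SNP set changes even though no underlying sequence did.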
37. Snippy summary
∷ The good
: Fast, scales to 100 cores
: Simple, clean interface and output
∷ The bad
: Doesn’t yet annotate full variant consequences using snpEff
∷ The ugly?
: Written in Perl