IMMEM XI
Navigating Microbial Genomes:
Insights from the Next Generation
9 – 12 March 2016, Estoril, Portugal
Whole Genome Sequencing
• Suddenly cheap and easy
• Huge amounts of data generated in Canada & globally
• Can solve many problems
  • Resolution
  • Breadth of strains typed
• Scale of data brings its own problems
  • Pangenome definitions
  • Variable assembly completeness and quality
  • Existing typing systems don't scale well
Classical MLST
• Looks at allelic diversity of ~7 “housekeeping” loci
  • All loci must be fully present
• Each new allele is a type
  • Recombination and mutation are treated as equivalent
• Each unique combination of types is a Sequence Type
• Type definitions are universal
  • Centralized and curated
  • e.g. ST-21 in Canada = ST-21 in UK = ST-21 in Denmark
Dingle et al. 2001. J. Clin. Microbiol. 39(1):14-23
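A minimal sketch of the lookup described above, assuming a toy profile table. The locus names are those of the seven-locus C. jejuni scheme; the allele numbers and the second ST number are made up for illustration and are not taken from a curated database. `sequence_type` is a hypothetical helper name.

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the C. jejuni MLST scheme.
LOCI = ("aspA", "glnA", "gltA", "glyA", "pgm", "tkt", "uncA")

# Hypothetical stand-in for the centralized, curated profile table.
PROFILE_TO_ST: Dict[Tuple[int, ...], int] = {
    (2, 1, 1, 3, 2, 1, 5): 21,
    (2, 1, 1, 3, 2, 1, 6): 9001,  # one allele differs, so it is a different ST
}


def sequence_type(alleles: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the ST for a complete allele profile, else None."""
    profile = tuple(alleles.get(locus) for locus in LOCI)
    if any(allele is None for allele in profile):
        return None  # a single missing locus makes the isolate untypable
    return PROFILE_TO_ST.get(profile)  # None also covers a novel profile
```

Because the table is shared, the same profile yields the same ST wherever the lookup is run, which is the portability the slide refers to.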
What is a “Core gene”? What about a “Core genome”?
• The core genome is shared by all members of the species; mostly SNP-level genetic variation
• Accessory genes are not shared by all members of the species and drive a lot of the phenotypic variability between strains
Core Genome MLST
• Logical extension of Classical MLST concepts
  • 7 genes → 100s or 1000s of genes
• Potential successor “Gold Standard” typing method for surveillance
• Big Advantages
  • High Resolution
  • Viable way for WGS → Surveillance
• Lots of interest in cgMLST
[Figure: cgMLST analysis of 200 isolates “identical” by MLST (Walkerton outbreak, 2000)]
A prototype cgMLST scheme for C. jejuni
• 2,690 Campylobacter jejuni whole genome sequence assemblies
• Set of 1,658 ORFs from reference strain NCTC11168 used as queries
  • 85% sequence identity & 50% length coverage
• 732 ORFs conserved across all genomes → core genome loci
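A rough sketch of the core-locus filter, assuming the similarity search has already been run and summarized per genome as (percent identity, percent of query length covered) for each query ORF. The search tool itself is not named in the slides, and `loci_present` and `core_loci` are hypothetical helper names.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Hit = Tuple[float, float]  # (percent identity, percent of query length covered)


def loci_present(hits: Dict[str, Hit],
                 min_identity: float = 85.0,
                 min_coverage: float = 50.0) -> Set[str]:
    """Loci whose best hit in one genome passes both thresholds."""
    return {locus for locus, (identity, coverage) in hits.items()
            if identity >= min_identity and coverage >= min_coverage}


def core_loci(genomes: Iterable[Dict[str, Hit]]) -> Set[str]:
    """Loci that pass the thresholds in every genome (set intersection)."""
    core: Optional[Set[str]] = None
    for hits in genomes:
        present = loci_present(hits)
        core = present if core is None else core & present
    return core or set()
```

Note that the intersection can only shrink as genomes are added, which is why the core set depends on the collection it was defined on.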
cgMLST Trials and Tribulations
• 2,690 Campylobacter jejuni whole genome sequence assemblies
• Allele definitions gathered from all genomes
Not so simple!
• WGS projects don't usually finish (close) their genomes
  • They stop at draft “Genome Assemblies” made of contigs
• Target loci are often truncated by chance
• Only 1,464 genomes (54%) had complete sequences at all 732 loci
Contig Truncations are a function of genome count
As the number of genomes analyzed increases, the probability that any given locus will have at least one truncation approaches 100%
• Average rate of missing/truncated loci ≈ 3.5%
  • ≈ 26 of the 732 loci per assembly!
Contig Truncations are a function of locus count
As the number of loci analyzed increases, the probability that any genome will have at least one truncation approaches 100%
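The arithmetic behind both slides can be sketched with a simple model, assuming every locus in every assembly is truncated independently at the average rate of 3.5%. That independence is an assumption made here for illustration, not a claim from the talk; real truncations are likely concentrated in poorer assemblies, since 54% of genomes were complete at all 732 loci.

```python
def p_at_least_one(rate: float, n: int) -> float:
    """P(at least one truncation in n independent trials at the given rate)."""
    return 1.0 - (1.0 - rate) ** n


RATE = 0.035   # average missing/truncated rate quoted on the slide
N_LOCI = 732   # size of the prototype scheme

# Expected truncated loci per assembly: 0.035 * 732 = 25.62, the "26 per assembly".
expected_per_assembly = RATE * N_LOCI

# The same formula serves both slides: n is the genome count when following
# one locus, and the locus count when following one genome.
for n in (1, 10, 50, 100, 500):
    print(n, round(p_at_least_one(RATE, n), 3))
```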
The Story So Far...
• Advantages of cgMLST
  1. Analysis is cheap and speedy
  2. Hugely improved resolution
  3. Consistent, portable nomenclature
• Difficulties Introduced by cgMLST
  • Missing / Truncated Loci will affect your scheme
  • As-is, this forces you to sacrifice either #1 or #3:
    Re-sequence and re-assemble and hope it works
    – or –
    Abandon all hope for portability
Some options for damage control!
1. Use only highly conserved core genes
2. Use optimized gene fragments
3. Reduce the number of target loci
4. Attempt to impute data
Using Optimized Gene Fragments
• The longer the target sequence, the more opportunities for truncations
• Avoid regions with empirically high contig truncation rates
• Retain the most informative regions → measured by Shannon entropy
• Optimized sub-regions that are informative and truncation-free
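One plausible reading of the fragment optimization, sketched under stated assumptions: the alleles of a locus are already aligned to equal length, a per-position count of observed contig truncations is available, and the fragment is a fixed-width window scored by summed per-column Shannon entropy. The talk does not describe its actual procedure, so `column_entropy` and `best_fragment` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of the bases seen at one alignment column."""
    total = len(column)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(column).values())


def best_fragment(alleles: List[str], truncations: List[int], width: int,
                  max_truncations: int = 0) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) of the most informative allowed window."""
    length = len(alleles[0])
    entropy = [column_entropy([allele[i] for allele in alleles])
               for i in range(length)]
    best: Optional[Tuple[int, int, float]] = None
    for start in range(length - width + 1):
        end = start + width
        if sum(truncations[start:end]) > max_truncations:
            continue  # skip windows overlapping truncation-prone positions
        score = sum(entropy[start:end])
        if best is None or score > best[2]:
            best = (start, end, score)
    return best
```

Ties keep the leftmost window, and `None` is returned when no window is free of truncations, in which case the locus would have to be dropped or the limit relaxed.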
How many loci do we need for accurate clustering?
Pristine Genome Set
• 732 cgMLST loci
• The 1,464 genomes with complete sequences at all 732 loci
• A controlled development environment for cgMLST testing
Clustering
• Reference set clustered at various similarity thresholds
  • 100% - 20% similarity
  • 0.5% steps
• Random Gene Selection
  • N genes randomly selected from the 732
  • 1000 replicates each
  • Clusters compared vs the full 732
  • Comparison to “reference tree”
• Adjusted Wallace Coefficient
  • Compares clusters produced by two methods
  • “How often do two strains clustered together by Method A cluster together by Method B?”
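A short sketch of the Adjusted Wallace Coefficient, assuming each method's result is given as one cluster label per strain, in the same strain order. It uses the usual formulation in which the Wallace value expected by chance is one minus Simpson's index of diversity of the second partition; the function names are hypothetical and the pair loop is written for clarity, not speed.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def wallace(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """W(A->B): of the pairs clustered together by A, the fraction also together in B."""
    together_a = together_both = 0
    for i, j in combinations(range(len(a)), 2):
        if a[i] == a[j]:
            together_a += 1
            if b[i] == b[j]:
                together_both += 1
    if together_a == 0:
        raise ValueError("method A puts no two strains in the same cluster")
    return together_both / together_a


def simpson_diversity(labels: Sequence[Hashable]) -> float:
    """Probability that two distinct strains fall in different clusters."""
    n = len(labels)
    same = sum(count * (count - 1) for count in Counter(labels).values())
    return 1.0 - same / (n * (n - 1))


def adjusted_wallace(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Wallace corrected for the agreement expected if A and B were independent."""
    expected = 1.0 - simpson_diversity(b)
    if expected == 1.0:
        raise ValueError("method B puts every strain in one cluster")
    return (wallace(a, b) - expected) / (1.0 - expected)
```

A value near 1 means clusters from Method A predict clusters from Method B well; a value near 0 means no better than chance.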
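Random Subset Clusters – 5th Percentile (i.e. “worst case scenario”)
150-250 genes are nearly as good as 732 genes
[Figure: 5th-percentile agreement between random-subset clusters and the full 732-locus clusters; axis ticks 0.0 to 0.8]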
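The experiment behind this plot can be sketched as follows. Several choices here are assumptions made for illustration: single-linkage clustering on the fraction of shared alleles (the talk does not name its clustering method), and a plain unadjusted pair-agreement score in place of the Adjusted Wallace Coefficient to keep the example short. All function names are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence


def single_linkage(profiles: Sequence[Sequence[int]], loci: Sequence[int],
                   threshold: float) -> List[int]:
    """Label strains so that pairs sharing >= threshold of alleles are linked."""
    parent = list(range(len(profiles)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(profiles)), 2):
        shared = sum(profiles[i][k] == profiles[j][k] for k in loci)
        if shared / len(loci) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(profiles))]


def pair_agreement(subset: Sequence[int], reference: Sequence[int]) -> float:
    """Of the pairs together in the subset clustering, the share together in the reference."""
    together = agree = 0
    for i, j in combinations(range(len(subset)), 2):
        if subset[i] == subset[j]:
            together += 1
            agree += reference[i] == reference[j]
    return agree / together if together else 1.0


def fifth_percentile_agreement(profiles: Sequence[Sequence[int]], n_loci: int,
                               threshold: float, replicates: int = 1000,
                               seed: int = 0) -> float:
    """Worst-case (5th percentile) agreement over random locus subsets."""
    rng = random.Random(seed)
    all_loci = list(range(len(profiles[0])))
    reference = single_linkage(profiles, all_loci, threshold)
    scores = sorted(
        pair_agreement(
            single_linkage(profiles, rng.sample(all_loci, n_loci), threshold),
            reference)
        for _ in range(replicates))
    return scores[int(0.05 * (len(scores) - 1))]
```

Repeating the call for each subset size N and each similarity threshold would give one curve per N, from which the point of diminishing returns can be read off.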
Allele Imputation: Another Approach
[Figure: example allele calls (5, 21) alongside an unknown allele (“???”)]
• Inferring the allele of a missing/partial locus
• Educated guess from the allele proportions of 'centres' known to be associated with particular 'flanks'
• Mean accuracy of 90.5%
• Further refinement with partial sequence data
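A minimal sketch of the idea, assuming a reference panel of complete (left flank, centre, right flank) allele triplets from which conditional allele proportions are tabulated. The talk does not describe its implementation, so the data layout, the `min_proportion` cut-off, and the function names are hypothetical; the 90.5% accuracy above is the talk's result, not something this code reproduces.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Triplet = Tuple[int, int, int]  # (left flank allele, centre allele, right flank allele)


def build_table(reference: List[Triplet]) -> Dict[Tuple[int, int], Counter]:
    """Count how often each centre allele co-occurs with each pair of flanks."""
    table: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for left, centre, right in reference:
        table[(left, right)][centre] += 1
    return table


def impute_centre(table: Dict[Tuple[int, int], Counter], left: int, right: int,
                  min_proportion: float = 0.5) -> Optional[Tuple[int, float]]:
    """Return (allele, proportion) for the likeliest centre, or None if unsupported."""
    counts = table.get((left, right))
    if not counts:
        return None  # these flanks were never seen together in the panel
    allele, seen = counts.most_common(1)[0]
    proportion = seen / sum(counts.values())
    return (allele, proportion) if proportion >= min_proportion else None
```

Returning the proportion alongside the allele lets a caller decide how much to trust each guess, and any partial sequence at the locus could be used to rule out candidate alleles before the vote.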
Conclusions
• cgMLST is poised to be the Gold Standard for global surveillance of bacterial pathogens
• Contig truncations and missing data become a blocking problem if the same portability of typing definitions as MLST is desired
• A compromise between typability and robustness is required
• The effect of contig truncations can be mitigated by:
  • Removing the worst fragments of genes (by truncation rate & information content)
  • Removing the genes that contribute the least to discriminatory power
  • “Filling the gaps” with advance knowledge about linkage
Acknowledgements
• Supervisors:
  • Drs. Ed Taboada & Jim Thomas
• Labmates:
  • Steven Mutschall (PHAC)
  • Peter Krucziewicz (PHAC)
  • Ben Hetman (PHAC/ULeth)
  • Cody Buchanan (CFIA/ULeth)
• Funding:
  • ESCMID Attendance Grant
  • University of Lethbridge
  • Public Health Agency of Canada
  • Government of Canada Genomics Research and Development Initiative

Barker immemxi final March 2016