"Next Generation Sequencing for Identification and Subtyping of Foodborne Pathogens", a presentation given at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by the National Institute of Standards and Technology in October 2014, by Rebecca Lindsey, PhD, of the Enteric Diseases Laboratory Branch of the CDC.
1. Next Generation Sequencing for
Identification and Subtyping
of Foodborne Pathogens
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne, and Environmental Diseases
Rebecca Lindsey, PhD
Enteric Diseases Laboratory Branch
NIST Workshop October 20, 2014
2. Advanced Molecular Detection (AMD) Initiative
http://www.cdc.gov/amd/
• Projects to transform Networks, programs and
systems – 8 CDC projects
• EDLB- Transforming public health microbiology with whole genome
sequencing for foodborne diseases (Salmonella, Shiga toxin-
producing Escherichia coli (STEC), and Campylobacter)
• Projects Using AMD for Specific Pathogens – 15
CDC projects
• EDLB- Maximizing the potential of real-time whole genome sequence-
based Listeria surveillance to solve outbreaks and improve food safety
No CDC consensus on how to use
WGS for identification
3.
4. Collaborating Partners
• Collaboration among the public health departments
in the states, FDA, USDA, and NCBI
• International component: Developing and refining
bioinformatics ‘pipelines’ with partners
in Belgium, Canada, Denmark, England, and France
Public Health Agency of Canada
5. Vision
for the use of WGS in the surveillance of foodborne illness
WGS is used to characterize foodborne pathogens in
public health laboratories, replacing multiple
workflows with one single efficient workflow
Turnaround time (TAT): (2-) 3-4 days
6. Current Methods of Characterizing Foodborne
Pathogens in a Public Health Laboratory
• Growth characteristics
• Phenotypic panels
• Agglutination reactions
• Enzyme immunoassays (EIAs)
• PCR
• DNA arrays (hybridization)
• Sanger sequencing
• DNA restriction
• Electrophoresis (PFGE, capillary)
• Each pathogen is characterized by methods that are specific to
that pathogen in multiple workflows
- Separate workflows for each pathogen
- TAT: 5 min – weeks (months)
7. Why Move Public Health
Microbiology to WGS?
Besides consolidation of workflows in the labs:
• More efficient outbreak detection, investigation & control
• Precise and flexible case definition
– More outbreaks will be detected and solved when they are
small
– Scarce epi-resources may be focused
• More efficient surveillance of sporadic infections
• Source attribution analysis of sporadic disease
• Focus on pathogens of particular public health
importance:
– Virulence, resistance, emerging pathogens, rapidly spreading
clones/traits, vaccine-preventable diseases
8. WGS in Public Health:
The tools must be
• Simple
• Public health microbiologists are NOT
bioinformaticians
• Standard desktop software
• Comprehensive
• All characterization in one workflow
• Work in a network of laboratories
• Free sharing and comparison of data between labs
• Central and local databases
9. To SNP or Not to SNP?
in public health
• Single Nucleotide Polymorphism (SNP) approaches
• Default for phylogenetic analyses of sequence data
• Comparative subtyping by nature
• Results difficult to communicate
• Computationally intensive = SLOW
• Gene-gene approach (wgMLST)
• Definitive subtyping
• Leads to naming, tracking over time, easy communication
• Computationally more simple = FAST but…
• Sufficient discrimination?
• YES!
11. Standardization of
Methods
• Standard Operating Procedures- CLIA
certification- in EDLB
• Recommended protocols in state labs
• Sequencing quality metrics
– Q-values – vary by machine
– Coverage – for upload to NCBI
• 20X Listeria, Campylobacter
• 30X Salmonella
• 40X STEC/Shigella
Salmonella www.cdc.gov/amd
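The coverage recommendations above can be checked with simple arithmetic. The following is a minimal sketch, not a CDC tool: it assumes average coverage is estimated as total sequenced bases divided by genome length, and the function names, read counts, and the 5 Mb genome size are hypothetical values chosen for illustration.

```python
# Minimum coverage for upload to NCBI, as listed on the slide above.
MIN_COVERAGE = {
    "Listeria": 20,
    "Campylobacter": 20,
    "Salmonella": 30,
    "STEC/Shigella": 40,
}


def mean_coverage(read_lengths, genome_size):
    """Estimate average depth as total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


def meets_threshold(organism, read_lengths, genome_size):
    """Return True if estimated coverage reaches the organism's minimum."""
    return mean_coverage(read_lengths, genome_size) >= MIN_COVERAGE[organism]


# Hypothetical run: 1,000,000 reads of 150 bases, assumed 5 Mb genome.
reads = [150] * 1_000_000
print(mean_coverage(reads, 5_000_000))                      # 30.0
print(meets_threshold("Salmonella", reads, 5_000_000))      # True
print(meets_threshold("STEC/Shigella", reads, 5_000_000))   # False
```

The same run passes for one organism and fails for another, which is why the threshold is looked up per organism rather than fixed.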
12. NGS Standards in Progress for Clinical Labs
• The College of American Pathologists (CAP) – NGS
molecular pathology
- includes 18 laboratory accreditation checklist requirements for
the analytic “wet bench” process and “dry lab” bioinformatics
analysis processes (Aziz et al 2014).
• National Next-generation Sequencing Standardization
of Clinical Testing (Nex-StoCT) workgroup.
- developed guidelines to ensure that results from tests based
on NGS are reliable and useful for clinical decision making
(Gargis et al 2013).
• All labs submitting NGS to CLIA labs will have to
follow CLIA protocols
14. BioNumerics
• A powerful combined database and analytical
software package
– A ‘one tool fits all’ application for public health
• Highly customizable
• Used by PulseNet, CaliciNet and CryptoNet
– The public health labs are familiar with it
15. Gene – Gene Approach
• Fixed set of genes (‘loci’) leading to typing schemes
on different levels
• Concept of allelic variation, not only point mutations
• Evolutionary distance for events such as recombination
and simultaneous close-range mutations are counted as
one event
• Definitive subtyping
• Leads to nomenclature
• Requires curation
[Figure: typing schemes on different levels (MLST, eMLST, cMLST, wgMLST), together with genus/species, serotype, and AR targets]
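The difference between counting point mutations and counting allelic variation can be illustrated with a toy comparison. This is a minimal sketch under stated assumptions: the locus names and sequences are invented, sequences at each locus are assumed to be already aligned to equal length, and neither function reflects the actual BioNumerics or CDC implementation.

```python
def allelic_distance(genome_a, genome_b):
    """Count shared loci whose allele sequences differ (one event per locus)."""
    shared = genome_a.keys() & genome_b.keys()
    return sum(1 for locus in shared if genome_a[locus] != genome_b[locus])


def snp_distance(genome_a, genome_b):
    """Count mismatching positions across shared loci (one event per base)."""
    shared = genome_a.keys() & genome_b.keys()
    total = 0
    for locus in shared:
        a, b = genome_a[locus], genome_b[locus]
        if len(a) != len(b):
            raise ValueError(f"{locus}: sequences must be aligned to equal length")
        total += sum(1 for x, y in zip(a, b) if x != y)
    return total


# Hypothetical isolates: locus_2 differs at three adjacent positions,
# as might happen after a single recombination event.
isolate_x = {"locus_1": "ATGGCCTTA", "locus_2": "ATGAAACGT"}
isolate_y = {"locus_1": "ATGGCCTTA", "locus_2": "ATGTTTCGT"}

print(snp_distance(isolate_x, isolate_y))      # 3
print(allelic_distance(isolate_x, isolate_y))  # 1
```

Three adjacent changes count as three SNPs but as a single allele change, which is the "counted as one event" behavior described on the slide.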
16. Genes That May Be Targeted In a
Gene-Gene Analytical Approach
Core (c) genes (‘present
in all strains in a species’)
Housekeeping genes for MLST & eMLST
Serotyping genes
Genes for genus/species/subspecies
identification
Virulence genes
Antimicrobial resistance
genes
Pan- genome (wg) (‘all
genes in the whole
population of a species’)
17. Public Health WGS Workflow
[Workflow diagram] Components:
• Sequencer
• Raw sequences
• LIMS
• External storage: NCBI, ENA, BaseSpace
• Calculation engine: trimming, mapping, de novo assembly, SNP detection, allele detection
• Nomenclature server
• Allele databases
• SQL databases
• End users at CDC and in the States
Characteristics shown: genus/species, serotype, pathotype, virulence profile, AST, lineage, clone, sequence type, allele
19. The Nomenclatural Server in
the WGS Workflow
• A database with all genes and gene variants (‘alleles’)
• Function of most genes not known
but
• Genes used for reference characterization are also included
• E.g., genus/species identification, serotyping, pathotyping, virulence
characterization, antimicrobial resistance, MLST
• Alleles detected by the calculation engine are identified and NAMED
• New alleles are added to the database automatically
• Ambiguous alleles are forwarded to database managers and organism-
specific SMEs for curation/confirmation before being added
Building the nomenclatural
database is an international
collaborative effort
Should ultimately be placed in
public domain
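The naming behavior described above can be sketched as a small lookup structure. This is an illustrative sketch only, not the actual nomenclature server: the class and method names are hypothetical, allele numbers are assumed to be assigned sequentially per locus, and "ambiguous" is assumed here to mean a sequence containing characters other than A, C, G, or T.

```python
import hashlib


class AlleleDatabase:
    """Toy allele database: names known alleles, adds new ones, queues ambiguous ones."""

    def __init__(self):
        self._numbers = {}        # (locus, sequence hash) -> allele number
        self._next_number = {}    # locus -> next free allele number
        self.curation_queue = []  # (locus, sequence) pairs awaiting review

    def name_allele(self, locus, sequence):
        seq = sequence.upper()
        # Assumption: any non-ACGT character makes the allele ambiguous.
        if set(seq) - set("ACGT"):
            self.curation_queue.append((locus, seq))
            return None
        key = (locus, hashlib.sha256(seq.encode()).hexdigest())
        if key not in self._numbers:
            # New allele: add it automatically with the next number.
            number = self._next_number.get(locus, 1)
            self._numbers[key] = number
            self._next_number[locus] = number + 1
        return self._numbers[key]


db = AlleleDatabase()
print(db.name_allele("locus_0001", "ATGACC"))  # 1
print(db.name_allele("locus_0001", "ATGACT"))  # 2
print(db.name_allele("locus_0001", "atgacc"))  # 1
print(db.name_allele("locus_0001", "ATGNCC"))  # None
print(len(db.curation_queue))                  # 1
```

Hashing the sequence keeps the lookup key small for long genes; a real system would also need the curation and versioning steps the slide mentions.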
20. Building species-specific allele
databases - wgMLST
• Listeria
- 200 annotated reference genomes
- 5800 unique loci
• Campylobacteraceae
– 100 annotated reference genomes
– current BIGSdb
• Shiga toxin-producing E. coli
- 60 annotated reference genomes
- E. coli databases
21. - ResFinder
- VirulenceFinder
- SerotypeFinder
O target = wzy, wzx, wzm and wzt
H target = fliC, flkA, fllA, flmA and flnA
Zankari E, et al. J Antimicrob Chemother. 2012. 67(11):2640-4.
Joensen KG, et al. J Clin Microbiol. 2014. 52(5):1501-1510.
24. The Calculation Engine in the
WGS Workflow
• Current: Closed - OID
Bioinformatics Core
• Potential: Public - In ‘the
cloud’ for the global public
health community
• Computationally intensive
sequence trimming,
mapping, de novo assembly,
SNP detection, allele
detection
• Slow - but a ‘one-time’
process
Calculation engine
28. Gene – Gene Approach for Naming
Subtyping in Keeping with Phylogeny
(concept to be developed)
           7 gene MLST   eMLST   cMLST   wgMLST
Isolate A  ST24        - e12   - c48   - w214
Isolate B  ST24        - e12   - c48   - w352
Isolate C  ST24        - e12   - c45   - w132
Isolate D  ST31        - e15   - c60   - w582
Isolate A and B closely related
Isolate C related to A and B but not as closely as A is to B
Isolate D unrelated to all the other isolates
Providing phylogenetic information in the name is important because isolates from the
same source are more likely to be related than isolates from different sources
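The relatedness reading above follows from how many leading fields two names share. A minimal sketch, assuming names are written as four fields separated by " - " in the order 7 gene MLST, eMLST, cMLST, wgMLST; the function name is hypothetical and the naming concept itself is, as the slide says, still to be developed.

```python
# Isolate names from the slide above.
NAMES = {
    "A": "ST24 - e12 - c48 - w214",
    "B": "ST24 - e12 - c48 - w352",
    "C": "ST24 - e12 - c45 - w132",
    "D": "ST31 - e15 - c60 - w582",
}


def shared_levels(name_a, name_b):
    """Count matching leading fields, from 7 gene MLST toward wgMLST."""
    count = 0
    for field_a, field_b in zip(name_a.split(" - "), name_b.split(" - ")):
        if field_a != field_b:
            break
        count += 1
    return count


print(shared_levels(NAMES["A"], NAMES["B"]))  # 3
print(shared_levels(NAMES["A"], NAMES["C"]))  # 2
print(shared_levels(NAMES["A"], NAMES["D"]))  # 0
```

A and B agree down to cMLST, C agrees with them only down to eMLST, and D shares no level with the others, matching the interpretation given on the slide.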
29. PATHOTYPE: Shiga toxin-producing and Enteroaggregative E. coli (STEC & EaggEC)
VIRULENCE PROFILE: stx2a, aggR, aggA, sigA, sepA, pic, aatA, aaiC, aap
SEQUENCE TYPE: ST34
ANTIMICROBIAL RESISTANCE GENES: blaTEM-1, blaCTX-M-15
The strain contains Shiga toxin subtype 2a typically associated with virulent STEC
It does not contain adherence and virulence factors (eae, ehxA) typically associated with virulent STEC
It contains adherence and virulence factors typically associated with virulent EaggEC (aggR, aggA, sigA, sepA,
pic, aatA, aaiC, aap)
This genotype is associated with extremely high (>10%) rates of hemolytic uremic syndrome (HUS)
All characteristics have been determined by whole genome sequencing (WGS)
GENUS/SPECIES:
30. Conclusion: Standardization of WGS
Public Health Microbiology
• No CDC consensus among the many
different organisms
• Standardization of NGS following
CAP/CLIA guidelines.
• Standardization among collaborators
-- Methods
-- Analysis
-- Nomenclature
31. Acknowledgements
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne, and Environmental Diseases
Disclaimers:
“The findings and conclusions in this presentation are those of the author and do not necessarily
represent the official position of the Centers for Disease Control and Prevention”
“Use of trade names is for identification only and does not imply endorsement by the Centers for
Disease Control and Prevention or by the U.S. Department of Health and Human Services.”
Public Health Agency of Canada
CDC: Heather Carleton, Eija Trees, Peter Gerner-Smidt, Collette Leaumont, Efrain
Ribot, Lee Katz, Nancy Strockbine
32. Questions?
For more information please contact Centers for Disease Control and Prevention
Enteric Diseases Laboratory Branch
1600 Clifton Road NE, Atlanta, GA 30333
The findings and conclusions in this report are those of the authors and do not necessarily represent the
official position of the Centers for Disease Control and Prevention.
Editor's notes
There is no consensus at CDC for any of the above.
All labs submitting to the reference labs will have to be CLIA certified. In the past, state labs could have separate molecular PulseNet and reference labs; now all will have to be CLIA certified, so if they are all conducting NGS and are all CLIA certified, they may streamline the labs into one lab. Working on EDLB and PulseNet standardized protocols, which are recommended to states. Working towards CLIA certification of all steps in the process.
Need sequencing quality metrics – Q-values vary between MiSeq, NextSeq, and PGM. For PulseNet and EDLB reference labs we have coverage recommendations.
High quality standard reference genomes that have been annotated would be helpful for hqSNP as well as building databases for wgMLST. Working for CLIA certification of all steps in the process.
well characterized annotated reference genomes
PacBio sequencing; still working on the STEC database
More high quality genomes would be useful
There is no consensus at CDC for any of the above.
Want to be able to automatically name a pattern
There is no consensus at CDC for any of the above.