Making Use of NGS Data: From Reads to Trees and Annotations

Talk at ASM Microbe Boston 2016 - WS 24 Workshop - NGS for Microbial Genomics Surveillance and more - One Technology Fits All - 8:15-4:15 PM

Published in: Science

  1. 1. João André Carriço, PhD. Microbiology Institute / Institute for Molecular Medicine, Faculty of Medicine, University of Lisbon, Portugal. http://im.fm.ul.pt http://imm.fm.ul.pt http://www.joaocarrico.info WORKSHOP 24: NGS FOR MICROBIAL GENOMIC SURVEILLANCE AND MORE - ONE TECHNOLOGY FITS ALL
  2. 2. Nothing to disclose
  3. 3. This presentation is not intended to cover all available software or databases (that would take several weeks or months). I'll present what I use or intend to use in the near future. I gladly accept suggestions of tools to include in similar presentations in the future. It is supposed to be interactive, so ask away during the presentation.
  4. 4. • What is in the reads (FASTQ files)
        • Available databases
          - Virulence factor and AMR databases
          - Sequence-based typing databases: Pubmlst.org / Enterobase
        • High-throughput sequencing data analysis (freeware)
          - Prokka
          - Roary
          - Nullarbor
          - Microreact.org
          - PHYLOViZ
        • Commercial solutions
          - Bionumerics 7.5
          - CLC Genomics Workbench (CLC Bio)
          - Ridom Seqsphere+
  5. 5. Sequenced reads contain:
        • Isolate genome*
        • Other isolates in the sequencing run
        • Contamination
        * Chromosome + plasmids + phages
        Slide source: Nick Loman
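To make "what is in the reads" concrete, here is a minimal sketch of reading a FASTQ file. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; the names `FastqRecord`, `read_fastq` and `mean_phred` are hypothetical helpers written for this illustration, not part of any tool mentioned in the talk.

```python
from io import StringIO
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream (assumes four lines per record)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Two toy reads: one high quality ('I' = Q40), one very low ('!' = Q0).
example = StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
records = list(read_fastq(example))
qualities = {rec.name: mean_phred(rec.quality) for rec in records}
```

A real run would also need to separate reads from the isolate from reads contributed by other isolates or by contamination, which this sketch does not attempt.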
  6. 6. Virulence Factor Databases
        • VFDB (http://www.mgc.ac.cn/VFs/main.htm)
        • Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
        • Victors (http://www.phidias.us/victors/)
        • PHI-Base (http://www.phi-base.org/)
        • MvirDB (http://mvirdb.llnl.gov/)
        To know more: presentation in the "Controversies in interpreting whole genome sequence data" session: http://eccmidlive.org/#resources/how-can-we-design-actionable-virulome-databases
  7. 7. • Comprehensive Antibiotic Resistance Database (CARD) (https://card.mcmaster.ca/)
        • Repository of Antibiotic resistance Cassettes (RAC) (http://rac.aihi.mq.edu.au/rac/)
        • Integrall: The integron database (http://integrall.bio.ua.pt/)
        (…)
  8. 8. http://www.pubmlst.org http://bigsdb.web.pasteur.fr/
  9. 9. Slide by @happy_khan. Martin Sergeant, Mark Achtman, Nabil-Fareed Alikhan, Zhemin Zhou
  10. 10. • Reads (FASTQ files)
          • De novo assembler: reads to contigs (FASTA files)
          • Prokka: contigs to annotated contigs (GBK/GFF files)
          • Roary: pan-genome analysis
          • Enterobase
          • BIGSdb
          • Nullarbor
          • PHYLOViZ: tree + metadata visualization
          • Microreact.org: tree + metadata + visualization
          To know more: http://www.slideshare.net/nickloman/eccmid-2015-so-i-have-sequenced-my-genome-what-now
  11. 11. • Genome annotation made easy, by Torsten Seemann (slides by Torsten)
          • Genome annotation: adding biological information to the sequence by describing features
          To know more: http://www.slideshare.net/torstenseemann/prokka-rapid-bacterial-genome-annotation-abphm-2013
          Available at: https://github.com/tseemann/prokka
  12. 12. • Pan-genome analysis, by Andrew Page
          • Available at: https://sangerpathogens.github.io/Roary/
          Figure labels: core genome, accessory genome, pan-genome
  13. 13. • Inputs: annotated de novo assemblies (GFF files)
            - Typically from the annotation pipeline
          • Outputs:
            - Spreadsheet with presence and absence of genes
            - Multi-FASTA alignment of core genes, so you can build a tree without a reference
            - Multi-FASTA alignments for each gene
            - Plots for the open/closed genome and unique genes
            - Integrates with Phandango so you can visualise all structural variation
            - QC report from Kraken to help identify suspect samples
          (Slide by Andrew Page)
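As an illustration of working with a gene presence/absence spreadsheet, the sketch below counts in how many strains each gene occurs. It assumes a simplified tab-separated table with a gene column followed by one 0/1 column per strain; the real output layout of the pipeline may differ, and `count_strains_per_gene` is a hypothetical helper name.

```python
import csv
from io import StringIO
from typing import Dict, TextIO


def count_strains_per_gene(handle: TextIO) -> Dict[str, int]:
    """Count strains carrying each gene in a tab-separated 0/1 table.

    Assumed layout: header row, then one row per gene with the gene
    name in the first column and "1" (present) or "0" (absent) per strain.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    n_strains = len(header) - 1
    counts: Dict[str, int] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, flags = row[0], row[1:]
        if len(flags) != n_strains:
            raise ValueError(f"row for {gene!r} has {len(flags)} strain columns")
        counts[gene] = sum(1 for flag in flags if flag == "1")
    return counts


# Toy table: gyrA in all three strains, blaTEM in only one.
table = StringIO(
    "Gene\tstrainA\tstrainB\tstrainC\n"
    "gyrA\t1\t1\t1\n"
    "blaTEM\t0\t1\t0\n"
)
counts = count_strains_per_gene(table)
```

These per-gene counts are the natural input for deciding which genes belong to the core or to the accessory genome.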
  14. 14. • Core (n or n-1 strains)
          • Soft-core (n-2 or n-3 strains)
          • Shell (8(?) to n-3 strains)
          • Cloud (<8(?) strains)
          Core genome: core + soft-core
          Accessory genome: shell + cloud
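The categories on this slide can be sketched as a small classifier. Two assumptions are made here: the cloud/shell boundary of 8 strains is the value the slide itself marks as uncertain, so it is exposed as a parameter, and a gene found in exactly n-3 strains (which the slide lists under both soft-core and shell) is assigned to soft-core. The function names are hypothetical.

```python
def classify_gene(n_present: int, n_strains: int, cloud_cutoff: int = 8) -> str:
    """Assign a gene to core, soft-core, shell or cloud.

    n_present    -- number of strains carrying the gene
    n_strains    -- total number of strains (n on the slide)
    cloud_cutoff -- assumed boundary; genes in fewer strains are "cloud"
    """
    if not 0 < n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    if n_present >= n_strains - 1:
        return "core"        # n or n-1 strains
    if n_present >= n_strains - 3:
        return "soft-core"   # n-2 or n-3 strains (n-3 tie resolved here)
    if n_present >= cloud_cutoff:
        return "shell"
    return "cloud"


def genome_part(category: str) -> str:
    """Core genome = core + soft-core; accessory genome = shell + cloud."""
    return "core genome" if category in ("core", "soft-core") else "accessory genome"


# With 100 strains: 99 is core, 97 soft-core, 50 shell, 3 cloud.
labels = [classify_gene(k, 100) for k in (99, 97, 50, 3)]
```

For small collections (fewer strains than the cut-off plus three) the shell category is effectively empty, so the cut-off should be chosen with the dataset size in mind.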
  15. 15. iCANDY output of presence and absence of genes in accessory genome. S. Weltevreden & public S. enterica genomes (Slide by Andrew Page)
  16. 16. • Complete pipeline from reads to reports, by Torsten Seemann
          • The objective is to automate analysis for everyday use in public health labs and research settings
          • Uses and distills the outputs of many software tools
          • Available at: https://github.com/tseemann/nullarbor
  17. 17. Slide by Torsten Seemann
  18. 18. From: https://github.com/tseemann/nullarbor
  19. 19. Slides by Torsten Seemann
  20. 20. www.phyloviz.net
  21. 21. Inputs:
          • Tab-separated text (profiles)
          • FASTA files
          • Automatic database retrieval (MLST)
          Outputs:
          • goeBURST and goeBURST MST
          • Link quality assessment
          • High-quality images
          Can be easily applied to:
          • MLST / cgMLST / wgMLST
          • MLVA
          • SNP data*
          • Gene presence/absence
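To illustrate how a tree can be built from tab-separated allelic profiles, the sketch below computes pairwise Hamming distances (number of differing loci) and a plain minimum spanning tree with Prim's algorithm. This is not goeBURST: goeBURST applies its own tie-break rules when several links have the same distance, whereas this sketch simply breaks ties alphabetically. The column names and all function names are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Profile = Tuple[str, ...]


def parse_profiles(text: str) -> Dict[str, Profile]:
    """Parse tab-separated profiles: header row, then ID plus one allele per locus."""
    lines = [line for line in text.splitlines() if line.strip()]
    profiles: Dict[str, Profile] = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split("\t")
        profiles[fields[0]] = tuple(fields[1:])
    return profiles


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of loci at which two profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of loci")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles: Dict[str, Profile]) -> List[Tuple[str, str, int]]:
    """Prim's algorithm; ties are broken alphabetically (not goeBURST rules)."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges: List[Tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        dist, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Toy MLST-like table with three loci (hypothetical locus names).
text = "ST\tadk\tfumC\tgyrB\n1\t1\t1\t1\n2\t1\t1\t2\n3\t1\t2\t2\n4\t5\t6\t7\n"
tree = minimum_spanning_tree(parse_profiles(text))
total_weight = sum(d for _, _, d in tree)
```

The same distance works for any categorical profile data, such as cgMLST/wgMLST alleles or gene presence/absence, as long as every profile has the same columns.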
  22. 22. New features:
          • Hierarchical clustering
          • Neighbor-Joining
          • Project saving
  23. 23. • Available at http://online.phyloviz.net
          • Web-based version of PHYLOViZ
          • Allows users to create their own datasets, save them and share their data (privately or publicly)
          • REST API available
          • Scalable to thousands of nodes
          • Tree analysis tools:
            - Interactive distance matrix
            - NLV graph
  24. 24. Slide by @happy_khan
  25. 25. NLV Graph Tree cut-off Full MST
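As a rough sketch of the idea behind an NLV (N-locus variant) graph, the code below links every pair of profiles that differ at no more than a chosen number of loci, in contrast to a spanning tree, which keeps only one path between any two nodes. This reading of "NLV graph" and the function name `nlv_graph` are assumptions for illustration; the tool's own implementation may differ.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Profile = Tuple[str, ...]


def nlv_graph(profiles: Dict[str, Profile], max_variants: int) -> List[Tuple[str, str, int]]:
    """Return all links between profiles differing at <= max_variants loci."""
    edges: List[Tuple[str, str, int]] = []
    for u, v in combinations(sorted(profiles), 2):
        a, b = profiles[u], profiles[v]
        if len(a) != len(b):
            raise ValueError("profiles must have the same number of loci")
        distance = sum(x != y for x, y in zip(a, b))
        if distance <= max_variants:
            edges.append((u, v, distance))
    return edges


# Toy allelic profiles over three loci.
profiles = {
    "ST1": ("1", "1", "1"),
    "ST2": ("1", "1", "2"),
    "ST3": ("1", "2", "2"),
    "ST4": ("5", "6", "7"),
}
single_locus_links = nlv_graph(profiles, 1)
double_locus_links = nlv_graph(profiles, 2)
```

Raising the threshold adds links between more distant profiles, so unrelated profiles such as ST4 here stay disconnected until the cut-off reaches their distance.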
  26. 26. Create Selections Change tree options
  27. 27. • Available at http://microreact.org/
          • Presentation in the session "Harnessing whole genome sequence data for public health applications": Novel open access tools for WGS-based pathogen surveillance and the identification of high-risk clones
          • http://eccmidlive.org/#resources/novel-open-access-tools-for-wgs-based-pathogen-surveillance-and-the-identification-of-high-risk-clones
  28. 28. • Ridom Seqsphere+: http://www.ridom.de/seqsphere/
          • Applied Maths Bionumerics 7.6: http://www.applied-maths.com/bionumerics
          • CLC Bio Genomics Workbench: http://www.clcbio.com/blog/clc-genomics-workbench-7-5/
  29. 29. • Huge variety of software and database solutions
          • There is no single one-size-fits-all solution (job security for bioinformaticians)
          • Different questions require different approaches
          • Always question the results and data provenance
  30. 30. • ECCMID 2015 Meet-the-expert session on "What bioinformatic tools should I use for analysis of High-Throughput Sequencing data for molecular diagnostics?"
            - Nick Loman: http://www.slideshare.net/nickloman/eccmid-2015-meettheexpert-bioinformatics-tools
            - João André Carriço: http://www.slideshare.net/joaoandrecarrico/eccmid-meet-theexpert2015
  31. 31. • UMMI Members
            - Bruno Gonçalves
            - Mário Ramirez
            - José Melo-Cristino
          • INESC-ID
            - Alexandre Francisco
            - Cátia Vaz
            - Marta Nascimento
          • EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/)
            - Mirko Rossi
          • FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
            - Dag Harmsen (Univ. Muenster)
            - Stefan Niemann (Research Center Borstel)
            - Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
            - Joerg Rothganger (RIDOM)
            - Hannes Pouseele (Applied Maths)
          • Genome Canada IRIDA project (Integrated Rapid Infectious Disease Analysis, www.irida.ca)
            - Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NLM, PHAC)
            - Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC)
            - Fiona Brinkman (SFU)
            - William Hsiao (BCCDC)
