Computational Phage-Host Prediction

Shaman Narayanasamy
Eco-Systems Biology Group
Supervisors: Paul Wilmes and Jorge Goncalves
PHD-2014-1/7934898
Computational approaches to predict
bacteriophage-host relationships
Robert A. Edwards, Katelyn McNair, Karoline Faust, Jeroen Raes, Bas E. Dutilh
Review article, FEMS Microbiology Reviews (9 December 2015)
Computational Biology Pizza Club series: 25th May 2016

Article overview
• Metagenomics for identification of virus-host associations
• Introduction to wet-lab methods
• Focus on interactions between bacteriophages (phages) and bacteria
• Benchmark data: 820 bacteriophages, their associated hosts and publicly available metagenomic datasets
• Assessment of the predictive power of in silico phage-host signals:
– Abundance-based methods
– Sequence-homology-based methods
– Genetic homology
– CRISPRs
– Oligonucleotide profiles
– Composition-based methods

Introduction
• Infection via a membrane receptor
• Resistance and defense:
– Membrane receptor mutation
– CRISPR-Cas
– Restriction-modification
• Mutation
• Resistance and fitness
• Competition
Figure adapted and modified from Gelbart & Knobler (2008)

Experimental approaches for phage isolation
• Spot and plaque assays
• Liquid assays
• Viral tagging
• Microfluidic PCR
• PhageFISH
• Single cell sequencing
• Hi-C sequencing

Spot and plaque assays
Requires
• Pure culture of host
• Pure/environmental culture of phage
Disadvantages
• Low throughput
• Host isolation required
Photo adapted and modified from http://www.slideshare.net/Adrienna/global-food-safety2013

Liquid assays
Requires
• Pure culture of phage
Disadvantages
• Relies on OD readout *
• Low sensitivity (single endpoint values) *
• Host and phage isolates required
* Alternatives: redox dye, OmniLog platform, real-time/semi-quantitative PCR
Figure adapted and modified from Goldberg et al. (2014)

Viral tagging
Requires
• Pure culture/environmental isolate of phages
• Cell sorter (presumably FACS)
Disadvantages
• Host isolate required
Figure adapted and modified from http://jgi.doe.gov/dyeing-learn-marine-viruses/

Microfluidic PCR
Requires
• Environmental microbial community sample
• PCR primers for target marker genes
Disadvantages
• Relies on marker genes for design of PCR primers
Figure adapted and modified from Dang & Sullivan (2014)

PhageFISH
Figures adapted and modified from Dang & Sullivan (2014) and Allers et al. (2013)
Requires
• Environmental microbial community sample
• PCR primers for target marker genes
Disadvantages
• Relies on marker genes for FISH probe design

Single cell sequencing
Requires
• Single microbial cell from environmental microbial community sample
Disadvantages
• Biased towards most abundant environmental microbe
Figure adapted and modified from Lasken (2012)

Benchmark dataset
• 820 complete phage genomes
• Host taken from the “host” annotation field
• 153 complete bacterial genomes
• Source: NCBI RefSeq

Quality assessment of predictions: ROC curves
• Assessment of binary classifier (Host/Not Host)
• Does not require cut-off value
• Based on the rate of accumulation of true and false positives
• True positive rate (sensitivity) and false positive rate (1 - specificity)
TPR = TP / (TP + FN)    FPR = FP / (FP + TN)
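
To make the ROC idea concrete, here is a minimal sketch of how the curve can be traced by sweeping the score threshold, with the area under the curve as a single summary number. The function names and the six scored phage-host pairs are hypothetical and only for illustration; it assumes at least one true host and one non-host among the labels.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by sweeping the threshold over all scores."""
    pos = sum(labels)            # number of true phage-host pairs
    neg = len(labels) - pos      # number of non-host pairs
    # Rank predictions from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda x: -x[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last member of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scores for six candidate pairs (label 1 = true host).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
area = auc(pts)
```

An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is why no single cut-off has to be chosen.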

Computational methods for phage-host signal prediction
• Abundance profiles
• Genetic homology
• CRISPR
• Exact matches
• Oligonucleotide profiles

Abundance profiles
• Stern et al. (2012)
– Good correlation of phage-host abundance across human gut microbiome (metagenomes)
• Reyes et al. (2013)
– 2/5 phages correspond to decrease in host abundance (mouse gut)
• Nielsen et al. (2014)
– Occurrence of phage-like gene sets corresponding to host (bacterial) gene sets
– Includes known phage-host pairs
• Dutilh et al. (2014)
– 22% of metagenomic reads may be of phage origin
• Lima-Mendez et al. (2015); Tara Oceans survey
Figure adapted and modified from Nielsen et al. (2014) and Edwards et al. (2015)
• Improves with the availability of multiple samples from same/similar environments
• High spatio/temporal stratification; will improve as publicly available metagenome collection increases
• Time series datasets potentially used for time lagged associations
• Complicated and non-linear dynamics incompatible with straightforward correlation
• 12% correct identification of host
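
As an illustration of the abundance-based signal, the sketch below picks the host whose abundance profile across samples correlates best with that of the phage. Plain Pearson correlation is an assumption made here for simplicity; the host names and abundance values are hypothetical, and the non-linear dynamics mentioned above are exactly what such a simple measure would miss.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long abundance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile carries no signal; report zero correlation.
    return cov / (sx * sy) if sx and sy else 0.0


def predict_host(phage_profile, host_profiles):
    """Return the host whose profile correlates best with the phage profile."""
    return max(host_profiles,
               key=lambda h: pearson(phage_profile, host_profiles[h]))


# Hypothetical relative abundances across five metagenomic samples.
phage = [1.0, 4.0, 2.0, 8.0, 3.0]
hosts = {
    "host_A": [2.0, 9.0, 4.0, 15.0, 7.0],  # tracks the phage
    "host_B": [9.0, 3.0, 8.0, 1.0, 6.0],   # anti-correlated
    "host_C": [5.0, 5.0, 6.0, 5.0, 4.0],   # roughly flat
}
best = predict_host(phage, hosts)
```

For time series, the same function could be applied to a phage profile shifted by one or more time points to look for time-lagged associations.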

Genetic homology
• Phage-host homology is an indication of recent common ancestry, implying interaction
• Host genes may benefit phages!
• Auxiliary metabolic genes
• Modi et al. (2013) and Dutilh et al. (2014)
Figure adapted and modified from Edwards et al. (2015)
• Amino-acid-based searches are applicable to distantly related organisms (29.8%)
• Nucleotide-based searches are more accurate (38.5%)
• 30% of hosts identified
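
One simple way to turn homology search results into a host prediction is to take, for each phage, the bacterial genome with the best-scoring hit. The sketch below assumes BLAST tabular output with the standard 12 columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the identifiers and values are hypothetical, and best-bit-score selection is an assumption, not necessarily the criterion used in the review.

```python
def best_host_per_phage(blast_lines):
    """Map each query (phage) to the subject (host) with the highest bit score."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        phage, host, bitscore = fields[0], fields[1], float(fields[11])
        if phage not in best or bitscore > best[phage][1]:
            best[phage] = (host, bitscore)
    return {phage: host for phage, (host, _) in best.items()}


# Hypothetical hits: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
rows = [
    ("phage_1", "host_A", "98.5", "1200", "18", "0", "1", "1200", "5001", "6200", "0.0", "2100"),
    ("phage_1", "host_B", "81.0", "300", "57", "2", "400", "699", "120", "419", "1e-60", "350"),
    ("phage_2", "host_B", "77.3", "90", "20", "1", "10", "99", "880", "969", "2e-10", "61.2"),
    ("phage_2", "host_C", "85.0", "100", "15", "0", "1", "100", "40", "139", "3e-15", "80.5"),
]
hits = ["\t".join(row) for row in rows]
prediction = best_host_per_phage(hits)
```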

CRISPR-Cas
Figure: bacterial genome with cas gene and CRISPR array; repeats (R) alternate with spacers (S1 to S5) matching phage genome 1 and phage genome 2
R: Repeat
Sx: Spacer

CRISPRs
• Studies:
– Human gut microbiome; Stern et al. (2012), Minot et al. (2013)
– Acidophilic biofilms; Andersson & Banfield (2008)
– Cow rumen; Berg Miller et al. (2012)
– Arctic glacial ice and soil; Sanguino et al. (2015)
– Marine environments; Anderson, Brazelton & Baross (2011), Cassman et al. (2012)
– Activated sludge; Narayanasamy et al. (unpublished)
• Spacers show little to no homology to known sequences
• Environmentally dependent
• Spacers are rapidly replaced
• Most suitable for recent phage-host interactions
• Not all prokaryotes encode CRISPRs (bacteria: 48 ± 30%; archaea: 63 ± 30%)
• Highly specific, but not sensitive
• Degeneracy of up to 13 mismatches allowed (Fineran et al., 2014)
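
The spacer-to-phage signal can be sketched as a search for each spacer in a phage genome on both strands, tolerating a limited number of mismatches. This is a naive sliding-window scan for illustration only (a real analysis would use an aligner); the sequences, function names and the default mismatch budget of 2 are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming_hits(spacer, genome, max_mismatches):
    """Yield (position, mismatches) for every window within the mismatch budget."""
    k = len(spacer)
    for i in range(len(genome) - k + 1):
        mm = sum(a != b for a, b in zip(spacer, genome[i:i + k]))
        if mm <= max_mismatches:
            yield i, mm


def spacer_matches(spacer, phage_genome, max_mismatches=2):
    """Search both strands of the phage genome for the spacer."""
    hits = [("+", pos, mm)
            for pos, mm in hamming_hits(spacer, phage_genome, max_mismatches)]
    rc = reverse_complement(spacer)
    hits += [("-", pos, mm)
             for pos, mm in hamming_hits(rc, phage_genome, max_mismatches)]
    return hits


# Hypothetical spacer and phage fragment; the protospacer differs at one base.
spacer = "GATTACAGGC"
phage_genome = "TTTT" + "GATTACTGGC" + "TTTT"
found = spacer_matches(spacer, phage_genome)
```

Raising `max_mismatches` trades specificity for sensitivity, which mirrors the "highly specific, but not sensitive" behaviour noted above.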

Exact matches
• Integration of the phage into the host genome via homologous recombination
• attP (POP′) on the phage genome and attB (BOB′) on the bacterial genome
• Common identical core sequence (2-15 bp) between phage and host
• Adjacent to the integrase gene in phage genomes, near tRNA genes in bacterial genomes
• Longer matches are more reliable
• Up to 40% correct host predictions
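
A toy version of the exact-match signal: find the longest identical stretch shared by a phage and each candidate host, then rank hosts by its length. The sketch uses Python's `difflib.SequenceMatcher`, which is fine for short strings but far too slow for whole genomes; it checks the forward strand only, and all sequences and names are hypothetical.

```python
from difflib import SequenceMatcher


def longest_exact_match(phage, host):
    """Return the longest substring shared by the two sequences (forward strand)."""
    matcher = SequenceMatcher(None, phage, host, autojunk=False)
    m = matcher.find_longest_match(0, len(phage), 0, len(host))
    return phage[m.a:m.a + m.size]


def rank_hosts(phage, hosts):
    """Sort candidate hosts by the length of their longest exact match."""
    lengths = {name: len(longest_exact_match(phage, seq))
               for name, seq in hosts.items()}
    return sorted(lengths.items(), key=lambda kv: -kv[1])


# Hypothetical sequences: phage and host_A share a 15 bp core.
core = "GCTTTTTTATACTAA"
phage_seq = "CCATGG" + core + "GTTGAC"
candidates = {
    "host_A": "AATCGC" + core + "CTGGAT",
    "host_B": "ACGTACGTTTATACGGATCCA",
}
ranking = rank_hosts(phage_seq, candidates)
```

Because short identical stretches occur by chance, a minimum match length would normally be applied before trusting the top-ranked host.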

Oligonucleotide profiles
Figure legend: contig with cas gene; contig with known phage gene; contig with CRISPR locus
• Phages ameliorate their genomic oligonucleotide profiles towards those of their hosts
• Avoid recognition by restriction enzymes
• Adjustment of codon usage to match available host tRNAs
• Ogilvie et al. (2013) identified 408 metagenomic fragments with phage-like properties (4-mers)
Figure adapted and modified from Narayanasamy et al. (unpublished) and Edwards et al. (2015)
• Profiles cannot be too sparse (shorter k-mers)
• k = 3-8 predicted 8-17% of hosts correctly
• Codon usage predicted ~10% hosts correctly
• GC content not informative
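
The oligonucleotide signal can be sketched by comparing normalised k-mer frequency vectors and predicting the host with the most similar profile. Euclidean distance is an assumption chosen for simplicity, reverse complements are not merged, and the toy sequences and names are hypothetical; k = 2 is used in the example only because the sequences are tiny.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=4):
    """Return normalised k-mer frequencies as a vector over all 4**k k-mers."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing characters other than A, C, G, T are ignored.
    total = sum(counts[m] for m in kmers) or 1
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def closest_host(phage_seq, host_seqs, k=4):
    """Return the host whose k-mer profile is nearest to the phage profile."""
    phage_vec = kmer_profile(phage_seq, k)
    return min(host_seqs,
               key=lambda h: euclidean(phage_vec, kmer_profile(host_seqs[h], k)))


# Hypothetical toy sequences: the phage and host_A are both AT-rich.
toy_phage = "ATATATTATAATATTA"
toy_hosts = {
    "host_A": "TATATAATTATATAAT",
    "host_B": "GCGCGGCCGCGCCGGC",
}
nearest = closest_host(toy_phage, toy_hosts, k=2)
```

Longer k-mers give more specific profiles but make the vectors sparse for short sequences, which is the trade-off behind the "cannot be too sparse" point above.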

Summary and overview
Signal category | Approach | Performance | Comments
Abundance profiles | Phage-host co-abundance profiles; association by correlation | 9.5% | Non-linear dynamics confound correlations
Genetic homology | Phage-host nucleotide and protein sequence homology | 38.5% (blastn); 29.8% (blastx) | Depends on database
CRISPRs | Spacer alignments to phage genomes | 15.1% (most similar); 21.3% (highest) | Occurrence of CRISPR system (~40% of bacteria, ~70% of archaea); no matches; not sensitive
Exact matches ** | Exact matches between phage and host genomes | 40.5% | Short exact matches may be random
Oligonucleotide profiles | Similarity of phage and host k-mer profiles | 17.2% (4-mer); 10.4% (codon) |
Table adapted and modified from Edwards et al. (2015)

Summary and overview
• Blastn and exact matches provide strongest signal
• Most methods predict between 1 and 4 bacteria as the most likely host (better than random)
• A significant fraction of the host genome is required (except for the abundance-based method)
• Current knowledge still limited
• Phage host range (highly specific vs broad range)
• New methods and technology
