2. Outline of today’s lecture
Introduction
Classification of Viruses
Diversity and Evolution of Viruses
Metagenomics and Virus Diversity
Bioinformatics Approaches to Problems in Virology
Influenza Virus
Herpesvirus: From Phylogeny to Gene Expression
Human Immunodeficiency Virus
Bioinformatic Approaches to HIV-1
Measles Virus
3. Learning objectives for today’s lecture
• Describe how viruses are classified
• Explain bioinformatics approaches to virology
• Describe the influenza virus genome including the new
H1N1 virus
• Provide a description of the Herpesviruses
• Use NCBI and LANL resources to identify the function
and evolution of Human Immunodeficiency Virus (HIV-1)
5. Viruses are small, infectious, obligate intracellular
parasites. They depend on host cells to replicate. Because
they lack the resources for independent existence, they
exist on the borderline of the definition of life.
The virion (virus particle) consists of a nucleic acid
genome surrounded by coat proteins (capsid) that may be
enveloped in a host-derived lipid bilayer.
Viral genomes consist of either RNA or DNA. They may be
single-stranded, double-stranded, or partially double-stranded.
The genomes may be circular, linear, or segmented.
Introduction to viruses
Page 567
6. Viruses have been classified by several criteria:
-- based on morphology (e.g. by electron microscopy)
-- by type of nucleic acid in the genome
-- by genome size (rubella virus is about 10 kb; HIV-1 about 9 kb;
poxviruses are several hundred kb). Mimivirus
(for "mimicking microbe") has a double-stranded
DNA genome of 1.2 megabases (Mb).
-- based on human disease
Page 568
Introduction to viruses
8. Fig. 14.2
Page 570
The International Committee on Taxonomy of Viruses
(ICTV) offers a website, accessible via NCBI’s Entrez site
http://www.ncbi.nlm.nih.gov/ICTVdb/
9. Mimivirus is the sole member of the Mimiviridae family of
nucleocytoplasmic large DNA viruses (NCLDVs).
It was isolated from amoebae growing in England.
The mature particle has a diameter of ~400 nanometers,
comparable to a small bacterium (e.g. a mycoplasma).
Thus, mimivirus is by far the largest virus identified to
date.
Mimivirus: mimicking microbe
Page 569
10. The mimivirus genome is 1.2 Mb (1,181,404 base pairs). It
is a double-stranded DNA virus.
► Two inverted repeats of 900 base pairs at the ends
(thus it may circularize)
► 72% AT content (~28% GC content)
► 1262 putative open-reading frames (ORFs) of length
>100 amino acids. 911 of these are predicted to be
protein-coding genes
► Unique features include genes predicted to encode
proteins that function in protein translation. The inability to
perform protein synthesis has been considered a defining
feature of viruses, in contrast to cellular life forms.
See Raoult D et al. (2004) Science 306:1344.
Mimivirus: mimicking microbe
Page 569
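The genome statistics above (AT/GC content and ORFs longer than 100 amino acids) can be computed directly from a sequence. The sketch below is a minimal illustration, assuming a simple ORF definition (an ATG start codon read through to the first in-frame stop codon, on all six reading frames). The function names are hypothetical, and this is not the gene-prediction procedure used by Raoult et al., so counts on the real genome need not match the 1262 reported.

```python
# Sketch: GC content and a naive six-frame ORF scan.
# Assumption: an ORF runs from an ATG to the first in-frame stop codon and is
# kept if it encodes more than min_aa amino acids (stop codon not counted).

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Fraction of G + C among A/C/G/T bases; ambiguous bases are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_aa: int = 100) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, length_aa) for each qualifying ORF.

    Start coordinates are 0-based on the strand being scanned. Only the first
    ATG before each stop is used, so nested ORFs are not reported separately.
    """
    orfs = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length_aa = (i - start) // 3
                    if length_aa > min_aa:
                        orfs.append((strand, frame, start, length_aa))
                    start = None
    return orfs


# Toy sequence: one 121-codon ORF on the forward strand
toy = "ATG" + "GCT" * 120 + "TAA"
print(find_orfs(toy), round(gc_content(toy), 3))
```

Gene-prediction tools add further criteria (alternative start codons, codon usage, overlap rules), which is one reason raw ORF counts and predicted gene counts differ, as with the 1262 ORFs versus 911 predicted genes above.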
11. Viral metagenomics refers to the sampling of
representative viral genomes from the environment.
A typical viral genome is ~50 kilobases (in comparison, a
typical microbial genome is ~2.5 megabases). A sample is
collected (e.g. seawater, fecal material, or soil). Cellular
material is excluded (e.g. by filtration). Viral DNA is extracted, cloned, and
sequenced.
Viral metagenomics
Page 573
12. Comparison of viral metagenomic libraries to the GenBank non-redundant
database. Viral metagenomic sequences from human faeces, a marine
sediment sample and two seawater samples were compared to the
GenBank non-redundant database at the date of publication and in
December 2004. The percentage of each library that could be classified as
Eukarya, Bacteria, Archaea, viruses or showed no similarities (E-value
>0.001) is shown. Edwards RA, Rohwer F. Nature Reviews Microbiology 3, 504-510 (2005).
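The classification in this figure amounts to assigning each sequence to the category of its best database hit, or to "no similarity" when no hit passes the E-value cutoff. A minimal sketch, assuming each sequence's best-hit category (Eukarya, Bacteria, Archaea, viruses) has already been derived from a BLAST search against GenBank; the input format and the function name `classify_library` are hypothetical.

```python
from collections import Counter

# Hits with E > 0.001 are treated as "no similarity", as in the figure legend.
E_VALUE_CUTOFF = 0.001


def classify_library(best_hits):
    """Return the percentage of sequences in each category.

    best_hits: iterable of (sequence_id, category, evalue) tuples, where
    category is a label such as "Bacteria" or "Viruses" taken from the best
    BLAST hit (assumed precomputed); category and evalue are None if no hit.
    """
    counts = Counter()
    for _seq_id, category, evalue in best_hits:
        if category is None or evalue is None or evalue > E_VALUE_CUTOFF:
            counts["No similarity"] += 1
        else:
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


# Hypothetical library of four sequences
library = [
    ("seq1", "Bacteria", 1e-20),
    ("seq2", "Viruses", 5e-4),
    ("seq3", "Eukarya", 0.01),  # above the cutoff: counted as no similarity
    ("seq4", None, None),       # no hit at all
]
print(classify_library(library))
```

Re-running the same classification against a newer database release, as the figure does for December 2004, only changes the best-hit input; the cutoff logic stays the same.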
13. Edwards RA, Rohwer F. Nature Reviews Microbiology 3, 504-510 (2005)
“The Phage Proteomic Tree is a whole-genome-based taxonomy system that
can be used to identify similarities between complete phage genomes and
metagenomic sequences. This new version of the tree contains 167 phage
genomes. Phages in black cannot be classified into any clade. In the key, each
phage is defined in a clockwise direction.”
14. Genomic overview of the uncultured viral community from human feces based
on TBLASTX sequence similarities. (A) Numbers of sequences with significant
matches (E values of <0.001) in GenBank. (B) Distribution of significant
matches among major classes of biological entities. (C) Types of mobile
elements recognized in the library. (D) Families of phages identified in the fecal
library.
Breitbart M, et al. (2003) Metagenomic Analyses of an Uncultured Viral
Community from Human Feces. J Bacteriol. 185: 6220–6223.
15. Categories of phage proteins with significant matches in the
uncultured human fecal viral library
Breitbart M, et al. (2003) Metagenomic Analyses of an Uncultured Viral
Community from Human Feces. J Bacteriol. 185: 6220–6223.
16. Vaccine-preventable viral diseases include:
Hepatitis A
Hepatitis B
Influenza
Measles
Mumps
Poliomyelitis
Rubella
Smallpox
Page 571
Human disease relevance of viruses
Source: Centers for Disease Control website
17.
Disease          Virus
Hepatitis A      Hepatitis A virus
Hepatitis B      Hepatitis B virus
Influenza        Influenza virus type A or B
Measles          Measles virus
Mumps            Mumps virus (genus Rubulavirus)
Poliomyelitis    Poliovirus (three serotypes)
Rotavirus        Rotavirus
Rubella          Rubella virus (genus Rubivirus)
Smallpox         Variola virus
Varicella        Varicella-zoster virus
Page 571
Source: Centers for Disease Control website
Human disease relevance of viruses
19. Some of the outstanding problems in virology include:
-- Why does a virus such as HIV-1 infect one species
(human) selectively?
-- Why do some viruses change their natural host?
In 1997 a chicken influenza virus killed six people.
-- Why are some viral strains particularly deadly?
-- What are the mechanisms of viral evasion of the host
immune system?
-- Where did viruses originate?
Bioinformatic approaches to viruses
Page 574
20. The unique nature of viruses presents special challenges
to studies of their evolution.
• viruses tend not to survive in historical samples
• RNA viruses typically encode polymerases that lack
proofreading activity
• viruses replicate at extremely high rates
• many viral genomes are segmented; segments may be
shuffled between viruses (reassortment)
• viruses may be subjected to intense selective pressures
(host immune responses, antiviral therapy)
• viruses invade diverse species
• the diversity of viral genomes precludes us from
making comprehensive phylogenetic trees of viruses
Diversity and evolution of viruses
Page 574
26. Influenza viruses belong to the family Orthomyxoviridae.
The viral particles are about 80-120 nm in diameter and can be
spherical or pleomorphic. They have a lipid membrane envelope
that contains two glycoproteins: hemagglutinin (H) and
neuraminidase (N). These two proteins determine the
subtypes of influenza A virus.
Influenza virus
Influenza A
Influenza causes an estimated 200,000 hospitalizations and
~36,000 deaths in the U.S. each year.
Page 574
27. Since 1976, the H5N1 avian influenza virus has infected at
least 232 people (mostly in Asia), of whom 134 have died.
A major concern is that if a human influenza virus and the H5N1
avian influenza strain were to combine, a new lethal virus
could emerge and cause a human pandemic. In a pandemic,
20% to 40% of the population may be infected per year.
►The 1918 Spanish influenza virus killed tens of millions of
people (H1N1 subtype).
►1957 (H2N2)
► 1968 (H3N2)
► Asia 2003-2005 (H5N1)
► Current, 2009 (H1N1, “swine flu”)
Influenza virus
Page 575
28. There are three types: A, B, and C
► A and B cause flu epidemics
► Influenza A: 20 subtypes; occurs in humans and other animals.
For example, in birds there are nine subtypes based on the
type of neuraminidase expressed (group 1: N1, N4, N5, N8;
group 2: N2, N3, N6, N7, N9). The structure of H5N1 avian
influenza neuraminidase has been reported (Russell RJ et al.,
Nature 443:45, 2006).
► The influenza A genome consists of eight single-stranded,
negative-sense RNA segments (from 890 to 2340 nucleotides).
Each RNA segment encodes one or two proteins.
Influenza virus
Page 575
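Influenza A subtype names encode the hemagglutinin and neuraminidase numbers, so the neuraminidase group listed above can be read off a subtype label. A small sketch using the group 1 and group 2 lists from this slide; the function names and the regular expression are illustrative assumptions, not part of any influenza database API.

```python
import re

# Neuraminidase groups as listed on the slide
# (group 1: N1, N4, N5, N8; group 2: N2, N3, N6, N7, N9)
NA_GROUPS = {1: 1, 4: 1, 5: 1, 8: 1, 2: 2, 3: 2, 6: 2, 7: 2, 9: 2}

# Matches labels such as "H5N1", "A/H3N2" or "A/New York/11/2003(H3N2)"
SUBTYPE_PATTERN = re.compile(r"H(\d+)N(\d+)", re.IGNORECASE)


def parse_subtype(label: str) -> tuple[int, int]:
    """Return (hemagglutinin number, neuraminidase number) from a label."""
    match = SUBTYPE_PATTERN.search(label)
    if match is None:
        raise ValueError(f"no HxNy subtype found in {label!r}")
    return int(match.group(1)), int(match.group(2))


def na_group(label: str) -> int:
    """Return neuraminidase group 1 or 2 for the subtype in label."""
    _, n = parse_subtype(label)
    if n not in NA_GROUPS:
        raise ValueError(f"N{n} is not among the nine NA subtypes listed")
    return NA_GROUPS[n]


for label in ["H5N1", "H3N2", "H1N1", "H2N2"]:
    print(label, "-> NA group", na_group(label))
```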
30. NCBI offers an Influenza Virus Resource
(http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html)
31. Growth of Influenza Virus Sequences in GenBank
10/08 http://www.ncbi.nlm.nih.gov/genomes/FLU/growth.html
32. Holmes et al. (2005) performed phylogenetic analyses of 156
complete genomes of human H3N2 influenza A viruses
collected over time (1999-2004) in one location (New York
State).
Phylogenetic analysis revealed multiple reassortment events.
One clade of H3N2 virus, present since 2002, is the source for
the HA gene in all subsequently sampled viruses.
Large-scale influenza virus genome analysis
Holmes EC, et al. Whole-genome analysis of human influenza A virus reveals
multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol.
2005 Sep;3(9):e300.
Page 576
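One way to see the reassortment signal described above: if each genome segment is placed on its own tree, a reassortant isolate falls into different clades for different segments. The sketch below assumes per-segment clade labels have already been assigned (for example, from maximum likelihood trees with bootstrap support). Isolate names, segment labels, and the function `find_reassortants` are hypothetical, and real analyses also weigh the statistical support for each placement.

```python
from collections import defaultdict


def find_reassortants(assignments):
    """Flag isolates whose segments fall into more than one clade.

    assignments: {isolate: {segment: clade}}, with clade labels taken from
    separate per-segment phylogenies (assumed to be computed beforehand).
    Returns {isolate: {clade: sorted list of segments}} for flagged isolates.
    """
    flagged = {}
    for isolate, segment_clades in assignments.items():
        segments_by_clade = defaultdict(list)
        for segment, clade in segment_clades.items():
            segments_by_clade[clade].append(segment)
        if len(segments_by_clade) > 1:  # incongruent placement across segments
            flagged[isolate] = {c: sorted(s) for c, s in segments_by_clade.items()}
    return flagged


# Hypothetical clade assignments for two isolates (three segments shown)
assignments = {
    "isolate_1": {"HA": "A", "NA": "A", "PB2": "A"},
    "isolate_2": {"HA": "A", "NA": "B", "PB2": "B"},
}
print(find_reassortants(assignments))
```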
33. Evolutionary Relationships of Concatenated Major Coding Regions
of Influenza A Viruses Sampled in New York State during 1999–
2004. The maximum likelihood phylogenetic tree is mid-point rooted
for purposes of clarity, and all horizontal branch lengths are drawn
to scale. Bootstrap values are shown for key nodes. Isolates assigned
to clade A (light blue), clade B (yellow), and clade C (red) are
indicated, as are those isolates involved in other reassortment
events: A/New York/11/2003 (orange), A/New York/182/2000 (dark
blue), and A/New York/137/1999 and A/New York/138/1999 (green).
Holmes EC, et al. Whole-genome analysis of human influenza A virus reveals
multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol.
2005 Sep;3(9):e300.
35. Ghedin et al. (2005) sequenced 209 complete genomes of
human influenza A virus (sequencing 2,821,103 nucleotides).
See Nature 437:1162.
Large-scale influenza virus genome analysis
36. Each row represents a single amino acid position in
one protein. Amino acids (single-letter abbreviations
are used) are colour-coded as shown in the key, so that
mutations can be seen as changes in colour when
scanning from left to right along a row. For simplicity,
only amino acids that showed changes in at least three
isolates are shown. Each column represents a single
isolate, and columns are only a few pixels wide in
order to display all 207 H3N2 isolates in this figure.
Isolates are ordered along the columns chronologically
according to the date of collection; boundaries
between influenza seasons are indicated by gaps
between columns. A more detailed version of this
figure, showing positions that experienced any amino
acid change and showing identifiers for the isolates in
each column, is available as Supplementary Fig. 1.
Ghedin E, et al. Large-scale sequencing of human influenza reveals the dynamic
nature of viral genome evolution. Nature. 2005 Oct 20;437(7062):1162-6.
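The filtering step in this figure (showing only positions where at least three isolates changed) can be expressed over a protein alignment. A sketch, assuming the isolates are already aligned and that a "change" means a residue that differs from the most common residue in that column; the published figure may have used a different reference, such as the earliest isolate. The function name is hypothetical.

```python
from collections import Counter


def variable_positions(aligned, min_changed=3):
    """Return {1-based position: number of changed isolates} for alignment
    columns where at least min_changed isolates differ from the most common
    residue in that column.

    aligned: equal-length protein sequences, one per isolate (for example,
    ordered by collection date, as in the figure).
    """
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = {}
    for pos in range(length):
        column = [seq[pos] for seq in aligned]
        _, majority_count = Counter(column).most_common(1)[0]
        changed = len(column) - majority_count
        if changed >= min_changed:
            result[pos + 1] = changed
    return result


# Hypothetical alignment of six isolates, three residues each
isolates = ["MKV", "MKV", "MRV", "MRA", "MRA", "MKA"]
print(variable_positions(isolates))
```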
40. Herpesviruses are double-stranded DNA viruses that
include herpes simplex virus, cytomegalovirus, and Epstein-Barr virus.
The genomic DNA is packaged inside an icosahedral capsid;
including the lipid envelope, the virion is ~200 nanometers in diameter.
Herpesvirus
Page 578
41. Phylogenetic analysis suggests three major groups
that originated about 180-220 million years ago (MYA).
Mammalian herpesviruses are in all three subfamilies.
Avian and reptilian herpesviruses are all in the
Alphaherpesvirinae.
Page 578
Herpesvirus
43. McGeoch et al. (Virus Res. 117:90-104, 2006) describe
a new herpesvirus taxonomy.
Family Herpesviridae
Subfamilies Alpha-, Beta-, Gammaherpesvirinae
New family Alloherpesviridae (piscine, amphibian
herpesviruses)
Herpesvirus taxonomy
Page 578
45. Genome sizes range from 124 kb (simian varicella virus
from Alphaherpesvirinae) to 241 kb (chimpanzee
cytomegalovirus from Betaherpesvirinae).
► GC content ranges from 32% to 75%.
► Protein-coding regions occur at a density of one
gene per 1.5 to 2 kb of herpesvirus DNA.
► There are immediate-early genes, early genes
(nucleotide metabolism, DNA replication), and late
genes (encoding proteins comprising the virion).
► Introns occur in some herpesvirus genes.
► Noncoding RNAs have been described (e.g. latency-
associated transcripts in HSV-1).
Herpesvirus taxonomy
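The gene density figure allows a rough estimate of gene number from genome size. A back-of-the-envelope sketch using the 1.5 to 2 kb per gene range and the genome sizes on this slide; the function names are hypothetical, and the estimate ignores real variation in gene density between herpesviruses.

```python
def gene_count_range(genome_kb, dense_kb_per_gene=1.5, sparse_kb_per_gene=2.0):
    """Return (low, high) expected gene counts for a genome of genome_kb kilobases."""
    return genome_kb / sparse_kb_per_gene, genome_kb / dense_kb_per_gene


def kb_per_gene(genome_kb, n_genes):
    """Average genome length per gene, in kb."""
    return genome_kb / n_genes


for name, size_kb in [("simian varicella virus", 124),
                      ("chimpanzee cytomegalovirus", 241)]:
    low, high = gene_count_range(size_kb)
    print(f"{name}: {size_kb} kb -> roughly {low:.0f} to {high:.0f} genes")
```

As a consistency check, HHV-8 (~140 kb encoding ~80 proteins) works out to 140 / 80 = 1.75 kb per gene, inside the stated range.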
46. Consider human herpesvirus 8 (HHV-8) (family Herpesviridae;
subfamily Gammaherpesvirinae). Its genome is ~140,000
base pairs and encodes ~80 proteins. Its RefSeq accession
number is NC_003409.
We can explore this virus at the NCBI website.
Try NCBI Entrez Genomes > Viruses (linked from the
right sidebar), then select dsDNA viruses.
Bioinformatic approaches to herpesvirus
Page 579
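The RefSeq record for HHV-8 can also be retrieved programmatically through NCBI's E-utilities. A minimal sketch using only the Python standard library, assuming the EFetch option `rettype=fasta_cds_aa` for nucleotide records (which returns the annotated protein translations as FASTA). The helper names are hypothetical; NCBI asks frequent users to add `tool`/`email` parameters or an API key and to limit their request rate.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "fasta_cds_aa") -> str:
    """Build an EFetch URL for a nucleotide (nuccore) record."""
    params = {"db": "nuccore", "id": accession,
              "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


def fetch_text(url: str, timeout: float = 60.0) -> str:
    """Download a URL and return its body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def count_fasta_records(fasta_text: str) -> int:
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


# Example (network access needed; not run here):
# proteins = fetch_text(efetch_url("NC_003409"))
# print("annotated CDS translations:", count_fasta_records(proteins))
```

The number of records returned depends on the current RefSeq annotation, so it may differ somewhat from the ~80 proteins quoted above.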
48. Fig. 14.7
Page 579
NCBI virus site includes tools (e.g. “Protein clusters”)
to analyze herpesviruses
49. HHV-8 proteins include structural and metabolic proteins.
There are also viral homologs of human host proteins such
as the apoptosis inhibitor Bcl-2, an interleukin receptor,
and a neural cell adhesion-related adhesin.
Mechanisms by which viruses may acquire host genes
include recombination, transposition, and splicing. A blastp
search using the HHV-8 interleukin-8 (IL-8) receptor as a query
reveals several other viral IL-8 receptor molecules.
Viruses can acquire host genes
Page 579
51. Functional genomics approaches have been applied to
human herpesvirus 8 (HHV-8). For example, microarrays
have been used to define changes in viral gene expression
at different stages of infection (Paulose-Murphy et al., 2001).
Conversely, changes in host gene expression have been
measured in human cells following viral infection.
Bioinformatic approaches to herpesvirus
Page 582
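A common first step in such microarray comparisons is a log2 fold change per gene between two conditions or time points. The sketch assumes already-normalized expression values, a pseudocount of 1, and a two-fold threshold (|log2 ratio| >= 1). Gene names are hypothetical, and published analyses such as Paulose-Murphy et al. also involve normalization and statistical testing that are not shown here.

```python
import math


def log2_fold_changes(baseline, infected, pseudocount=1.0):
    """Return {gene: log2((infected + pc) / (baseline + pc))} for shared genes.

    baseline, infected: {gene: normalized expression value}.
    """
    shared = baseline.keys() & infected.keys()
    return {gene: math.log2((infected[gene] + pseudocount) /
                            (baseline[gene] + pseudocount))
            for gene in shared}


def changed_genes(fold_changes, threshold=1.0):
    """Genes whose absolute log2 fold change is at least threshold (1.0 = two-fold)."""
    return sorted(g for g, lfc in fold_changes.items() if abs(lfc) >= threshold)


# Hypothetical expression values for three viral transcripts at two time points
early = {"gene_a": 7, "gene_b": 15, "gene_c": 3}
late = {"gene_a": 31, "gene_b": 15, "gene_c": 0}
lfc = log2_fold_changes(early, late)
print(changed_genes(lfc))
```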
55. Human immunodeficiency virus (HIV) is the cause of
AIDS. By some estimates, 33 million people were
infected with HIV in 2006.
HIV-1 and HIV-2 are primate lentiviruses. The HIV-1 genome
is 9181 bases in length. Note that there are >300,000 Entrez
nucleotide records for HIV-1 (but only one RefSeq
entry).
Phylogenetic analyses suggest that HIV-2 arose through
cross-species transmission of a simian virus,
SIVsm (sooty mangabey). Similarly, HIV-1 arose
from the simian immunodeficiency virus of the chimpanzee
(SIVcpz).
Bioinformatic approaches to HIV
Page 583
56. Fig. 14.13
Page 584
HIV phylogeny based on pol suggests five clades
(Hahn et al., 2000):
1. Simian immunodeficiency virus from the chimpanzee Pan troglodytes (SIVcpz), with HIV-1
2. SIV from sooty mangabeys, Cercocebus atys (SIVsm), with HIV-2 and SIV from the macaque (genus Macaca; SIVmac)
3. SIV from African green monkeys (genus Chlorocebus) (SIVagm)
4. SIV from Sykes’ monkeys, Cercopithecus albogularis (SIVsyk)
5. SIV from l’Hoest monkeys (Cercopithecus lhoesti), sun-tailed monkeys (Cercopithecus solatus), and mandrills (Mandrillus sphinx)
61. NCBI offers a retrovirus resource with reference genomes
and protein sets, and several tools (alignment, genotyping).
Bioinformatic approaches to HIV: NCBI
Page 585
63. Example of the genotyping tool from the NCBI retrovirus resource;
the highlighted result is the reference sequence with the highest score.
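Conceptually, a genotyping tool scores a query against a panel of reference sequences and reports the best match. The sketch below uses a simple shared k-mer score as a stand-in for whatever scoring the NCBI tool actually applies; reference names, sequences, the value of k, and the function names are all hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_score(query: str, reference: str, k: int = 8) -> float:
    """Fraction of the query's k-mers that also occur in the reference."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    return len(query_kmers & kmers(reference, k)) / len(query_kmers)


def best_reference(query: str, references: dict[str, str], k: int = 8):
    """Return (name of highest-scoring reference, {name: score})."""
    scores = {name: kmer_score(query, seq, k) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference panel and query (real references are full-length genomes)
references = {
    "ref_1": "AAAACCCCGGGGTTTT",
    "ref_2": "ACGTACGTACGTACGT",
}
best, scores = best_reference("AAAACCCCGG", references, k=4)
print(best, scores)
```

A whole-sequence score like this cannot detect recombinant genomes; scoring in sliding windows along the query is one way to localize changes in the best-matching reference.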
64. Los Alamos National Laboratory (LANL) databases
provide a major HIV resource.
See http://hiv-web.lanl.gov/
LANL offers
-- an HIV BLAST server
-- a synonymous/non-synonymous analysis program
-- a multiple alignment program
-- a PCA-like tool
-- a geography tool
Bioinformatic approaches to HIV: LANL
Page 586
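The core of a synonymous/non-synonymous comparison is deciding, codon by codon, whether a nucleotide difference changes the encoded amino acid. A minimal sketch using the standard genetic code, assuming two gap-free, in-frame coding sequences; it only counts differing codons. A full analysis normalizes these counts by the numbers of synonymous and non-synonymous sites (for example, the Nei-Gojobori method), which this sketch does not do, and it is not LANL's program. The function name is hypothetical.

```python
# Standard genetic code (translation table 1), bases ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_differences(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (synonymous, non-synonymous) counts of differing codons.

    Codons that differ at several positions are classified only by whether the
    encoded amino acid changes; codons with gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and equal length")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# CTG -> CTC keeps leucine (synonymous); AAA -> AGA changes Lys to Arg
print(classify_codon_differences("ATGCTGAAA", "ATGCTCAGA"))
```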