NPR publication 2004

Approaches to identify, clone, and express symbiont bioactive
metabolite genes
Mark Hildebrand, Laura E. Waggoner, Grace E. Lim, Katherine H. Sharp, Christian P. Ridley
and Margo G. Haygood*
Scripps Institution of Oceanography, Marine Biology Research Division;
Center for Marine Biotechnology and Biomedicine; and UCSD Cancer Center,
University of California, San Diego, La Jolla, California 92093, USA
Received (in Cambridge, UK) 22nd October 2003
First published as an Advance Article on the web 15th December 2003
Covering: 1981–2003
This review discusses approaches to identify, clone, and express bioactive metabolite genes from symbionts of
marine invertebrates. Criteria for proving symbiotic origin of bioactive metabolites are presented, followed by a
comprehensive, practically-oriented overview of techniques to be applied. The Bugula neritina/Endobugula sertula
association is used as a primary example, but other symbioses are discussed. Thirty-six compounds are presented and
111 references are cited.
1 Introduction
2 Criteria for proving symbiotic origin of bioactive metabolites
3 Evaluation of natural products as bacterially-produced compounds
4 Localization of bioactive metabolites in marine invertebrates to specific cell types
5 Investigating microbe presence
6 Identifying and isolating biosynthetic genes
7 Cloning of biosynthetic genes
8 Strategies for sequence determination
9 Confirming that cloned genes encode the biosynthetic machinery for a metabolite
10 Genome-based methods in symbiont bioactive metabolite research
11 Summary
12 Acknowledgements
13 References
Mark Hildebrand received a PhD in Biochemistry, with an emphasis on molecular biology, from the University of Arizona in 1987. He did
post-doctoral research with Professor Benjamin Volcani at the Scripps Institution of Oceanography and is currently an Associate Project
Scientist at Scripps. His research interests include the molecular and cell biology of silicified cell wall synthesis in diatoms, biological
applications in nanotechnology, and cloning and expressing bioactive metabolite genes.
Laura E. Waggoner received a BS in Biology from Duke University in 1995. She completed her PhD in Biology in 1999 at the University of
California, San Diego, where she studied the molecular mechanisms governing regulation of egg-laying behavior in nematodes. Combining
her experience in molecular biology with a lifelong interest in marine biology, she then took a post-doctoral position at Scripps Institution of
Oceanography, where she is currently investigating marine invertebrate symbioses and the bioactive metabolites they produce.
Grace Lim received her BS in Molecular Environmental Biology with an emphasis on Microbiology at the University of California,
Berkeley in 1998. She is currently pursuing a PhD degree in Marine Biology with Margo Haygood at the Scripps Institution of
Oceanography. Grace’s interests include bacterial phylogenetics and genomics as applied to the study of symbiosis and secondary
metabolism.
Katherine Sharp received a BA in Biology and Anthropology from Mount Holyoke College in 1998. She is currently a PhD candidate in
Marine Biology with Dr Margo Haygood at the Scripps Institution of Oceanography. During her time at Scripps, she has worked within the
field of marine bioactive metabolite symbiosis and focused her research efforts on microbial ecology of sponges, as well as symbiont
transmission and recruitment mechanisms in marine invertebrate hosts.
DOI:10.1039/b302336m
Nat. Prod. Rep., 2004, 21, 122–142. This journal is © The Royal Society of Chemistry 2004

Christian Ridley was born in 1977 in Kinnelon, NJ. He received a BS in Marine Chemistry from Southampton College (Long Island
University) in 1999. He is currently working on his PhD in marine natural products research at the Scripps Institution of Oceanography,
studying symbioses between marine invertebrates and bacteria. In addition to symbiosis, his research interests include the isolation
and structural elucidation of natural products as well as the synthesis of natural product analogs to explore structure–activity
relationships.
Margo Haygood is a Professor of Marine Biology at the
Scripps Institution of Oceanography, University of California,
San Diego. She studied History of Science at Harvard
University, and received her PhD in Marine Biology from
Scripps Institution of Oceanography in 1984. She did
postdoctoral work in molecular biology with Professor Mary
Lidstrom at the University of Washington, and served as a
scientific officer for microbiology and molecular biology
programs at the US Office of Naval Research. She returned to
Scripps as an assistant professor in 1987. Her interests in
marine microbiology include iron acquisition and microbial
symbioses, especially bioactive metabolite symbioses.
1 Introduction
1.1 Marine invertebrate natural products
Marine invertebrates have been and continue to be a prolific
source of novel and structurally diverse natural products.1,2
Often these compounds display potent and selective bioactivities
that trigger biomedical interest.3 Unfortunately, the supply of the
bioactive natural product is usually insufficient to meet the
demands of pre-clinical and clinical development. Large-scale
collection of the source marine invertebrate can be difficult due
to scarcity of the organism, and can also have negative
environmental consequences. In addition, natural supplies can
fluctuate, either seasonally or due to environmental changes.
Aquaculture4 or cell culture of an invertebrate could alleviate
supply problems, but approaches for these are not yet developed
for all organisms. Ideally, an efficient chemical synthesis of the
desired natural product could be achieved; however, the structural
complexity of many natural products, such as bryostatin 1 (1) and
swinholide A (2), requires inefficient multistep syntheses that
cannot meet the demands of pre-clinical and clinical development.5
Identification of a simpler, more easily synthesized structure that
retains the biological activity6 is another option, but the best
scenario would be to have a supply of the natural product that can
be generated inexpensively and reproducibly in the lab under
controlled conditions.
1.2 Bioactive metabolite symbioses
In most cases, it is likely that marine invertebrates produce
their natural products themselves.7
However, on occasion
analysis of the structure or localization of a natural product
suggests that the molecule is biosynthesized by an associated
microbial symbiont. Distinguishing between these possibilities
was an area of intensive study in John Faulkner’s laboratory.
One of Faulkner’s most important contributions was to
recognize that this topic deserved attention beyond casual
speculation, and that rigorous experimental tests were possible
and should be pursued.
Symbiotic systems in which there is a strong likelihood
of microbial bioactive metabolite synthesis offer attractive
alternatives to chemical synthesis or extraction from natural
sources. Symbionts that can be cultivated in the laboratory
and still produce the bioactive metabolite could be subjected to
fermentation technology to produce large amounts of the com-
pound. However, cultivation of tightly integrated microbial
symbionts can be difficult because of their dependency on the
host,8
and success rates are thus low. In these cases, alternative
means of obtaining the compound need to be explored.
Unlike those of their invertebrate hosts, the genomes of bacteria
and archaea are small, and their biosynthetic pathways tend to be
organized in contiguous regions of DNA (operons). These
features greatly facilitate cloning of these pathways. Expression
technology for bacterial genes is well developed, making
cloning and expressing biosynthetic genes of bacterial sym-
bionts entirely feasible. In the case of uncultivable symbionts,
this provides the only way to produce bioactive metabolites in a
culture system. For both cultivable and non-cultivable sym-
bionts, cloning and expressing bioactive metabolite genes offer
the possibility of providing sufficient amounts of compounds
for drug development that could not otherwise be obtained,
and open an avenue for combinatorial biosynthesis later on.
In this review, we will examine the process of determining
whether or not symbionts are in fact likely to be producing a
natural product, and outline approaches to identify, clone,
characterize, and express bioactive metabolite genes from sym-
bionts that do.
2 Criteria for proving symbiotic origin of bioactive metabolites
The study of symbiosis is evolving and specific criteria to prove
hypotheses of the symbiotic origin of bioactive metabolites are
emerging from the consensus of scientists in the field. In con-
sidering possible criteria, it is profitable to examine those used
for a related subject, infectious disease. In classical micro-
biology, Koch’s postulates, described in 1884,9
are the gold
standard for determining the causative agent of a disease. They
require 1) that the candidate organism always be present in the
disease state, and absent in healthy organisms, 2) that it can be
isolated in pure culture from diseased tissue, 3) that reintroduc-
tion of the agent precipitates the disease in healthy subjects, and
4) that the candidate can be isolated again from the diseased
host. Satisfaction of all these principles constitutes a rigorous
proof, but there are many cases, usually those in which the dis-
ease agent cannot be grown in pure culture, when all of the
principles cannot be satisfied. In such situations, modified
criteria must be employed to provide supporting evidence for
the microbe’s involvement in infection.
The equivalent to Koch’s postulates in bioactive metabolite
symbiosis is to 1) correlate presence of the symbiont with a
function for the host, 2) remove the symbiont and show loss of
function, 3) reintroduce the symbiont and show that function is
regained, and 4) isolate the symbiont again. This also is a rigor-
ous approach, but in many symbioses all of these criteria can-
not readily be fulfilled. One difficulty lies in obtaining aposym-
biotic (symbiont-free) hosts, which are sometimes not viable
without their symbiont.10
Also, reintroducing obligate sym-
bionts, which do not maintain populations outside of the host,
is far more difficult than reintroducing infectious organisms
that have specifically evolved to invade their hosts. Obligate
symbionts are often transferred only directly between gener-
ations and lack reinfection capability. As with infectious dis-
eases, modified criteria must be employed to substantiate the
role of a microbial symbiont in bioactive metabolite synthesis.
An alternative to the microbiological approach described
above is to use molecular tools to demonstrate that the bio-
synthetic machinery for metabolite synthesis resides in the
symbiont. Techniques to do so can be applied to purified or
partially purified symbionts, or in situ, where a diagnostic signal
is localized to the symbiont. The use of nucleotide probes for
biosynthetic genes can confirm that these genes reside in the
symbiont genome. However, this approach requires an authen-
tic probe, derived from genes that have been independently
verified to have the required function. Cloning biosynthetic
genes from a symbiont and establishing their function can be a
major undertaking in itself; hence initial experiments to gener-
ate small probes specific for the genes of interest can be useful.
Likewise, specific antibodies could be used to detect the pres-
ence of biosynthetic enzymes in a symbiont. Unless one has an
antibody that recognizes the same class of enzyme from differ-
ent species, as in the detection of Rubisco in chemoautotrophic
symbionts,11
this requires purifying or expressing the enzyme
from the symbiotic association and verifying its function,
before it can be used to produce specific antibodies for precise
localization. Enzymatic function can also be directly assayed,
or visualized in situ, using specific substrates that produce a
colored, fluorescent or radioactive signal.
2.1 Criteria for study
All of the above approaches require a substantial amount of
effort, and researchers need ways to focus that effort on the
organisms most suitable for it. A variety of types of
circumstantial evidence, taken together, can strongly implicate a
microorganism as the biosynthetic source of a metabolite, and
provide impetus for further investment of effort to obtain
definitive proof. Examples of evaluation criteria are:
I. Similarity to known microbial compounds. A structural
similarity between metabolites from marine animals and
those from microbial sources is often the starting point for
investigation into bioactive metabolite symbioses. If related
compounds have not been found in multicellular animals,
it is more likely that the microbial symbiont is the source
of the metabolite. These similarities must be interpreted with
care; an important caveat is that the chemistry of the living
world is by no means fully surveyed, and compound classes
that we now regard as microbial may have additional sources.
Another consideration is that even if the compounds were
originally of microbial origin, lateral transfer of genes to the
animal can occur, albeit rarely, conferring the ability to bio-
synthesize non-characteristic molecules in the absence of the
microbe.12
II. Physical location of the compound. In some cases metab-
olites can be localized to either the symbiont or the host cells.
Although it is logical to assume that the location of a com-
pound reflects its site of origin, a compound may diffuse, or be
exported elsewhere, after synthesis. Free-living microbes often
transport antibiotics that they synthesize out of their cells
quite efficiently, in part to protect themselves.13
Thus, physical
location may reflect function as much as origin. It should be
noted that if there are no microbes present, there is still a possi-
bility that the marine invertebrate is not the source of the
natural product. The metabolite could be dietary-derived, as
is currently believed to be the case for Halichondria okadai
and Halichondria melanodocia, which contain okadaic acid
produced by dinoflagellates of the genus Prorocentrum.7
III. Presence of a persistently associated microbe and corre-
lation with bioactive compounds. Since most organisms have
many microbes associated with them, it is important to
distinguish casual associates from persistently associated
ones. Persistently associated microbes can be candidates to be
producers of a bioactive compound that is characteristic of
the animal; thus, experiments to identify those microbes can be
important steps in elucidating the source of the compound.
When there are variations in natural products within a host
species or related group of species, correlation of the presence
of a particular microbe with a particular compound provides
support for the microbial involvement in synthesis of that
compound. If animals aposymbiotic for a particular microbe
also lack the metabolite, this suggests that the microbe might
be a good candidate for further study. These criteria do not
prove that the microbe is the source of the compound, but
they eliminate sporadically associated microbes from further
consideration.
Reproductive tissues, gametes, and larvae are always
important to examine for the presence of microbes. Although
symbionts can be recruited from the environment, in many
cases, the host has evolved mechanisms to ensure intergener-
ational (“vertical”) transmission. Microbes persistently associ-
ated with eggs and larvae are likely to have important roles in
the life of the host, one of which could be synthesis of bioactive
metabolites.
IV. Experimental manipulation of symbiont load. In some
cases, symbionts can be reduced or eliminated by treatment
with antibiotics or by other methods.14–16
Correlation of sym-
biont reduction or elimination with reduction or elimination of
the metabolite can be useful information, as long as the follow-
ing caveat is kept in mind: results will be dependent on the rate
of turnover of the metabolite. In one example,17
a natural prod-
uct hypothesized to be produced by a symbiont was shown to
crystallize in the tissue of a marine sponge, suggesting that
some bacterially-produced compounds may have a lengthy life
span once produced and may be present even if the bacteria are
removed or die. If a compound is persistent in the animal,
changes in concentration due to reduction in synthesis can be
difficult to detect unless pulse labeling is used. Pulse labeling
entails brief incubation with a radioactive precursor of the
compound, which allows its rate of synthesis and degradation to be
monitored. If
the metabolite is produced by the host, elimination of a sym-
biont can have an indirect effect on metabolite production if the
host is dependent upon the microbe for other reasons.
V. Presence of the correct class of biosynthetic genes in the
symbiont. Evaluating whether a microbial symbiont is the
source of a bioactive metabolite by the above criteria can be
difficult since there can be multiple interpretations of the data.
A more definitive way is to demonstrate that biosynthetic genes
responsible for the metabolite are located in the symbiont.
Cloning specific biosynthetic genes and verifying their function
are time consuming; however, preliminary screening based on
information about the class of enzyme most likely responsible
for forming the chemical structure can be advantageous. For
example, peptides that incorporate unusual amino acids are
likely to be assembled by non-ribosomal peptide synthetases
(NRPS), a well-characterized enzyme family.18
Polyketide syn-
thases (PKS) also contain conserved domains that can be used
to generate probes.19
These enzymes may have distinctive signa-
tures, such as conserved amino acid residues, that can be used
for detection by a variety of molecular techniques. The presence
of an enzyme or gene of the right type is very encouraging.
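The turnover caveat under criterion IV can be made concrete with a simple calculation. The sketch below assumes, purely for illustration, that synthesis stops completely when the symbiont is eliminated and that the existing metabolite pool is then lost by first-order kinetics with a fixed half-life. Neither assumption is taken from the studies cited here, and the half-lives and the 20% detection threshold are hypothetical values.

```python
import math


def fraction_remaining(days: float, half_life_days: float) -> float:
    """Fraction of the initial metabolite pool left `days` after synthesis stops.

    Assumes first-order loss (hypothetical model, not from the cited studies).
    """
    return 0.5 ** (days / half_life_days)


def days_until_detectable(half_life_days: float, detectable_drop: float) -> float:
    """Days needed for the pool to fall by `detectable_drop` (0.2 means a 20% drop)."""
    if not 0.0 < detectable_drop < 1.0:
        raise ValueError("detectable_drop must be between 0 and 1")
    return half_life_days * math.log(1.0 - detectable_drop) / math.log(0.5)


if __name__ == "__main__":
    # Hypothetical half-lives: a rapidly turned-over compound versus a
    # persistent one (for example, a compound that crystallizes in tissue).
    for half_life in (2.0, 30.0, 365.0):
        left = fraction_remaining(14.0, half_life)
        wait = days_until_detectable(half_life, 0.2)
        print(f"half-life {half_life:6.1f} d: {left:.2f} of pool left after 14 d; "
              f"20% drop reached after {wait:.1f} d")
```

Under these assumptions, a two-week antibiotic treatment would visibly deplete a compound with a half-life of days, but would leave a long-lived compound almost unchanged, which is why pulse labeling is the more direct measure of ongoing synthesis.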
Both the criteria for evaluating the role of symbionts in bio-
active metabolite production, and the techniques for investigat-
ing these symbioses are emerging. Applying these criteria can
build substantial support for the involvement of a particular
microbe in the synthesis of a bioactive metabolite. The final
proof lies in either actually growing the symbiont in culture and
subsequently isolating the natural product from the culture, or
for non-cultivable symbionts, cloning and expressing target
genes in a heterologous, cultivable organism. In this article we
will use the research in our laboratory and our collaborations
with John Faulkner’s group to illustrate issues and describe
methods important in investigating marine invertebrate/
microbial symbioses and identifying the producer of a bioactive
metabolite. Some of the concepts in this review were presented
previously;7 however, our goal here is to provide a comprehensive,
practically oriented overview. We will focus on progress to date
in the Bugula neritina–Endobugula sertula association, which
has become our model system for developing approaches and
methods. In addition, we will discuss examples of research on
other invertebrate–microbe symbioses that demonstrate specific
techniques and challenges in bioactive metabolite symbiosis
research.
3 Evaluation of natural products as bacterially-produced compounds
Three criteria are often used to evaluate whether the structure
of a natural product indicates a bacterial origin in a symbiotic
association. The first is structural similarity to compounds
isolated from cultured microbes. For example, halichondramide (3)
from the sponge Halichondria sp. resembles scytophycin B (4) from
the cyanobacterium Scytonema pseudohofmanni, and is therefore
speculated to be microbial in origin.20 Another example is
ecteinascidin 743 (ET-743, 5) and related compounds found in
the ascidian Ecteinascidia turbinata. They are thought to be
biosynthesized by symbiotic bacteria since they share structural
similarities with saframycin B (6), which is produced by
Streptomyces lavendulae,20 and with the safracins from
Pseudomonas.21 (It is interesting to note that PharmaMar produces
ET-743 by semi-synthesis from safracin B.)
Another criterion used to evaluate whether a compound is
microbially produced is the presence of similar compounds in
unrelated host organisms. In this case, it is considered more
likely that microbes with a common biosynthetic capacity are
found in the different hosts than that the hosts have undergone
convergent evolution to synthesize the same compound. The
ecteinascidins also fit this criterion, as they are not only
similar to an actinomycete metabolite, but also resemble
renieramycin E (7) and its analogs, which were isolated from
sponges of the genus Reniera.20 Similarly, mycalamide A (8) from
the sponge Mycale sp. bears a striking resemblance to pederin (9),
a metabolite isolated from the rove beetle Paederus sp.20
A final criterion is that even if the compounds do not share
structural similarity with known metabolites from cultured
microbes, or from unrelated organisms, a symbiotic origin is
hypothesized if the metabolites appear to be synthesized by
known microbial enzymes. For example, while bryostatin
(1), isolated from the bryozoan Bugula neritina, does not
superficially resemble any microbial product, it is a complex
polyketide. Complex polyketides (non-aromatic macrolides) are
typically produced by bacteria and fungi,22
and hence, it was
suggested that bryostatin is produced by a microbial symbiont
of the bryozoan.23
Cyclic peptides, and peptides with non-
proteinogenic amino acids, are synthesized by NRPSs, which
are enzymes typically found in microbes.18
It is important to note that these three criteria are suggestive
rather than conclusive grounds for targeting a symbiont as the
source of a bioactive metabolite, because of the possibility that
different hosts have evolved similar biosynthetic capacities.
However, these criteria can be valuable in devising experiments
to directly test such hypotheses. An example is the hypothesis
that the B. neritina symbiont “Candidatus Endobugula sertula” is
the synthetic source of bryostatin. By developing a probe to a
modular PKS based on sequences from other microbes, Davidson
et al.14 were able to demonstrate expression of a PKS in
E. sertula, and this probe has enabled the cloning of the putative
bryostatin PKS (unpublished data).
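Probes of this kind are commonly designed by back-translating a conserved amino acid motif into a degenerate oligonucleotide. The following Python sketch illustrates that general idea only. It is not the procedure of Davidson et al., and the motif passed to it is a hypothetical placeholder rather than a sequence from the bryostatin PKS. It relies on the standard genetic code and the IUPAC nucleotide ambiguity symbols.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity symbols, keyed by the sorted set of bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}
N_BASES = {symbol: len(bases) for bases, symbol in IUPAC.items()}


def degenerate_codon(amino_acid: str) -> str:
    """Collapse all codons for one amino acid into a single IUPAC triplet.

    Collapsing position by position over-covers amino acids with six codons
    (L, R, S): the triplet then also matches some codons for other residues.
    """
    codons = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
    if not codons:
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return "".join(
        IUPAC["".join(sorted({c[i] for c in codons}))] for i in range(3)
    )


def back_translate(motif: str) -> str:
    """Degenerate oligonucleotide (sense strand, 5' to 3') for a peptide motif."""
    return "".join(degenerate_codon(aa) for aa in motif.upper())


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by a degenerate oligonucleotide."""
    return prod(N_BASES[symbol] for symbol in oligo)


if __name__ == "__main__":
    motif = "DPQQ"  # hypothetical conserved motif, for illustration only
    oligo = back_translate(motif)
    print(motif, "->", oligo, f"({degeneracy(oligo)}-fold degenerate)")
```

In practice, motifs rich in residues with one or two codons (M, W, D, Q, H) are preferred, because the degeneracy, and with it the risk of non-specific hybridization or amplification, rises quickly with residues such as L, R and S.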
4 Localization of bioactive metabolites in marine invertebrates to specific cell types
A number of studies have been carried out on marine
invertebrates where a natural product has been localized to a
specific cell type. Although this information must be evaluated
with the caveats described in section 2.1, namely that the site of
synthesis may not be where the metabolite is ultimately local-
ized, or the metabolite could be dietary-derived, cell type local-
ization can still be a useful piece of information in determining
if the compound is microbial in origin. A schematic diagram
depicting approaches to localizing bioactive metabolites is
shown in Fig. 1.
Because marine invertebrates frequently contain large and
diverse bacterial populations, exemplified by Aplysina aerophoba,
Rhopaloeides odorabile, and Theonella swinhoei,24 it is quite
common to find potential natural product-producing bacteria in
these organisms. However, in spite of the abundance of bacteria,
most localization studies have implicated the host sponge
(Table 1) or ascidian (Table 2) as the biosynthetic source of
their bioactive metabolites. One important consideration in
interpreting these data is that some host cell types may contain
bacteria, either intracellularly or tightly associated with the
exterior of the cell; although this occurs frequently, it has been
overlooked in some studies. When cell separation studies are
done, it is important to rigorously analyze the bacterial content
of the “host cell” fraction to evaluate whether bacteria are
present.
Fig. 1 Approaches for localizing bioactive metabolites.
These localization studies (Tables 1 and 2) have revealed a
few surprises. The pyridoacridines had been proposed to
originate in a symbiont since they were isolated from unrelated
organisms such as tunicates, sponges, an anemone (Cnidaria),
and a prosobranch mollusc.25
However, Salomon and Faulkner
utilized the pH-dependent fluorescent properties of dercitamide
(10) to localize the metabolite to “inclusional” sponge cells in
Oceanapia sagittaria.26
Further examination by transmission
electron microscopy (TEM) revealed that no intracellular sym-
bionts were present in these cells, providing further support that
these metabolites were synthesized de novo by the sponge. A
similar study conducted on the tunic of the ascidian Cystodytes
dellechiajei using the pH-dependent properties of kuanoni-
amine D (11) and shermilamine B (12), indicated that the
pyridoacridines were contained in ascidian bladder cells and
pigment cells.27
The tambjamines have been isolated from bryo-
zoans, ascidians, and a mutant strain of the bacterium Serretia
marcescens, and therefore were also thought to be produced by
associated bacteria in the ascidian Atapozoa sp.28
A study of the
tissue by microscopy led to the proposal that tambjamine C, E
and F (13–15) are found in granular amebocyte blood cells
based on the fact that these compounds have a bright yellow
coloration and the lack of intense pigmentation in other cells.28
Although this did not rule out the possibility that another pig-
ment was responsible for the coloration of the granular amebo-
cytes, the authors also indicated that there was no significant
amount of intra- or extracellular bacteria in the ascidian, which
provided further support that these compounds were biosyn-
thesized by the Atapozoa sp. These methods do not exclude the
possibility that the compound-containing cells are storage sites
for natural products that are produced elsewhere. However,
there is no known case of an extracellular bacterium in a
marine invertebrate producing a natural product and transfer-
ring it to specific host cells. Instead, metabolite production in
these organisms is likely due to convergent evolution to produce
natural products that possess useful biological activities, or
possibly due to gene transfer events.
Other studies have successfully identified a microbial sym-
biont responsible for the production of certain secondary
metabolites (Table 3). Taking advantage of the auto-
fluorescence of cyanobacteria, the sponge cells of Dysidea
herbacea were separated from associated Oscillatoria spongeliae
filaments using a fluorescence activated cell sorter, and the
chlorinated amino acid derivative 13-demethylisodysidenin (16)
was shown to exist only in the filamentous cyanobacterial cells,
while the sesquiterpenes spirodysin (17) and herbadysidolide
(18) were found only in the sponge cells.29
Using the same
technique on a different specimen of D. herbacea, Unson et al.
demonstrated that a brominated diphenyl ether (19) was located
only in the cyanobacterial filaments.17
Host cells and cyanobacterial cells from a sample of D. herbacea
that contained the chlorinated diketopiperazines dihydrodysamide C
(20) and didechlorodihydrodysamide C (21) were separated on a
centrifugation density gradient, and the chlorinated metabolites
were shown to exist only in the cyanobacterial fraction.30
Interestingly, one O. spongeliae fraction did not contain the
chlorinated amino acid derivatives, leaving open the possibility
that there may be closely related strains of cyanobacteria in the
sponge.

Table 1 Natural products localized in marine sponge cells
Species | Natural product(s) | Compound class | Ref
Amphimedon terpenensis | diisocyanoadociane; 6 sterols [a] | diterpene; sterols | 100
Amphimedon terpenensis | 3 brominated fatty acids | fatty acids | 101
Aplysina fistularis | aerothionin (34); homoaerothionin (35) | brominated tyrosine derivatives | 37
Crambe crambe | crambines and/or crambescidins | guanidine alkaloids | 102
Dysidea avara | avarol | sesquiterpene hydroquinone | 103, 104
Dysidea herbacea | spirodysin (17), herbadysidolide (18) | sesquiterpenes | 29
Haliclona sp. | haliclonacyclamines A and B | pyridine alkaloids | 105
Negombata magnifica | latrunculin B (36) | macrolide | 38
Oceanapia sagittaria | dercitamide (10) | pyridoacridine alkaloid | 26
[a] Other sterols and non-brominated long chain fatty acids are found in sponge cells.106–108

Table 2 Natural products localized in marine ascidian cells
Species | Natural product(s) | Compound class | Ref
Atapozoa sp. | tambjamines C, E, F (13–15) | bipyrrole alkaloids | 28
Cystodytes dellechiajei | kuanoniamine D (11), shermilamine B (12) | pyridoacridine alkaloids | 27
Lissoclinum bistratum | bistramide A (23) [a] [b] (= bistratene A) | macrocyclic ether | 34
Lissoclinum patella | patellamides A–C (31–33) [b] | cyclic peptides | 36
Styela plicata | plicatamide [c] | octapeptide | 109
[a] The relative stereochemistry of the tetrahydropyranyl and spiroketal moieties has been proposed.110
[b] Study results conflict with other studies shown in Table 3.
[c] An example of a number of peptides, including tunichromes and larger polypeptides,111 isolated from the blood cells of ascidians.

Table 3 Natural products localized in symbiotic bacteria
Host species | Natural product(s) | Compound class | Bacterium | Ref
Dysidea herbacea | 13-demethylisodysidenin (16) | chlorinated amino acid derivative | Oscillatoria spongeliae | 29
Dysidea herbacea | brominated diphenyl ether (19) | brominated diphenyl ether | Oscillatoria spongeliae | 17
Dysidea herbacea | dihydrodysamide C (20); didechlorodihydrodysamide C (21) | chlorinated diketopiperazines | Oscillatoria spongeliae | 30
Lissoclinum bistratum | bistratamide A (29) and B (30) [a] | cyclic peptides | Prochloron sp. | 34
Lissoclinum bistratum | bistramide A (23) [a] | macrocyclic ether | Prochloron sp. | 35
Lissoclinum patella | lissoclinamide 4 (24) and 5 (25), ulithiacyclamide (26), patellamide D (27), ascidiacyclamide (28) [a] | cyclic peptides | Prochloron sp. | 33
Theonella swinhoei | swinholide A (2) | macrolide | unicellular heterotrophic | 31
Theonella swinhoei | theopalauamide (22) | bicyclic glycopeptide | “Candidatus Entotheonella palauensis” | 32
[a] Study results conflict with other studies shown in Table 2.
A study of the sponge Theonella swinhoei indicated that
swinholide A (2) and theopalauamide (22) were localized to
unicellular heterotrophic bacteria and a filamentous hetero-
trophic bacterium, respectively.31
This was accomplished
through the use of differential centrifugation, a technique in
which dissociated cells are exposed to increasing speeds of
centrifugation to yield different fractions of cells. The fila-
mentous heterotrophic bacterium was later identified as a
δ-proteobacterium, “Candidatus Entotheonella palauensis”.32
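Centrifugation steps are usually reported as relative centrifugal force (RCF, in multiples of g) rather than rotor speed, because the force at a given speed depends on the rotor radius. The sketch below applies the standard conversion RCF = 1.118 × 10^-5 × r × N^2 (r in cm, N in rpm) to tabulate a stepwise series of spins. The rotor radius and speeds are hypothetical and are not the conditions used in the T. swinhoei study.

```python
import math

RCF_CONSTANT = 1.118e-5  # converts (radius in cm) x (rpm squared) to multiples of g


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 10.0                       # hypothetical rotor radius
    speeds_rpm = [500, 1500, 4000, 10000]  # hypothetical, increasing speeds
    # In differential centrifugation, each supernatant is spun again at the
    # next speed, so successive pellets are enriched in smaller particles.
    for step, rpm in enumerate(speeds_rpm, start=1):
        print(f"step {step}: {rpm:>6} rpm = {rcf_from_rpm(rpm, radius_cm):8.0f} x g")
```

Reporting each fraction by its RCF and spin time, rather than rpm alone, makes a fractionation scheme reproducible on a different rotor.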
Cellular localization studies do not always definitively
identify the source organism. A good example is several
studies conducted on the ascidians Lissoclinum bistratum and
Lissoclinum patella, where it was attempted to determine
whether the cyclic peptides and the macrocyclic ether bis-
tramide A (= bistratene A) (23) were located in ascidian cells
or in associated cyanobacterial cells of the genus Prochloron.
Initial studies based on separated cyanobacterial cells from
L. patella indicated that lissoclinamides 4 (24) and 5 (25),
ulithiacyclamide (26), patellamide D (27) and ascidiacyclamide
(28) were produced by the symbiont, as they could be isolated
from the Prochloron cells in equal or greater amounts on a
weight-to-weight basis than could be found in the entire
colony.33
From Lissoclinum bistratum, using the same technique,
Degnan et al.34
reported that the peptides bistratamide A (29)
and B (30) were found in the cyanobacteria, while bistramide
A (23) was not. A second study of L. bistratum contradicted
these results, concluding that bistramide A (23) was found in
Prochloron cells at concentrations 4 to 6 times greater than in
the intact ascidian.35
A recent study of L. patella has indicated
that the cyclic peptides patellamides A–C (31–33) are not found
in separated Prochloron cells, but are distributed throughout the
tunic.36
Based on these experiments, the source of the cyclic
peptides and bistramide A (23) is unclear and awaits further
studies.
Fig. 2 Approaches for investigating microbe presence.
Other techniques to localize natural products to specific cell
types are available. If the natural product is halogenated, as in
the case of aerothionin (34) and homoaerothionin (35) isolated
from the sponge Aplysina fistularis, energy dispersive X-ray
microanalysis can be used to determine the cellular location of
the metabolite in sections of tissue.37
In situations where the
natural product is not halogenated and cellular dissociation is
not easy, immunolocalization of the compound may be pos-
sible. This technique requires the production and isolation of
antibodies that specifically bind to the natural product, which
can be used as a probe to determine the cellular location of the
compound in a tissue section. This was accomplished in the
localization of latrunculin B (36) in the sponge Negombata
magnifica.38
5 Investigating microbe presence
An important criterion for demonstrating microbial origin of a
natural product is to confirm that a microbe is persistently
associated with the animal host. As in conventional
environmental microbiology research, one can
employ techniques from both molecular biology and microscopy,
as diagrammed in Fig. 2. Polymerase chain reaction
(PCR)-based techniques can be used to identify bacterial small
subunit (16S) ribosomal RNA (rRNA) gene sequences, revealing
phylogenetic affiliations of microbes in or on animal tissues.
Probes targeting these sequences can then be used to confirm the
presence of, and to localize, specific microbes within a sample.
When probes for biosynthetic genes are available, probing
for in situ expression of candidate biosynthetic genes can identify
producers of bioactive compounds. Here we summarize
approaches using these methods for the investigation of
symbiotic production of bioactive metabolites.
5.1 Identifying the host
Prior to identifying microbial symbionts, it is essential to
characterize each sample of the host unambiguously. Cryptic
speciation, in which two or more specimens appear identical
according to conventional taxonomy but are identified via
molecular data as distinct species, is often found in the marine
environment.39
For example, application of molecular tech-
niques has shown that B. neritina is a species complex with
at least three siblings,40,41
each with unique symbiotic and
chemical profiles. This illustrates the point that it is imperative
not to rely exclusively on conventional taxonomy but to couple
this with molecular analyses for definitive identification. Thus,
each sample of the host should include material suitable for
DNA extraction. The mitochondrial cytochrome oxidase I
(COI) gene (Fig. 2) can be useful for host identification, since
this gene evolves at a relatively fast rate, allowing differentiation
of closely related organisms.42
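As a rough illustration of how molecular data can separate morphologically identical hosts, the following minimal Python sketch computes the uncorrected pairwise distance (the fraction of aligned sites that differ) between two aligned COI fragments. The function name `p_distance` and the toy sequences are illustrative assumptions, not data from the cited studies, and no species-level distance threshold is implied.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two ALIGNED sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned COI fragments from two host specimens.
specimen_1 = "ACGTACGTAC"
specimen_2 = "ACGAACGTAT"
print(p_distance(specimen_1, specimen_2))
```

In practice the alignment would come from a dedicated alignment tool, and distances would be interpreted alongside phylogenetic analysis rather than on their own.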
5.2 Molecular approaches: the value of ribosomal RNA
(rRNA) sequences
Since most obligate symbionts cannot be cultivated, we rely
heavily on molecular approaches to investigate microbial presence.
The 16S rRNA gene sequence is widely accepted as a way
to identify environmental bacteria. This gene is present in all
bacteria, is distinct from the small subunit rRNA gene of eukaryotic
hosts, and possesses conserved regions to which oligonucleotide
PCR primers can be designed to amplify the gene from essentially all
bacteria (so-called “universal” primers). Variable regions
within the 16S rRNA sequence can also distinguish closely
related microbial species, enabling the design of species-specific
or group-specific primers. The large size of 16S rRNA gene
databases, such as the Ribosomal Database Project II,43
facili-
tates the identification of a sequence of interest. Even if the
organism cannot be identified to the species level (due to
absence in the database), it can be placed within a group of
related organisms. The specific 16S rRNA sequence is a
signature of the organism and can be used to track its presence.
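The idea of a primer that anneals to a conserved region can be sketched in a few lines of Python. The example below scans a template for positions matching a primer written with IUPAC ambiguity codes. The primer shown is the widely used 27F sequence, included only as an illustration (the review does not specify primer sequences); the function name and the toy template are assumptions.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_sites(template, primer, max_mismatches=0):
    """Return 0-based start positions where the template (written 5'->3' in
    the same sense as the primer) matches the primer, allowing ambiguity
    codes in the primer and up to max_mismatches mismatched positions."""
    template = template.upper()
    primer = primer.upper()
    hits = []
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(base not in IUPAC[code]
                         for base, code in zip(window, primer))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"            # M = A or C
toy_template = "GGGAGAGTTTGATCCTGGCTCAGTTT"     # hypothetical 16S fragment
print(primer_sites(toy_template, PRIMER_27F))
```

A group-specific primer would be checked the same way, with the expectation that it matches the target sequence and mismatches non-target sequences.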
The 16S rRNA gene is typically amplified by PCR from a
total DNA preparation of the invertebrate and its associated
microbes (Fig. 2). Universal primers are used so that 16S rRNA
genes from all associated bacteria are amplified. Plasmid clone
libraries are constructed from the mixed pool of PCR products,
and clones are sequenced to determine what microbes are
associated with the invertebrate (Fig. 2). One concern with
PCR-based methods is a phenomenon known as PCR bias in
which universal primers may actually favor certain sequences
over others.44
PCR can also 1) produce chimeras in which
portions of a sequence are derived from different species, 2)
produce sequence errors due to misincorporation by the DNA
polymerase, and 3) form heteroduplexes consisting of
imperfectly matching strands of DNA hybridized to each
other. However, these artifacts can be minimized by using high
fidelity enzymes, adjusting PCR conditions, and post-PCR
purification.45
As with all environmental sampling methods, determining
the sampling number (evaluating when enough clones have
been sequenced to provide a representative picture of the
bacterial community) is an important consideration. Because
of nonspecific association of environmental bacteria in an
invertebrate, a bacterial species that is most abundant in a
sample is not necessarily significant to the host. In addition, the
abundance of a sequence in a clone library does not necessarily
reflect its abundance in a natural sample due to possible PCR
artifacts. Most analyses to date on bacterial biodiversity in
sponges have been based on sequencing 50–70 clones of 16S
rRNA.8,24
Depending on the number of microbes present and
their relative abundance, this may lead to an underestimation
of the total diversity present in an organism. Statistical
approaches for estimating microbial biodiversity and determin-
ing the number of sequences required for accurate represen-
tation of the natural sample have been the subject of several
reviews.46–48
Although a number of tools are available, none
appear superior, and different methods on the same sample
can yield biodiversity estimates that differ by an order of
magnitude or greater.46,49,50
Despite these problems, it seems
likely that improved statistical analyses will become routinely
incorporated into studies of microbial diversity of marine
invertebrates.
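To illustrate what evaluating the sampling number can look like, the sketch below applies two standard estimators to a clone library: Good's coverage (1 minus the fraction of clones that are singletons) and the bias-corrected Chao1 richness estimate. These estimators are chosen here only as familiar examples; the review does not endorse a particular method, and the clone labels and counts are hypothetical.

```python
from collections import Counter


def library_summary(clone_labels):
    """Summarize a clone library given one phylotype label per sequenced clone."""
    counts = Counter(clone_labels)
    n_clones = sum(counts.values())
    if n_clones == 0:
        raise ValueError("no clones supplied")
    observed = len(counts)
    singletons = sum(1 for c in counts.values() if c == 1)
    doubletons = sum(1 for c in counts.values() if c == 2)
    # Good's coverage: estimated fraction of the community represented.
    coverage = 1.0 - singletons / n_clones
    # Bias-corrected Chao1, which remains defined when there are no doubletons.
    chao1 = observed + singletons * (singletons - 1) / (2.0 * (doubletons + 1))
    return {"clones": n_clones, "observed": observed,
            "coverage": coverage, "chao1": chao1}


# Hypothetical library of 60 clones: two dominant phylotypes and several rare ones.
library = (["A"] * 40 + ["B"] * 12 + ["C"] * 2 + ["D"] * 2
           + ["E", "F", "G", "H"])
print(library_summary(library))
```

A low coverage value or a Chao1 estimate well above the observed richness would suggest that more clones need to be sequenced, subject to the caveats about PCR artifacts noted above.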
Sequencing clone libraries from several invertebrate samples
can be tedious, but other molecular approaches allow surveys
of microbial diversity in invertebrates (Fig. 2). Denaturing
gradient gel electrophoresis (DGGE) separates DNA fragments as
they migrate through a gel containing a gradient of chemical
denaturant, according to the conditions required to separate the
two DNA strands (their melting behavior, commonly summarized as
a melting temperature), which differ depending on the
nucleotide composition of the DNA.51,52
Therefore, a mixed
pool of 16S rRNA gene fragments from different organisms
generated by PCR can be separated by DGGE (Fig. 3). Ideally,
identical sequences migrate to the same position in the gel, so
the use of DGGE to profile PCR products from multiple samples
can reveal bacterial sequences that are common among
different samples (Fig. 3). One problem with DGGE is that
heteroduplexes are formed when amplifying from a mixed
population of DNA; these migrate as separate bands and
need to be excluded from the analysis. One way to minimize
heteroduplex formation is through the use of reconditioning PCR,53
in which a final PCR product is diluted and reamplified with
excess primers for a few cycles. Another potential issue is that
different sequences may have similar melting behavior and
can therefore co-migrate. Running additional gels with shallower
denaturant gradients can provide better resolving power, although
only sequencing of bands will confirm that each represents a
single species.
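Because melting behavior depends on nucleotide composition, a very crude way to anticipate co-migration is to compare the GC fraction of the fragments being profiled. The sketch below does only that; it is an assumption-laden proxy (real DGGE mobility depends on the melting of individual domains within a fragment, not on overall GC content alone), and the function names, tolerance, and sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of unambiguous bases in seq that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def possible_comigration(fragments, tolerance=0.01):
    """Return name pairs whose GC fractions differ by no more than tolerance."""
    names = sorted(fragments)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            difference = abs(gc_fraction(fragments[first])
                             - gc_fraction(fragments[second]))
            if difference <= tolerance:
                pairs.append((first, second))
    return pairs


# Hypothetical 16S fragments; x and y differ in sequence but not in GC content.
fragments = {"x": "GGCCAATT", "y": "GCGCATAT", "z": "GGGGCCCC"}
print(possible_comigration(fragments))
```

Pairs flagged in this way are only candidates for co-migration; as noted above, sequencing of excised bands is what confirms whether a band contains a single sequence.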
Another method for comparing microbial communities
among different samples of the host invertebrate is terminal
restriction fragment length polymorphism (T-RFLP).
This technique involves amplifying community DNA with
fluorescently labeled universal 16S rRNA primers and then
digesting the products with a restriction enzyme, which generates
labeled terminal fragments whose lengths depend on their sequence.54
These fragments are separated electrophoretically, and their sizes are
diagnostic of the individual microbial 16S rRNA gene sequences
present. This is a rapid method to profile similarities and
differences among many samples; however, because sequencing
is not involved, it does not permit direct identification of the
microbes.
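The logic of T-RFLP can be sketched in silico: for each amplicon, the labeled terminal fragment runs from the labeled 5' end to the first restriction site. The minimal Python example below assumes a single labeled forward primer and uses the MspI recognition site (C^CGG) purely as an illustration; the review does not name an enzyme, and the amplicon sequences are hypothetical.

```python
def terminal_fragment_length(amplicon, site="CCGG", cut_offset=1):
    """Length of the 5' end-labeled terminal fragment after digestion.

    amplicon   -- PCR product read 5'->3' starting at the labeled primer
    site       -- restriction recognition sequence (CCGG, as for MspI)
    cut_offset -- bases of the site left on the 5' fragment (C^CGG -> 1)
    """
    position = amplicon.upper().find(site.upper())
    if position == -1:
        return len(amplicon)  # no site: the full-length product is detected
    return position + cut_offset


def trflp_profile(amplicons, site="CCGG", cut_offset=1):
    """Map each amplicon name to its predicted terminal fragment length."""
    return {name: terminal_fragment_length(seq, site, cut_offset)
            for name, seq in amplicons.items()}


toy_amplicons = {
    "clone_A": "AAAAACCGGTTTTTTTT",
    "clone_B": "AAAAAAAAAACCGGTTT",
    "clone_C": "AAAAATTTTTAAAAATT",   # lacks the site
}
print(trflp_profile(toy_amplicons))
```

Different sequences can share the same terminal fragment length, which is one reason the profile alone does not identify the microbes present.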
Fig. 3 Denaturing gradient gel electrophoresis of 16S rRNA from
Bugula neritina. Samples are PCR amplifications of 1) a cloned 16S
rRNA from E. sertula, 2) DNA isolated from adult B. neritina, 3) DNA
isolated from a bacterially enriched fraction of adult B. neritina, and 4)
DNA from B. neritina larvae. Arrow denotes the 16S rRNA band from
E. sertula; other bands (lanes 2 and 3) are from other bacteria
associated with the host.
5.3 Probes for bioactive metabolite genes
An alternative and complementary approach to 16S rRNA-based
probes is the use of biosynthetic gene probes based on
characteristics of the secondary metabolite of interest (Fig. 2). For
example, complex polyketides such as bryostatin 1 are syn-
thesized by modular PKSs, enzymes that have distinct func-
tional domains within their larger protein sequence. The amino
acid sequence of certain domains is relatively well-conserved
across species, as is the case of the type I bacterial PKS
β-ketoacyl synthase (KS) domain. By comparing amino acid
sequences of this domain in several bacteria, Davidson et al.
(2001)14
identified conserved amino acids that were used to
design degenerate oligonucleotide primers complementary to
the gene sequence encoding those amino acids. Degenerate
primers compensate for the redundancy of the genetic code,
and will amplify from all sequences that encode the chosen
amino acid sequence, enabling the isolation of genes even when
the exact DNA sequence is unknown. Using these degenerate
primers under specific PCR conditions, a fragment of a KS
gene sequence was obtained from a B. neritina DNA extract.
These primers and the KS gene fragment were invaluable in
other characterizations of the B. neritina/E. sertula symbiosis.14
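The redundancy that degenerate primers must cover can be quantified directly from the standard genetic code: the number of distinct DNA sequences encoding a peptide is the product of the codon counts of its residues. The sketch below computes this and picks the least degenerate window in a conserved peptide, which is one common consideration when choosing where to place a primer. The peptide used is made up for illustration and is not a KS motif from Davidson et al.; the function names are likewise assumptions.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS_PER_AMINO_ACID = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    total = 1
    for residue in peptide.upper():
        total *= CODONS_PER_AMINO_ACID[residue]
    return total


def least_degenerate_window(peptide, width):
    """Return (start, window, degeneracy) for the window of the given width
    with the fewest encoding sequences; ties go to the earliest window."""
    if width < 1 or width > len(peptide):
        raise ValueError("width must be between 1 and the peptide length")
    windows = [(degeneracy(peptide[i:i + width]), i, peptide[i:i + width])
               for i in range(len(peptide) - width + 1)]
    count, start, window = min(windows)
    return start, window, count


conserved_peptide = "LSRLMWDH"   # hypothetical conserved stretch
print(least_degenerate_window(conserved_peptide, 4))
```

Windows rich in Met and Trp (one codon each) need far fewer primer variants than windows rich in Leu, Ser, or Arg (six codons each).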
Isolating even a short (ca. 250 base pair) DNA fragment
specific to the symbiont of interest is an extremely valuable tool
in the characterization and eventual cloning of a bioactive
metabolite pathway. Another method, which does not rely on
oligonucleotide primers, is the isolation of a symbiont-specific
DNA fragment using a DNA fragment probe derived from the
same type of gene in another organism. This approach, called
“heterologous hybridization”, depends on the gene sequences from
the two organisms being similar enough to base pair. Hybridization refers
to the complementary base pairing of two DNA sequences; if
one is labeled, the other can be identified. For successful
heterologous hybridization the two genes usually must be from closely
related species. Although there are other potential complications,
heterologous hybridization can be considered as another means
of isolating gene fragments and
entire genes from natural product biosynthetic pathways.
5.4 Testing persistent association of microbes with their hosts
by PCR or DGGE
Once candidate 16S rRNA or biosynthetic gene fragment
sequences are obtained, they can be used to demonstrate the
association of a possible symbiont with its host (Fig. 2). PCR
or DGGE techniques are especially useful for this purpose. A
PCR survey of B. neritina isolated from a variety of locations,
and other bryozoans, showed consistent presence of the KS
gene fragment described above along with the presence of
bryostatin in B. neritina, providing evidence that this KS gene
was involved in bryostatin synthesis.14
DGGE can be used to
compare the microbial communities of multiple samples or dif-
ferent life cycle stages of the same species (Fig. 3). This enables
discrimination between microbes that are only transiently or
sporadically associated with the invertebrate and those that are
true symbionts.
It is important to consider the life cycle stage of the host
when attempting to identify candidate symbiotic microbes. As
mentioned previously, direct transmission of a microbe from
generation to generation is indicative of an important func-
tional interaction; therefore, analyzing gametes or reproductive
tissues (e.g. developing embryos or larvae) can be valuable. In
addition, levels of non-persistent microbes may be reduced at
particular stages in the life cycle. For example, non-feeding
B. neritina larvae do not contain microbes from a gut, in
contrast to adult B. neritina. DGGE analysis of adult and
larval B. neritina DNA extracts indicates a significant enrich-
ment of E. sertula in the larvae relative to adult tissue (Fig. 3).
Once a microbial species is shown to be persistently
associated with an animal, experimental manipulation of the
bacterial population in the host can help to determine if there is
a microbial role in the bioactive metabolite biosynthesis. Anti-
biotics interacting with bacterial but not eukaryotic ribosomes
can be applied in an attempt to reduce the numbers of bacteria.
This was done using the antibiotic gentamicin sulfate on
developing colonies of B. neritina.14
After subsequent growth, PCR
screening with E. sertula-specific primers indicated that levels
of E. sertula were reduced, and subsequent analysis showed
that bryostatin levels were reduced as well.14
There was not a strict correlation
between the extent of reduction in the symbiont and in
bryostatin (possible reasons for this were discussed in section
2.1); nevertheless, the result is consistent with E. sertula
involvement in bryostatin synthesis.
5.5 Investigating microbe presence by microscopy
Conventional light microscopy, scanning electron microscopy,
and transmission electron microscopy have historically been
used for observing bacteria in environmental samples and
animal tissues. In addition, development of fluorescent stains
for application in epifluorescence microscopy has increased
capabilities for observing microbes in complex environmental
samples. For example, the fluorescent dye 4′,6-diamidino-2-phenylindole
(DAPI), which binds to DNA,55
allows the
researcher to distinguish between cells with genetic material
and inorganic bacteria-sized particles. These tools allow
determination of whether microbes are associated with the
animal of interest. However, only labeled specific nucleotide
probes enable researchers to localize a specific microbe or the
expression of particular microbial genes in a given sample.
From PCR and sequencing, a researcher can obtain a 16S
rRNA sequence to identify microbes in a host animal. Micro-
scopy then becomes an essential complement to the molecular
data (Fig. 2). Persistent microbial associates of invertebrates
can be identified by PCR or DGGE, but the source of a given
sequence must be confirmed by localizing the sequence to
microbial cells in the sample.
5.6 Localization of microbes in animal hosts using in situ
hybridization
Analyzing the microbial community of filter-feeding animals
such as sponges, tunicates, and bryozoans can be a daunting
task. In situ hybridization (ISH), a technique in which probes
labeled with fluorescent molecules or enzymes that catalyze
colorimetric reactions bind to a desired target, is a powerful
tool for localizing microbes in complex environmental com-
munities and correlating expression of specific genes to specific
microbes. The method involves incubating labeled oligonucleo-
tide or polynucleotide probes, which can be specific to groups
of microbes or to individual species, with fixed animal tissue,
and then visualizing a probe-specific signal with the micro-
scope. Images of labeled microbes in animal tissue allow con-
firmation of the presence and abundance of specific microbes,
in addition to localization on a microscopic scale. The ability to
localize microbes in animal hosts is indispensable for investi-
gating symbioses. Localization of a particular 16S sequence in
microbial cells is necessary for confirmation that the bacteria
are associated with the host rather than incidentally in the
seawater or on the animal surface during sampling. Haygood
and Davidson used this approach to localize E. sertula in
B. neritina larvae, showing that the larval pallial sinus
exclusively contained E. sertula.56
Schmidt et al.32
used cell
separations, DGGE, PCR and fluorescent in situ hybridization
(FISH) to identify and localize a filamentous δ-proteo-
bacterium, “Candidatus Entotheonella palauensis”, in the
sponge T. swinhoei (Fig. 4). Localization of the symbiont by
FISH confirmed that the sequence obtained did indeed origin-
ate in the filamentous microbe, and that the candidate symbiont
was sufficiently abundant for the production of theopalau-
amide (22), which was previously localized to the filaments.31
Localizing a particular microbe in situ can also reveal host
adaptations to symbiosis, indicative of a tight association
between microbe and host. One example is the transmission of
symbiotic microbes via reproductive tissues to future offspring,
documented in numerous marine invertebrate–microbe associ-
ations, including but not limited to bryozoans14,56,57
and
bivalves.58–63
In addition, microbes visualized in specialized host
structures, such as bacteriocytes,64,65
are likely to be important
symbionts. Such adaptations suggest that there has been
evolutionary selection for the maintenance of these specific
bacteria, and recent research suggests that vertical symbiont
transmission may be reflected by highly co-evolved host–
symbiont associations.66
Physiological and behavioral adap-
tations for symbiont transmission have also been noted in many
sponge species. The transmission of maternal bacteriocytes into
developing embryos has been found in Petrosia ficiformis,
Chondrosia reniformis, and at least two species of Oscarella.67–69
Characterizing the complex microbial communities associated
with invertebrate tissues remains a significant challenge, but
identifying bacteria associated with reproductive structures in
invertebrates may offer a targeted approach to identifying those
microbes significant to the biology of the host, including those
that may be responsible for bioactive metabolite biosynthesis.
5.7 Localization of expression of biosynthetic genes in
symbiotic systems
If a candidate biosynthetic gene (section 2.1) has been isolated
from a sample, nucleotide probes targeting the messenger RNA
(mRNA) transcripts of the gene can be designed, labeled, and
used to localize its expression. Co-localization of a biosynthetic
transcript and a specific 16S rRNA sequence in the same
microbial cell can confirm that the biosynthetic gene is
expressed in the microbe. Davidson et al. used this approach in
B. neritina by constructing a ribonucleotide probe targeting the
transcript containing one of the KS domains in the putative
bryostatin biosynthetic gene cluster. This ribonucleotide probe
hybridized to mRNA within E. sertula cells, providing conclusive
evidence that E. sertula expressed this domain.14
Fig. 4 Fluorescent in situ hybridization of E. palauensis in tissue of the
sponge Theonella swinhoei.32
A, B Universal bacterial 16S rRNA probe;
C, D E. palauensis-specific 16S rRNA probe. A Fluorescence
micrograph of unicellular bacteria (400 ×); B Fluorescence micrograph
of filamentous bacteria (800 ×); C Light micrograph (400 ×); D
Fluorescence micrograph (400 ×) (arrows indicate identical filaments in
C and D). Used with permission of the authors and the journal Marine
Biology.
5.8 Obstacles with in situ hybridization in symbiotic systems
There are technical challenges involved in using in situ hybridiz-
ation to localize microbes within animals, in contrast to pure
cultures or environmental microbial samples. One problem is
the autofluorescence of the host tissue, and in some cases, of
the microbial biomass as well. Commonly used fluorescent
labels, such as fluorescein, rhodamine, and their derivatives,
absorb and fluoresce at wavelengths similar to those of many
endogenous organic compounds. In epifluorescence microscopy,
filters for fluorescently tagged probes are designed to detect
emission within a range of approximately 50–60 nanometers.
With the advent of confocal laser scanning microscopy and
improvements in imaging software and technology, it is now
possible to detect emission over a narrower range. This allows
the researcher to specifically target the emission wavelength of
the fluorophore molecule, blocking out most autofluorescence
from the unhybridized portion of the sample and increasing the
signal to noise ratio. Systems that rely on colorimetric detection
have an analogous difficulty. Because these detection schemes
employ probe labels such as biotin and rely on activity from
enzymes such as phosphatase and peroxidase, any biotin or
enzyme activity endogenous to the host tissues will result in a
false signal. Reagents and protocol modifications that success-
fully block endogenous activity have been developed, and are
crucial for these types of detection schemes. Because of back-
ground issues, extensive controls are required for all in situ
hybridization experiments.
Another technical challenge is that symbionts have
reduced growth rates relative to free-living microbes,70
which results in lower levels of rRNA, and hence reduced signals in
ISH.71–73
Protocols have been developed for signal amplification
that allow the visualization of smaller microbes with decreased
metabolic activity.74,75
Catalyzed reporter deposition-FISH
(CARD-FISH), utilizing tyramide signal amplification (TSA),
has been developed and adapted to increase signal above back-
ground levels.74
This can also be done with colorimetric detec-
tion schemes. In addition to CARD-FISH, ribonucleotide
probes, which target 16S rRNA and contain multiple labels,
have been used for detection of slow-growing microbial popu-
lations.71,75,76
These modifications in detection methods should
significantly improve the researcher’s ability to detect symbiotic
microbes in environmental samples and animal tissues.
6 Identifying and isolating biosynthetic genes
When several of the criteria described above are consistent with
a microbial symbiont being the source of a bioactive com-
pound, the stage is set to isolate the biosynthetic genes from the
symbiont to enable definitive proof. In this section we will dis-
cuss approaches to do this (Fig. 5).
6.1 Enrichment of the bacterial symbiont
In situations where the symbiont cannot be or has not been
cultivated, physical isolation of bacteria, and specifically the
symbiont, is valuable. While isolation of a pure sample of the
symbiont bacteria can be difficult, there are methods of
enrichment. In some cases, different portions of the host tissue,
which can be differentially enriched in particular microbes, can
be separated prior to subsequent processing. An example is the
isolation of bacterial symbionts from T. swinhoei, where differ-
ent bacterial fractions were isolated after separating the red
ectosome from the endosome prior to homogenization and dif-
ferential centrifugation.31
An analogous approach would be the
isolation of a life cycle stage of a host that is enriched in a
particular symbiont. For example, the larvae of B. neritina
harbor almost exclusively E. sertula, whereas in the adults,
other bacteria predominate (Fig. 3). One consideration in this is
the total number of bacteria that may be associated with a life
cycle stage; in B. neritina, insufficient bacteria were present in
the larvae to yield enough DNA for clone library construction,
although enough was present for PCR. Even though adult B.
neritina has many other associated bacteria (Fig. 3), the amount
of E. sertula was high enough for library construction after
enrichment. Even if host tissue fractionation or life cycle stage
enrichments are not possible, a total source material homo-
genate (host plus symbiont and environmental bacteria) can be
fractionated by differential centrifugation. In the case of adult
B. neritina, after thorough homogenization of tissue using a
Polytron, centrifugation at a low speed (164 × g for 15 minutes)
removed large aggregates of host tissue. The extent of homo-
genization required may vary according to the host and its
associated bacteria, and needs to be monitored by examining
the bacteria post-homogenization. For isolation of bacteria
from B. neritina tissue, two rounds of low speed centrifugation
removed a substantial amount of host material without losing
too many of the associated bacteria in the process. A subsequent
high-speed centrifugation at 16,000 × g for 10 minutes pelleted the
bacteria. Although the resulting pellet was enriched in bacteria,
there was residual host tissue and cellular components as well
as environmental material. When isolating bacteria for the pur-
pose of obtaining DNA, one method to consider to further rid
the sample of host DNA is treatment with deoxyribonuclease
(DNase) to degrade host DNA associated with the homogenate
prior to extraction of bacterial DNA. This could provide
enrichment for bacterial DNA; however, there can be problems
with this approach, as discussed below.
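Because the centrifugation steps above are given as relative centrifugal force (× g), reproducing them on a particular rotor requires converting to rotor speed. The sketch below uses the standard relation RCF = 1.118 × 10^-5 × r × N^2 (r in cm, N in rpm). The 8 cm rotor radius is an assumption for illustration only; the rotor used for the B. neritina preparations is not stated here.

```python
import math

RCF_CONSTANT = 1.118e-5  # valid for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


ASSUMED_RADIUS_CM = 8.0  # hypothetical rotor radius
for label, rcf in [("low-speed spin", 164), ("high-speed spin", 16000)]:
    rpm = rpm_from_rcf(rcf, ASSUMED_RADIUS_CM)
    print(f"{label}: {rcf} x g is about {rpm:.0f} rpm at {ASSUMED_RADIUS_CM} cm")
```

Reporting and matching × g rather than rpm is what makes a fractionation protocol transferable between rotors of different radius.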
Another method for enrichment is the separation of bacteria
using density gradient centrifugation. This procedure involves
isolating an enriched bacterial sample, and centrifuging the
material on a Percoll (Pharmacia) gradient. This will separate
bacteria based on their buoyant density and can also remove
host cellular material based on the same principle.30
Fluorescence activated cell sorting has also been used to separate
symbiotic bacteria from their hosts.29
Fig. 5 Approaches for cloning bioactive metabolite genes.
6.2 DNA isolation, purification, and enrichment procedures
Before isolating DNA to characterize and clone symbiont
genes, it is important to consider what treatments and charac-
terization steps will be required after the DNA is isolated. For
example, if the particular gene cluster is predicted to be large,
then clone libraries with larger inserts are desirable, which
requires isolating high molecular weight DNA. In general,
isolating high molecular weight DNA is advantageous; however,
for some characterization steps, lower molecular weight DNA,
which requires less care in preparation, may be sufficient.
Another consideration is the presence of inhibitors of DNA
manipulations (restriction digests, cloning reactions, PCR), and
how these may be removed.
An effective method for DNA extraction that works on most
tissue types is to freeze the material at −80 °C, and then grind
aliquots in a small amount of dry ice with a pre-chilled mortar
and pestle. After the material is pulverized into a fine powder, it
is added to an extraction buffer (see below). With careful
manipulation, DNA isolated by this method is of high enough
molecular weight for most cloning approaches. An advantage
of this technique is that there is little opportunity for endogen-
ous nucleases to degrade the DNA, provided it remains frozen
until added to the extraction buffer. A disadvantage is that there
is no opportunity for pre-enriching the symbiont, unless
enough symbiont material can be isolated prior to freezing.
DNA can be extracted from pulverized frozen material or live
material by a short (5 min) incubation in an extraction buffer
followed by phenol : chloroform partition. We use the extrac-
tion buffer of Davidson et al.14
which inhibits endogenous
nuclease activity, lyses cell membranes, and extracts and
denatures proteins. This buffer has proven effective on a variety
of organisms. For B. neritina we have tested more benign
buffers with a goal of digesting the bacterial cell wall with lyso-
zyme prior to extraction or treating enriched bacterial cells with
DNase to remove contaminating DNA, and found that the
DNA was substantially degraded by endogenous nuclease
activity. Depending on the system, it may be worth attempting
these procedures; however, in general the more rapidly the cell
material is incubated in extraction buffer and treated with
phenol : chloroform, the more intact the DNA. To obtain high
molecular weight DNA, during the phenol : chloroform parti-
tion it is critical to very gently mix the aqueous and organic
layers for an extended period of time. We use a rotator appar-
atus that inverts the tube with the extraction mixture at 25 rpm
for 40 min. After centrifugation, the aqueous layer is gently
removed with a large-bore pipet into a new tube, and the extrac-
tion repeated for 20 min. The use of large-bore pipets and pipet
tips is essential to minimize DNA shearing which will reduce
the average size of the DNA. After extraction the DNA can be
precipitated using standard procedures.77
Precipitated DNA
can either be pelleted by centrifugation or, if sufficient
quantities are present, removed by spooling on a glass rod. The latter
technique has two advantages: 1) if inhibitors are present that
co-pellet with the precipitated DNA, then a larger proportion
of them can be removed, and 2) spooled DNA is easier to
resuspend and disperse in solution than the compact DNA
pellet resulting from centrifugation.
Uncharacterized inhibitors can be a significant problem for
subsequent manipulations with DNA especially considering
that they can co-purify during DNA precipitation; several
methods can be tried to remove them. A simple one to remove
small inhibitors is to pass the DNA solution through a Sepha-
dex-based spin column normally used to remove oligonucleo-
tides from PCR reactions. Another is to use silica-based DNA-
binding columns for cleanup. DNA treated in these ways is
likely to be sheared by the manipulations, and so may not be
suitable for cloning, but can be used for PCR. The best method
to maintain high molecular weight DNA during cleanup is
CsCl gradient centrifugation.
6.3 Purification and enrichment of DNA on CsCl gradients
After DNA has been extracted from a sample, there are options
for further purification and enrichment. A highly effective
method to remove inhibitors and cellular RNA is to subject the
bacterial DNA to ethidium bromide–caesium chloride (EtBr–
CsCl) equilibrium density gradient ultracentrifugation. The
high ionic strength of the solution facilitates dissociation of
pigments and inhibitors from the DNA. In the case of DNA
preparation from B. neritina, ultracentrifugation resulted in
RNA, pigments, and inhibitory compounds migrating to the
bottom of the gradient, while the DNA complexed with EtBr
was at a higher level (unpublished results). This method
successfully removed an inhibitory pigment associated with
DNA from B. neritina.
Once inhibitory compounds have been removed, there is the
option of further enrichment of the symbiont DNA, especially
if several bacterial species are present in the sample. Enrich-
ment can be an important factor because it reduces the number
of clones required in a library, and increases signal : noise in
other analyses such as Southern hybridizations (section 7.5).
One enrichment method is to fractionate the DNA on a CsCl
gradient containing Hoechst 33258 dye (Behring Diagnostics).
Hoechst dye is a bisbenzimide, minor groove-binding DNA dye that will bind
differentially to DNA based on the percentage of adenines and
thymines (AT%) in the sequence. Ultracentrifugation will
separate the DNA into distinct bands on the CsCl gradient,
based on differences in AT content. Bands can be removed from
the gradient in small fractions and the fractions then screened
by PCR for genes known to be contained within the symbiont
genome to identify those fractions most highly enriched in
symbiont DNA. In attempting to enrich for E. sertula DNA
from B. neritina, 5 bands were observed, which corresponded to
various environmental bacterial DNAs, the symbiont DNA,
and residual host DNA not removed in the bacterial enrich-
ment protocol.
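As a rough illustration of the property that the Hoechst dye–CsCl gradient exploits, the sketch below computes AT% for DNA sequences. The function name and the two short sequences are hypothetical placeholders chosen for illustration; they are not data from B. neritina or E. sertula.

```python
def at_percent(seq: str) -> float:
    """Return the percentage of A and T among the unambiguous bases of seq."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    total = at + gc  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * at / total


# Hypothetical fragments standing in for DNA from different organisms.
fragments = {
    "at_rich_fragment": "ATATTTAAGCATTATAAT",
    "gc_rich_fragment": "GCGCCGGATCCGCGGCGC",
}
for name, seq in fragments.items():
    print(f"{name}: {at_percent(seq):.1f}% AT")
```

Genomes whose AT% values are close to each other would be expected to band close together, which is one reason gradient fractions still need to be screened by PCR.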
6.4 Monitoring the extent of enrichment
Because of the diversity of bacterial communities in marine
invertebrates, enrichment procedures can vary in their
effectiveness from organism to organism. It is thus vital to
monitor the extent of enrichment in each case. There are several
methods to accomplish this. Competitive PCR (Fig. 6) can be
used to quantify the amount of symbiont DNA in a sample. In
this method, known amounts of a cloned symbiont gene frag-
ment, with a small internal deletion or insertion to alter size
and allow resolution on an agarose gel (a “competitor”), are
added to samples of the target DNA. Primers specific to the
gene are then used for amplification. Because of competition
between full and altered-sized copies, the ratio of amplified
products reflects the ratio of the amount of target and competi-
tor DNA. Because the initial amount of added competitor is
known, the amount of target DNA can then be estimated. By
comparing enriched with pre-enriched samples, the extent of
enrichment can be determined. In the case of the B. neritina/
E. sertula system, competitive PCR (Fig. 6) indicated a 5.5-fold
enrichment of E. sertula DNA from preparing a bacterial fraction
by differential centrifugation, and a 2.9-fold additional
enrichment from Hoechst dye–CsCl gradient fractionation, for an
overall 16-fold enrichment.
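The arithmetic behind these figures can be sketched as follows. The fold values 5.5 and 2.9 come from the text; the 275 pg equivalence point is a hypothetical value used only to show how a fold enrichment would be derived from two competitor titrations run on equal amounts of input DNA.

```python
def fold_enrichment(equiv_before_pg: float, equiv_after_pg: float) -> float:
    """Fold enrichment from the competitor amounts (pg) that give equal band
    intensity before and after enrichment, for equal amounts of input DNA."""
    if equiv_before_pg <= 0 or equiv_after_pg <= 0:
        raise ValueError("competitor amounts must be positive")
    return equiv_after_pg / equiv_before_pg


def overall_enrichment(step_folds) -> float:
    """Successive enrichment steps multiply."""
    result = 1.0
    for fold in step_folds:
        result *= fold
    return result


# 50 pg is the equivalence point cited for panel A; 275 pg is hypothetical.
print(fold_enrichment(50.0, 275.0))

# Fold values reported in the text for the two enrichment steps.
print(round(overall_enrichment([5.5, 2.9])))
```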
A technique called quantitative real-time (QRT) PCR pro-
vides a more accurate method for determining levels of specific
DNA sequences in a sample but requires specialized equip-
ment. This method is based on detection of a fluorescent signal
that changes proportionally during amplification of a PCR
product. There are two general methods for QRT PCR (see ref. 78
for a brief review). The first involves adding the fluorescent dye,
SYBR Green I, to a PCR reaction.79
SYBR Green I binds to
double stranded DNA, and as products from the PCR accumu-
late, an increasing fluorescent signal is generated. By com-
parison with standards, one can determine the initial quantity
of the gene being amplified. There are variations on the second
method, which in general involves fluorescently labeled primers
whose fluorescence increases or decreases (due to quenching) in
relationship to the progress of the PCR reaction.78
A significant
advantage of the SYBR Green method is that standard primers
are used rather than more expensive fluorescently labeled
primers. QRT PCR has advantages over competitive PCR, the
most notable being the precision of detection and the rapidity
of data collection,78
but for most cloning procedures only an
approximation of the extent of enrichment is necessary.
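Quantification "by comparison with standards" is commonly done with a standard curve. The following is a minimal sketch under stated assumptions: it assumes the instrument reports a threshold cycle (Ct) for each reaction and that Ct is linear in the logarithm of the starting quantity. The standards, Ct values, and function names are hypothetical and are not taken from the work reviewed here.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting quantity."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical dilution series: gene copies per reaction and measured Ct.
standards = [1e3, 1e4, 1e5, 1e6]
cts = [30.0, 26.7, 23.4, 20.1]

slope, intercept = fit_standard_curve(standards, cts)
unknown = quantity_from_ct(25.05, slope, intercept)
print(f"slope {slope:.2f}, intercept {intercept:.2f}, estimate {unknown:.0f} copies")
```

Running the same calculation on pre-enriched and enriched DNA, normalized to the amount of input DNA, would give the extent of enrichment.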
Another, less quantitative, method to monitor enrichment is
to use Southern blot hybridizations. This method (described
in section 7.5) involves hybridizing symbiont specific probes
to a blot containing enrichments of the symbiont DNA and
comparing the intensity of hybridization. A drawback of this
approach is that more input DNA is required and a certain level
of enrichment may be necessary to even detect a signal.
6.5 DNA preparation for pulsed field gel electrophoresis
(PFGE)
In pulsed field gel electrophoresis, controlled changes in the
direction of an electric field through an agarose gel enable the
separation of very large DNA, on the order of entire bacterial
genomes, 4–5 megabase pairs (Mbp). Isolating very high
molecular weight DNA (on the order of 0.5 Mbp or larger) as
one does for PFGE is essential for cloning very large fragments
in bacterial artificial chromosomes (BACs), and can provide
accurate genome sizing information if only a single bacterial
species is present. In theory PFGE could be used to separate
chromosomes of different bacteria in a mixture, provided their
genome sizes are sufficiently different.
Fig. 6 Competitive PCR analysis of DNA preparations from Bugula
neritina. Panels depict agarose gel separations of PCR products,
amplified from the KSa β-ketoacyl synthase domain of the putative
bryostatin gene cluster.14
The upper band in each panel is amplification
from the authentic gene copy, and the lower band is from an added
amount (in picograms, denoted at the top) of a clone of KSa with a
small internal deletion (the competitor). When the amount of
competitor DNA is equivalent to the amount of the authentic gene
copy, amplification products in the upper and lower bands are of equal
intensity (e.g. panel A, 50 pg). By titrating the amount of added
competitor, conditions are determined where amplification is equal,
and the greater the amount of competitor needed for this (or the
transition between lower and upper band predominance), the more
enriched the genomic DNA is in the target gene. In this experiment, A)
total, B) bacterial enriched, and C) Hoechst dye–CsCl gradient
fractionated DNAs (see text for explanation) are compared. The data
indicate 5.5-fold enrichment in the bacterial fraction, and 16-fold
enrichment in the Hoechst dye–CsCl gradient DNA, relative to the
total DNA preparation.
There are special procedures to prepare DNA for PFGE. The
first step is to determine whether sufficient bacteria of interest
can be isolated from the organism. This is critical because
unless enough bacteria are present, DNA will not be visible on
the pulsed field gels. The amount of bacteria is substantial; for a
4.3 Mbp genome, a concentrated pellet containing on the order
of 1 × 10^10 cells resuspended in 1 ml is required. This number
can be difficult, if not impossible, to achieve with most
symbiotic bacteria. In addition to the large number of cells, a
complicating factor can be the presence of residual host cell
material; if this cannot be adequately removed, then it may not
be possible to obtain a sufficiently concentrated bacterial
sample. Both difficulties have occurred in our work on the B.
neritina/E. sertula association, and PFGE has not been success-
ful. If a sufficient amount of bacteria can be isolated, then a
concentrated solution of the bacteria is mixed with molten
agarose and poured into a mold to form a plug. Once solidified,
plugs are placed in an EDTA solution and then treated with
SDS and protease over an extended period to lyse the bacteria
and digest their proteins. The semi-solid nature of the agarose
prevents the cells from bursting during lysis, which is a primary
cause of DNA shearing during isolation. For PFGE, the plugs
are then incorporated into a gel and electrophoresed using
parameters optimal for the desired size range separation. DNA
can be digested for cloning by equilibrating plugs with restric-
tion buffer and adding restriction enzyme in a prolonged incu-
bation.80
DNA can then be isolated from gels for cloning by
using agarose digesting enzymes or electroelution, taking care
to minimize manipulations that may reduce the size of the
DNA.80
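To give a feel for why so many cells are needed, the sketch below converts a cell count into an approximate mass of genomic DNA. It rests on two assumptions not stated in the text: one genome copy per cell, and an average mass of about 650 g/mol per base pair of double-stranded DNA.

```python
AVOGADRO = 6.022e23        # molecules per mole
GRAMS_PER_MOL_BP = 650.0   # approximate average for one base pair (assumption)


def genomic_dna_micrograms(cells, genome_bp, copies_per_cell=1.0):
    """Approximate mass of genomic DNA, in micrograms, in a cell pellet."""
    grams = cells * copies_per_cell * genome_bp * GRAMS_PER_MOL_BP / AVOGADRO
    return grams * 1e6


# Cell number and genome size quoted in the text.
mass_ug = genomic_dna_micrograms(1e10, 4.3e6)
print(f"about {mass_ug:.0f} micrograms of DNA")
```

Under these assumptions the pellet holds a few tens of micrograms of DNA, which is then distributed over however many plugs are cast from the 1 ml suspension.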
7 Cloning of biosynthetic genes
The strategy to choose for cloning a bioactive metabolite path-
way depends on the size of the region to be cloned and whether
all elements of the pathway are likely to be located in proximity
in the genome. An additional factor is the average molecular
weight of DNA that can be isolated. We will address these
issues while discussing cloning approaches and their advantages
and disadvantages.
7.1 DNA requirements and general cloning procedures
As a result of the manipulations during isolation procedures,
DNA is sheared to a particular average size. In addition, differ-
ences in endogenous nuclease activities can also result in size
differences. Clone libraries are usually generated by cutting the
DNA into pieces by partial digestion with restriction enzymes
that cut frequently. The advantage of this is that a given region
of a genome is covered by multiple DNA fragments that over-
lap each other, which after cloning, usually allows complete
coverage of the region of interest. Ideally, the average size of
DNA prior to digestion should be five times larger than the
desired size for cloning to ensure that the ends of most DNA
fragments will result from restriction digestion and not shear-
ing. However, representative libraries can be made from DNA
three times larger than the desired size for cloning.
Even though large DNA is advantageous for cloning, it is
inherently viscous, which creates problems in its manipulation,
especially prior to digestion. Precipitated high molecular weight
DNA is difficult to resuspend, but this is best accomplished by
gently shaking the tube containing the DNA for several hours
to overnight. Even with this treatment, dispersal of the DNA
evenly throughout the solution can be difficult. Evidence for
uneven dispersal can be obtained by reading the absorbance of
equal volume aliquots from the same solution of DNA in a
spectrophotometer. If the DNA is not evenly dispersed, signifi-
cantly different absorbances will result in subsequent aliquots.
It is important to have evenly dispersed DNA to enable repro-
ducibility in restriction digests and other manipulations. To
maximize the homogeneity of large DNA in a solution, gently
and repeatedly pipeting the solution with a large bore pipet is
advisable. This may result in some DNA shearing, but is often
necessary.
Partial digests are done by incubating a standard amount of
DNA with different concentrations of restriction enzyme for a
given time. An enzyme typically used for partial digests of
genomic DNA is Sau3AI, which recognizes the frequently
represented sequence GATC. Pilot-scale digests are usually
done to estimate the amount of enzyme to use in larger scale
digests. It is advantageous to use the same concentration of
DNA in all digests because it minimizes variability and enables
scaling up. For example, we typically use DNA at 100 ng µl⁻¹,
and add restriction enzyme based on units of enzyme per µl.
After assembling reactions on ice, restriction enzyme is added,
thoroughly mixed, and the sample incubated at 37 °C for 1 h.
EDTA is then added to 20 mM, and the sample incubated at
70 °C for 15 min to inactivate the enzyme. Digestion products
are electrophoresed on an agarose gel and the average molecu-
lar weight of digested DNAs is compared with standards to
identify the optimal amount of enzyme. A valuable approach is
to digest more DNA than needed for the gel and examine a
portion of it for size. In samples having optimal size, the
remainder of the digest can be used for cloning. When doing
partial digests, one should keep in mind that occasionally the
distance between cut sites can be significantly larger than the
average molecular weight of the desired digestion product. In
this case a particular region may be underrepresented in a clone
library. We have encountered this in the E. sertula PKS cluster.
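Once some sequence is available, regions with unusually widely spaced sites can be spotted in silico. The sketch below locates GATC sites in a sequence and reports stretches between sites that exceed a chosen insert size. For a sequence with equal base frequencies a four-base site is expected about every 256 bp (4^4), but real sequences deviate from this. The function names and the toy sequence are hypothetical.

```python
def site_positions(seq, site="GATC"):
    """0-based start positions of every occurrence of site in seq."""
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def complete_digest_sizes(seq, site="GATC"):
    """Fragment sizes from complete digestion of a linear sequence.

    Sau3AI cuts immediately 5' of GATC, so the cut falls at the site start."""
    bounds = [0] + site_positions(seq, site) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def long_gaps(seq, max_insert_bp, site="GATC"):
    """Site-free stretches longer than the largest clonable insert."""
    return [s for s in complete_digest_sizes(seq, site) if s > max_insert_bp]


# Toy sequence: three sites separated by 30 and 10 bases of filler.
toy = "GATC" + "A" * 30 + "GATC" + "A" * 10 + "GATC"
print(complete_digest_sizes(toy))
print(long_gaps(toy, max_insert_bp=20))
```

Any stretch reported by `long_gaps` cannot be recovered as a fragment within the chosen size limit, so it would be a candidate for under-representation in the library.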
Digested DNA should be size-fractionated prior to cloning
to minimize cloning undesirably small fragments and to maxi-
mize insert size. For small-insert size libraries (< 10 kilobase
pairs – kbp), fractionation can be done by separation in an
agarose gel, and DNA extracted from the gel for cloning.
Because recovery from gels can be low, it is important to digest
sufficient DNA to ensure enough material for ligation. For
larger insert size libraries (10–40 kbp), sucrose gradient frac-
tionation of DNA is an effective means of size enrichment. A
simple means of generating an approximately linear sucrose
gradient is to pour a step gradient with equal volumes of 40%,
30%, 20%, and 10% sucrose in buffer and then freeze and thaw
the solution in the tube. During the thawing process, the less
concentrated sucrose melts more slowly and as it does so it
migrates up the tube, linearizing the gradient. We form
gradients containing sucrose in a buffer of 10 mM Tris, pH 8.0,
10 mM NaCl, 1 mM Na2EDTA, using 2.5 ml of each
concentration of sucrose in an SW41 ultracentrifuge rotor
(Beckman) tube. Samples, in volumes of 1 ml or less, are layered
on the gradients and centrifuged in the SW41 rotor for 22 h at
22,000 rpm and 20 °C. After centrifugation, gradients can be
fractionated in small aliquots (200–400 µl) from the top using a
pipettor. Fractions are analyzed on an agarose gel to determine
those with DNA sized suitably for the desired cloning. One
possible drawback of the sucrose gradient method is that large
amounts of partially digested DNA (50–100 µg) must be loaded
in order to visualize individual fractions. For bacterial artificial
chromosome library construction, size fractionation in pulsed-
field gels is the method of choice (section 6.5).
7.2 Cloning vectors
7.2.1 Lambda phage cloning. Very efficient cloning systems
have been developed based on the life cycle of lambda bacterio-
phage. This phage contains a double stranded DNA genome of
approximately 48 kbp. The phage infects E. coli and can repli-
cate its genome many-fold while producing specific proteins
that package the DNA into progeny phage, eventually lysing the
E. coli cell and releasing the progeny. Specific portions of the
phage genome can be removed, enabling insertion of DNA to
be cloned. There are limitations on the upper and lower size of
DNA that can be cloned due to space requirements in the phage
head. For cloning, DNA constructs are mixed with com-
mercially available extracts of phage proteins, which self-
assemble and package the DNA into an infectious phage
particle. The extremely efficient cloning available in phage-
based systems can be important in the case of limiting amounts
of DNA; in our experience, an entire lambda phage library can
be constructed from 10 µg of starting DNA, which includes
several test digests. There are several different types of lambda
phage cloning vectors, which are described in the following
sections.
7.2.2 Lambda replacement vectors. In these vectors, a large
portion of the phage DNA is removed and replaced by the
DNA to be cloned. DNA of 9–23 kbp can be efficiently cloned
in replacement vectors. A disadvantage of these vectors is that
after cloning, recovering enough DNA for sequencing or sub-
cloning can be an involved process. DNA preparations require
infection of the host E. coli strain at a precise titer, which needs
to be determined for each clone. Even though lambda phage
DNA purification kits are available, we have found that DNA
recoveries were generally low and prefer using a classical
method relying on infection of a moderate sized culture (250
ml) coupled with glycerol gradient purification of the phage,
followed by a phenol : chloroform-based DNA extraction
method.77
7.2.3 Lambda insertion vectors. In lambda insertion vectors
no portion of the phage genome is removed, and up to 12 kbp
fragments can be inserted. The most sophisticated of these
vectors (e.g. Lambda ZAP, Stratagene) also allows in vivo
(in E. coli) excision of a plasmid containing the cloned region
from the phage DNA. Plasmid rescue enables isolation of large
amounts of cloned DNA, circumventing the problems with
low DNA recovery in lambda replacement vectors. Insertion
vectors are useful for cloning smaller segments of a pathway or
filling in gaps in a sequence.
7.2.4 Cosmid vectors. Cosmid vectors contain only portions
of the lambda phage genome required for packaging, and thus
enable cloning of large (30–42 kbp) inserts, and in addition have
an E. coli plasmid origin of replication. Because cosmids
encode no phage proteins, they do not lyse the host, and
because they replicate as a plasmid, relatively large amounts of
DNA can be easily isolated. The large insert size capability of
cosmids is important when cloning large pathways. A possible
disadvantage of cosmids arises from their relatively high copy
number, which can lead to rearrangements or deletions of
cloned DNA that are detrimental to E. coli (see below).
7.2.5 Bacterial artificial chromosomes (BACs). BACs are
single-copy-per-cell vectors that allow the cloning of very large
pieces of DNA (100 kbp or larger). The major advantages of
BACs are that large pathways can be cloned in a single frag-
ment, a small number of clones are needed to represent an
entire genome in a library and because of the low copy number,
cloned DNA is stable. DNA appropriate for BAC cloning has
to be prepared in agarose plugs, as in pulsed-field gel electro-
phoresis (section 6.5). As mentioned, very high molecular
weight DNA suitable for BAC cloning can be difficult to obtain
in sufficient quantity. A BAC vector is now commercially
available (pCC1BAC, Epicentre Technologies) in which
the copy number can be amplified to enable isolation
of reasonable amounts of DNA after cloning. Screening
BAC libraries by hybridization can be challenging; because of
the low copy number, signals for positive clones are not much
more intense than for clones without the appropriate insert.
Placing colonies or their purified DNA in a grid on a hybridiz-
ation membrane can be helpful in this regard. An alternative
approach is to screen pools of clones by PCR, sequentially
selecting sub-pools until the desired clone is isolated.80
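The logic of sequential pool screening can be sketched as a recursive search. In this illustration the PCR is replaced by a simulated function, and the library size, positive clone numbers, and number of sub-pools per round are all hypothetical.

```python
def find_positive_clones(clones, pcr_positive, n_subpools=8):
    """Return the clones that test positive, by splitting each positive pool
    into sub-pools and re-testing until single clones remain.

    pcr_positive(pool) stands in for a PCR on pooled template DNA."""
    if n_subpools < 2:
        raise ValueError("need at least two sub-pools per round")
    if not clones or not pcr_positive(clones):
        return []
    if len(clones) == 1:
        return list(clones)
    size = -(-len(clones) // n_subpools)  # ceiling division
    hits = []
    for start in range(0, len(clones), size):
        hits.extend(
            find_positive_clones(clones[start:start + size], pcr_positive, n_subpools)
        )
    return hits


def make_simulated_pcr(positive_ids):
    """Build a stand-in for the PCR assay that also counts reactions run."""
    calls = {"n": 0}

    def pcr(pool):
        calls["n"] += 1
        return any(clone in positive_ids for clone in pool)

    return pcr, calls


library = list(range(960))                    # hypothetical clone numbers
pcr, calls = make_simulated_pcr({137, 802})   # hypothetical positives
found = find_positive_clones(library, pcr)
print(found, "found with", calls["n"], "reactions")
```

The point of the design is that the number of reactions grows with the number of rounds rather than with the number of clones, at the cost of re-arraying positive pools between rounds.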
7.2.6 Plasmid vectors. As part of any cloning project, the
use of plasmid vectors is a valuable asset. It is generally not
recommended to generate primary libraries in plasmids because
of limitations in insert size and the relative inefficiency of
cloning; however, subcloning phage or other clones in plasmids
will probably be necessary. Well established methods77
can be
used for plasmid cloning.
7.3 Screening strategies and probes
After constructing a library, the number of phage plaques or
colonies to be screened depends on the percentage of symbiont
DNA in the preparation, the genome size of the symbiont, and
the average size of insert in the library. Oftentimes, only the last
parameter can be known with certainty, but approximations of
the other two parameters can lead to meaningful estimates.
Bacterial genomes are generally under 6 Mbp, and in the case
of symbionts, the tendency is for even smaller genomes, on the
order of 3 Mbp or less.64
The percentage of symbiont DNA can
be estimated by competitive or quantitative PCR, as described
previously. As an example, let us assume that the symbiont gen-
ome is 4 Mbp, that symbiont DNA represents 10% of the total,
and that the average size of insert in the library is 35 kbp, as in a
cosmid library. To ensure a 99% probability that a given gene
will be represented in the library, one should screen five times
the equivalent of one genome, and for a 95% probability three
times. Thus, for a 99% probability, (4 Mbp × 5)/(0.035 Mbp × 0.1) ≈
5714 colonies or plaques will need to be screened. Even if the
symbiont DNA is only 1% of the total, for a cosmid library
about 57,000 clones need to be screened, which is a very manageable
number. Thus, even if the symbiont DNA represents a small
fraction of the total, generating libraries to isolate the gene of
interest is worthwhile.
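The calculation above can be reproduced with a few lines of code. The probability function uses the common approximation that, for randomly distributed clones, the chance that a given sequence is represented is 1 − e^(−c), where c is the number of genome equivalents screened; that formula is an assumption added here, although it agrees with the 99% and 95% figures quoted for five-fold and three-fold coverage.

```python
import math


def clones_to_screen(genome_mbp, insert_mbp, symbiont_fraction, genome_equivalents):
    """Number of clones giving the requested coverage of the symbiont genome."""
    return genome_mbp * genome_equivalents / (insert_mbp * symbiont_fraction)


def probability_represented(genome_equivalents):
    """Approximate chance that a given gene is present among random clones."""
    return 1.0 - math.exp(-genome_equivalents)


# Values from the worked example in the text.
n_10pct = clones_to_screen(4.0, 0.035, 0.10, 5)
n_1pct = clones_to_screen(4.0, 0.035, 0.01, 5)
print(round(n_10pct), round(n_1pct))

for c in (3, 5):
    print(f"{c}-fold coverage: {100 * probability_represented(c):.1f}%")
```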
As a probe for screening libraries, ideally one would like to
have a gene fragment from the metabolite pathway to be cloned.
Gene fragments can be generated by PCR, as described pre-
viously. If a probe cannot be generated from the symbiont of
interest, one can try to use a probe derived from a similar gene
in another organism. A potential problem with this approach is
that because heterologous probes are not likely to match the
gene of interest perfectly, hybridizations must be done under
lower stringency conditions. This can lead to higher back-
ground, making it difficult to isolate truly positive clones. When
using heterologous probes, it is a good idea to determine opti-
mal hybridization conditions by Southern blots (section 7.5)
prior to screening a library.
7.4 Cloned fragments that rearrange in, or are detrimental to
E. coli
The problem of cloning DNA fragments that rearrange or are
detrimental to propagation in E. coli can be serious, and one
that is difficult to track down. The literature contains little
discussion on this subject, because only successful cloning
attempts are reported. Most cloning and expression vectors are
developed with well-characterized genes encoding small soluble
proteins that are stable and allow for maximum levels of expres-
sion. When cloning a bioactive metabolite pathway, which in
many cases encodes large protein complexes, the situation can
be very different. Three possible problems can occur: 1)
rearrangement or deletion of portions of a clone, 2) “leaky”
expression of genes that produce a protein toxic to E. coli, or 3)
leaky or induced expression making a protein that removes
significant amounts of metabolic pathway intermediates from
E. coli, inhibiting growth.
Simply cloning large fragments of DNA in a high copy
number vector can put a strain on the E. coli DNA replication
machinery. For example, if a 35 kbp fragment is cloned in an 8
kbp cosmid vector maintained at 25 copies per cell, the addi-
tional DNA represents 25% of the E. coli genome size. A sig-
nificantly larger genome is selected against, which encourages
mechanisms that reduce genome size. This is one of the advan-
tages of maintaining libraries in lambda phage, because they
divert the E. coli replication machinery for their own ends and
are not subjected to selective pressures generated by constraints
on E. coli growth.
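The replication burden in this example can be estimated directly. The sketch assumes an E. coli genome of about 4.6 Mbp, a figure not given in the text; with it, the cosmid example comes to a little under one quarter of the genome, in line with the roughly 25% quoted. The single-copy comparison uses a hypothetical 100 kbp insert in an 8 kbp vector.

```python
def extra_dna_fraction(insert_kbp, vector_kbp, copies_per_cell, host_genome_kbp=4600.0):
    """Cloned plus vector DNA per cell, as a fraction of the host genome size."""
    return (insert_kbp + vector_kbp) * copies_per_cell / host_genome_kbp


# Cosmid example from the text: 35 kbp insert, 8 kbp vector, 25 copies per cell.
cosmid = extra_dna_fraction(35, 8, 25)

# Hypothetical single-copy clone for comparison.
single_copy = extra_dna_fraction(100, 8, 1)

print(f"cosmid: {100 * cosmid:.0f}% of host genome")
print(f"single copy: {100 * single_copy:.1f}% of host genome")
```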
Rearrangements or deletions are caused by recombination
between repeated sequences within a cloned region. Recombin-
ation across inverted repeats will invert the intervening
sequence, whereas recombination between tandemly oriented
repeats can delete the intervening sequence. Since tandem
repeat recombination decreases the size of the cloned fragment,
this can be positively selected for. Maintaining the cloned DNA
in recombination deficient hosts (e.g. SURE E. coli strain,
Stratagene), a low copy number vector, or using a host that
reduces copy number (ABLE E. coli strain, Stratagene) can be
helpful in minimizing or eliminating these problems.
Most E. coli cloning and expression vectors contain the T7
phage promoter, which is used for high level induction of
expression of cloned genes. However, this promoter is “leaky”,
in that it is not completely repressed and there is always a base-
line level of transcription occurring. This can result in expres-
sion of a cloned protein product when it is not desired, which
can be detrimental to E. coli and provide a selection against
cloning a gene. We have encountered this situation several
times, and have expended a considerable amount of effort
troubleshooting cloning protocols, when in fact the trouble was
not in the cloning, but in the clone. There can be positional
determinants involved; for example, we have successfully cloned
larger fragments stably, whereas smaller fragments derived from
the larger clones were unstable, or vice versa. A useful test to
determine if lethality is due to leaky expression is to clone the
fragment in both possible orientations. If clones are only
obtained when the gene is cloned in the opposite orientation
relative to the T7 promoter, then selection against the insert is
likely occurring. Specific host strains and T7 based vectors can
minimize leaky expression; however, problems can still occur.
Other promoter systems for expression can be tested; however,
some of these (e.g. the arabinose inducible promoter) are also
leaky.
The mechanisms of cloned gene toxicity in E. coli can vary,
but generally lie in properties of the expressed protein. For
example, proteins containing highly hydrophobic regions can be
toxic to E. coli, either through self-association or association
and disruption of the cell membrane. Another difficulty can
arise if a clone produces an enzymatically active complex that
removes significant amounts of E. coli metabolic pathway
intermediates, inhibiting growth.
7.5 Restriction mapping and Southern blot hybridization
In addition to deletions or rearrangements occurring during
cloning, it is also possible to clone separate gene fragments
from different parts of the genome into the same vector. Either
phenomenon results in gene sequences that appear contiguous
but in fact are not. One way to evaluate whether cloned DNA
has deleted or rearranged is to compare the cloned DNA
sequence with genomic DNA by restriction mapping and
Southern blotting.
A restriction map positions sequences within a region of
DNA by cleavage into defined fragments using restriction
endonucleases. For Southern blot analysis, this process involves
digestion of the DNA of interest with appropriate restriction
enzymes, separation of the resulting DNA fragments by gel
electrophoresis, transfer of these fragments to a nylon mem-
brane and then hybridization of labeled probes to visualize
specific fragments (Fig. 7). If the pattern of fragment sizes
comparing cloned and native DNA match, this indicates that
the cloned DNA is not rearranged or deleted (Fig. 7). In addi-
tion to confirming a restriction map, this type of analysis can
indicate the presence or absence of separate but similar genes of
a given type in the DNA, indicated by the number of hybridized
bands on the blot. This can be useful in providing evidence that
one has cloned the correct gene or that only one gene of a given
type exists in a symbiont association, contributing to the proof
that an identified gene is responsible for making the natural
product of interest.
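The comparison of cloned and native DNA can be illustrated with an in silico digest. The sketch below predicts which restriction fragments a probe would detect and compares an intact region with one carrying a deletion. The sequences and probe are short hypothetical examples; the site GAATTC with a cut after the first base corresponds to EcoRI.

```python
def find_all(seq, pattern):
    """0-based start positions of every occurrence of pattern in seq."""
    hits, i = [], seq.find(pattern)
    while i != -1:
        hits.append(i)
        i = seq.find(pattern, i + 1)
    return hits


def digest_fragments(seq, site, cut_offset=0):
    """(start, end) coordinates of fragments from complete digestion of a
    linear sequence; cut_offset is the cut position within the site."""
    cuts = [p + cut_offset for p in find_all(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def probed_fragment_sizes(seq, site, probe, cut_offset=0):
    """Sizes of fragments overlapping the probe: the expected bands on a blot."""
    seq, site, probe = seq.upper(), site.upper(), probe.upper()
    probe_hits = find_all(seq, probe)
    sizes = []
    for a, b in digest_fragments(seq, site, cut_offset):
        if any(a < p + len(probe) and b > p for p in probe_hits):
            sizes.append(b - a)
    return sorted(sizes)


# Hypothetical native region and a clone of it that has lost 8 bases.
native = "CCCC" + "GAATTC" + "AAAATTTTGGGGCCCCAAAA" + "GAATTC" + "TTTT"
deleted = "CCCC" + "GAATTC" + "AAAATTTTGGGG" + "GAATTC" + "TTTT"
probe = "TTTTGGGG"

print(probed_fragment_sizes(native, "GAATTC", probe, cut_offset=1))
print(probed_fragment_sizes(deleted, "GAATTC", probe, cut_offset=1))
```

A difference in predicted band sizes between the two corresponds to the mismatch one would look for on the blot. Note that fragments spanning the vector junction of a real clone will differ from genomic fragments regardless of rearrangement.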
To perform these analyses, it is necessary to have the desired
DNA enriched sufficiently from other contaminating DNA. In
some instances, a total DNA preparation from B. neritina has
not been enriched enough in E. sertula DNA for detection by
Southern hybridization. Even when successful, it is clear that
enrichment improves the hybridization signal (Fig. 7).
General protocols for Southern blot analysis can be found in
Sambrook et al.77
However, several variables specific to the
genes of interest should be considered. Restriction enzymes
should be chosen to produce DNA fragments of lengths that
can be resolved on a gel (size ranges are dependent on gel
parameters). The amount of DNA per digest is also important,
especially when the DNA is not a pure sample. For E. sertula,
2–3 µg of symbiont enriched DNA per digest was optimal to
observe hybridization. For pure bacterial DNA, 0.5–1 µg is
sufficient.
7.6 Considerations if the host organism proves to be the
synthetic source of the bioactive metabolite
Because most natural product synthesis activities in marine
invertebrates have been localized to the host and not to their
associated bacteria (Tables 1–3), one should consider what to
do if one wants to isolate genes encoding such activities from
the host. In general, this is a matter of scale up; since the
eucaryotic genomes of the hosts are likely to be between one
and two orders of magnitude larger than those of their associ-
ated microbes, one needs to generate correspondingly larger
clone libraries. The techniques of probe generation, localiz-
ation, DNA purification and enrichment, and cloning of genes
will be similar for both microbes and their hosts. One advantage
of cloning a host bioactive metabolite gene is that the host
DNA is likely to be by far the most abundant in a DNA
preparation.
Fig. 7 Southern hybridization of a putative bryostatin PKS cluster
probe to DNA isolated from B. neritina. Lanes are: 1) total DNA, 2)
bacterial-enriched fraction DNA, 3) Hoechst dye–CsCl gradient
fractionated DNA, and 4) cosmid clone of the region.
8 Strategies for sequence determination
Once clones suspected to contain genes for a bioactive metabolite
are obtained, the gene sequence must be determined. There
are several approaches to do this, and it is likely that more than
one will come into play, especially when sequencing a large
region of DNA. An initial approach that is helpful for generating
or confirming a restriction map is to digest a larger clone
into smaller pieces and subclone these fragments. Sequence
determined from the ends of these fragments can then be used
to design primers to sequence in the opposite direction on the
larger clone from which the subclones were derived. This will
result in sequence overlapping the restriction cut sites, allowing
determination of adjacent restriction fragments. For complete
sequence determination two methods are used, primer walking
and shotgun sequencing. With primer walking, after obtaining
initial sequence data, one uses those data to design new oligo-
nucleotide primers to extend the sequence in another round
of sequencing. This is repeated until the entire sequence is
obtained. By determining multiple initial sequences, as in the
fragment subcloning approach described above, one can design
multiple primers for subsequent rounds of sequencing, speed-
ing the process of obtaining the entire sequence. A drawback of
this approach is that there is idle time between designing and
obtaining new primers. With shotgun sequencing, a large piece
of cloned DNA is randomly sheared by passage of the DNA
solution through a narrow orifice (a nebulizer) into fragments
of 1–2 kbp average size. These fragments are cloned into a
plasmid vector, generating a library of pieces of DNA from the
initial large clone. Because the shearing is random, sequencing
of the multiple and overlapping clones in this library results in a
complete sequence of the region.
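How many random subclones must be sequenced can be estimated with a simple coverage model. The sketch assumes reads are placed uniformly at random, so that the expected fraction of bases covered at c-fold redundancy is 1 − e^(−c); this model, the 500 bp usable read length, and the 8-fold target are assumptions made for illustration and are not taken from the text.

```python
import math


def reads_for_coverage(region_bp, read_bp, fold_coverage):
    """Number of reads needed to reach the requested average redundancy."""
    return math.ceil(region_bp * fold_coverage / read_bp)


def expected_fraction_covered(fold_coverage):
    """Expected fraction of bases hit by at least one random read."""
    return 1.0 - math.exp(-fold_coverage)


# Hypothetical project: a 35 kbp cosmid insert, 500 bp reads, 8-fold redundancy.
reads = reads_for_coverage(35_000, 500, 8)
covered = expected_fraction_covered(8)
print(reads, f"{100 * covered:.2f}%")
```

In practice the last gaps are often closed by primer walking rather than by sequencing more random clones.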
9 Confirming that cloned genes encode the biosynthetic
machinery for a metabolite
The traditional way of determining gene function is to create
mutations in the gene to eliminate its function. However, this
process relies on the ability to introduce DNA into bacteria by
transformation and have the bacteria either integrate and
express a mutated gene or insert a foreign piece of DNA into a
gene. In situations where the bacterium has not been cultivated
this approach cannot be taken, and one must evaluate the like-
lihood that a gene produces the metabolite in question by
analysis of the encoded protein sequence, and if this is promis-
ing, express the gene in a heterologous host to synthesize the
desired product.
9.1 Analysis of domain content
Once a putative bioactive metabolite gene or gene cluster is
cloned, analysis of its sequence using comparison programs
such as BLAST81
can identify domains with previously identi-
fied function. In the case of many biosynthetic pathway genes,
such domains are well conserved and can reveal information
about gene cluster function as well as the metabolite it pro-
duces. For example, in PKS gene clusters, the sequence of the
genes often corresponds to the order of formation of the poly-
ketide product, as in the modular PKS that produces erythro-
mycin.82–84
In these situations, one can literally “read” the
sequence of domains in the gene and infer the structure of the
polyketide it produces. However, this is not always the case.
PKSs are continually discovered in which the gene sequence is
not co-linear with the formation of the polyketide product
or contains too few or excess domains. This is the case in the
bryostatin PKS from E. sertula (unpublished data), as well as
those PKSs producing stigmatellin and the antibiotic TA.85,86
Some domains can be identified by homology but be non-
functional due to mutations in their active sites, for example in
the pikromycin PKS from Streptomyces venezuelae.87
Thus,
analysis of domain content and order cannot always be relied
on to predict the function and product of biosynthetic gene
clusters.
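For the idealized co-linear case, "reading" the domains can be sketched as below. The rule applied is the textbook one for modular PKSs: a module with only KS, AT and ACP leaves a β-keto group, KR reduces it to a hydroxyl, DH dehydrates to a double bond, and ER gives a fully saturated carbon. The module list is hypothetical, and the sketch deliberately ignores the complications described above (inactive domains, missing or extra domains, non-co-linear clusters).

```python
def beta_carbon_state(domains):
    """Predict the oxidation state left at the beta-carbon by one module,
    assuming every annotated domain is active."""
    present = set(domains)
    if not {"KS", "AT", "ACP"} <= present:
        raise ValueError("not a minimal extension module (needs KS, AT, ACP)")
    if {"KR", "DH", "ER"} <= present:
        return "methylene"
    if {"KR", "DH"} <= present:
        return "alkene"
    if "KR" in present:
        return "hydroxyl"
    return "keto"


# Hypothetical modules, listed in the order they appear in the gene cluster.
modules = [
    ["KS", "AT", "KR", "ACP"],
    ["KS", "AT", "ACP"],
    ["KS", "AT", "DH", "KR", "ACP"],
    ["KS", "AT", "DH", "ER", "KR", "ACP"],
]
prediction = [beta_carbon_state(m) for m in modules]
print(prediction)
```

A prediction of this kind is only a starting hypothesis; a domain flagged as present by homology but mutated in its active site would make the output wrong for that module.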
If the sequence of a gene cluster is not clearly indicative of
the product it is responsible for making, one can consider iso-
lating the enzymes responsible for synthesizing the metabolite
from the symbiont as a means of confirming that the genes
encode these proteins. If a purified protein preparation is shown
to synthesize the compound in question, then one could per-
form amino acid sequencing to determine whether the proteins