Lecture 8

The DNA Duplex Can Be
Reversibly Denatured (Melted)

Tm (transition midpoint) as a
function of base composition
• Salt
dependence is
more dramatic
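A rough way to see how base composition shifts Tm is the Wallace rule of thumb for short oligonucleotides, Tm ≈ 2(A+T) + 4(G+C) in °C. This rule is not from the slides; it is an assumption brought in for illustration. It ignores salt concentration entirely and is only meant for oligos of roughly 14 to 20 nt. A minimal sketch:

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rule-of-thumb Tm in deg C for a short oligo: 2(A+T) + 4(G+C).

    Assumption: Wallace rule; salt dependence is not modeled.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Two hypothetical 16-mers with opposite base composition.
for oligo in ("ATATATATATATATAT", "GCGCGCGCGCGCGCGC"):
    print(oligo, gc_fraction(oligo), wallace_tm(oligo))
```

The G-C rich oligo gets the higher estimate, matching the trend on the slide.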

Hybridization
• DNA sequences can
spontaneously re-anneal
and form
helices
• Basis for many
molecular biology
techniques.
– PCR, DNA
sequencing
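Two strands can anneal into a perfect duplex when one is the reverse complement of the other. The sketch below checks that condition for plain A/C/G/T strings; the function names are illustrative, and mismatches, salt, and temperature are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA strand written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_anneal(strand_a, strand_b):
    """True if strand_b (5'->3') is exactly the reverse complement of strand_a."""
    return strand_b.upper() == reverse_complement(strand_a)


print(reverse_complement("ATGC"))   # GCAT
print(can_anneal("AAGG", "CCTT"))   # True
```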

PCR
•When a sample of DNA is too small to be
sequenced or profiled, the polymerase chain
reaction (PCR) is used to make copies of it
("amplify" it).
•PCR amplifies DNA by repetitive cycles of
the following steps:
1. Denaturation
2. Annealing ("priming")
3. Synthesis ("extension" or "elongation")

PCR
(a) Consider double-stranded DNA containing
a polynucleotide sequence (the target region)
that you wish to amplify.
Target region
(b) Heating the DNA to about 95 ℃ causes the
strands to separate. This is the denaturation
step.

PCR
(c) Cooling the sample to ~60 ℃ causes one
primer oligonucleotide to bind to one strand and
the other primer to the other strand. This is the
annealing step.

PCR
(d) In the presence of four DNA nucleotides and
the enzyme DNA polymerase, the primer is
extended in its 3' direction. This is the synthesis
step and is carried out at 72 ℃.

PCR
This completes one cycle of PCR.
(e) The next cycle begins with the denaturation
of the two DNA molecules shown. Both are
then primed as before.

PCR
(f) Elongation of the primed fragments completes
the second PCR cycle.

PCR
The two contain only the target region and
increase disproportionately in subsequent cycles.

PCR results
Cycle       Total DNAs       Contain only target
0 (start)   1                0
1           2                0
2           4                0
3           8                2
4           16               8
5           32               22
10          1,024            1,004
20          1,048,576        1,048,536
30          1,073,741,824    1,073,741,764
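The counts in this table can be reproduced by tracking three classes of strand. This is a sketch under idealized assumptions that the slides do not state: one starting duplex, and every strand primed and fully extended in every cycle. The strand labels ("original", "long", "short") are illustrative names.

```python
def pcr_products(cycles):
    """Count duplexes after `cycles` rounds of idealized PCR from one duplex.

    Returns (total_duplexes, target_only_duplexes).
    """
    # Strand classes: original template, "long" (copied off an original, so
    # only one end is defined by a primer), and "short" (target region only).
    original, long_, short = 2, 0, 0
    total, target_only = 1, 0
    for _ in range(cycles):
        # Each existing strand templates one new partner: one duplex per strand.
        total = original + long_ + short
        # Only a short template yields a duplex with two target-only strands.
        target_only = short
        # New strands made this cycle: original -> long, long -> short, short -> short.
        long_, short = long_ + original, 2 * short + long_
    return total, target_only


for n in (0, 1, 2, 3, 4, 5, 10):
    print(n, *pcr_products(n))
```

For n ≥ 1 these counts agree with 2^n total duplexes, of which 2^n − 2n contain only the target region.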

The Genetic Code
• The genetic code is found in the sequence of nucleotides in
mRNA that is transcribed from the DNA
• A codon is a triplet of bases along the mRNA that codes for a
particular amino acid
• Most of the 20 amino acids needed to build a protein have at least 2
codons (methionine and tryptophan have only one each)
• There are also codons that signal the “start” and “end” of a
polypeptide chain
• The amino acid sequence of a protein can be determined by
reading the triplets in the DNA sequence that are complementary
to the codons of the mRNA, or directly from the mRNA sequence
• The entire DNA sequences of several organisms, including
humans, have been determined; however,
- only primary structure can be determined this way
- the sequence alone doesn’t give tertiary structure or protein function

mRNA Codons and Associated Amino Acids

Reading the Genetic Code
• Suppose we want to determine the amino acids
coded for in the following section of an mRNA
5’—CCU —AGC—GGA—CUU—3’
• According to the genetic code, the amino acids
for these codons are:
CCU = Proline AGC = Serine
GGA = Glycine CUU = Leucine
• The mRNA section codes for the amino acid
sequence of Pro—Ser—Gly—Leu
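The same lookup can be done programmatically. The sketch below builds the standard genetic code from the conventional U, C, A, G ordering and reads an mRNA codon by codon from its first base. It is a minimal illustration: it does not search for a start codon and does not stop at stop codons, and the function name `translate` is just a label chosen here.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(mrna):
    """Translate an mRNA (5'->3') in frame from its first base."""
    mrna = mrna.upper()
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "-".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)


print(translate("CCUAGCGGACUU"))  # Pro-Ser-Gly-Leu
```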

Messenger RNAs
• Contain protein coding information
– AUG start codon to UAA, UAG, or UGA stop codon
– A cistron is the unit of RNA that encodes one
polypeptide chain
– Prokaryotic mRNAs are often polycistronic
– Eukaryotic mRNAs are typically monocistronic
mRNA coding patterns

Transfer RNA (tRNA)
•There is at least one tRNA for
each of the 20 amino acids.
•A particular amino acid is
attached to the tRNA by an ester
linkage involving the carboxyl
group of the amino acid and the
3' oxygen of the tRNA's terminal ribose.

Transfer RNA
•Example—Phenylalanine transfer RNA
One of the mRNA codons for phenylalanine is:
5' UUC 3'
The complementary sequence in tRNA is called
the anticodon.
3' AAG 5'
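The codon-anticodon pairing is antiparallel, so the anticodon can be written either aligned under the codon (3'→5') or in the usual 5'→3' direction. A minimal sketch, assuming strict Watson-Crick pairing (wobble pairing and modified tRNA bases are ignored); the function name is illustrative:

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def anticodon(codon_5to3):
    """Return the anticodon as (written 3'->5', written 5'->3')."""
    # Base-by-base complement lines up under the codon, so it reads 3'->5'.
    aligned = codon_5to3.upper().translate(RNA_COMPLEMENT)
    return aligned, aligned[::-1]


print(anticodon("UUC"))  # ('AAG', 'GAA')
```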

Phenylalanine tRNA
Each tRNA is single stranded with a CCA triplet at its 3' end.
[Figure: phenylalanine tRNA. Phenylalanine is attached by an ester linkage at the 3' end; the anticodon and the 5' and 3' ends are labeled.]

Ribosomal Peptidyl Transferase
Activity
Note: the catalytic component of the ribosome’s peptidyl transferase activity
is RNA; it’s an example of a catalytic RNA or ribozyme.

Non-Watson-Crick Base Pairing,
e.g., Hoogsteen Base Pairing
Allow the formation of
triple-stranded helices

Triple Helical DNA: H-DNA
H-DNA structure can form when
you have a homopurine stretch
on a strand (so homopyrimidine
stretch on the other strand).
H-DNA has been implicated in
the regulation of several genes.

Self-Complementary Nucleic
Acid Strands and Hairpins

Palindromic DNA Sequences:
Potential to Form Cruciform Structures (Double Hairpins)

Palindromes and Restriction Endonucleases
Another reason palindromes are important:
Type II restriction enzymes are site-specific endonucleases used in molecular biology research
(such as gene cloning) that recognize specific palindromic DNA sequences.
X-ray crystal
structure of
Eco RI bound
to DNA
DNA cleavage products:
Sticky ends (e.g., Eco RI):
5’-G-3’ 5’-AATTC-3’
3’-CTTAA-5’ 3’-G-5’
Blunt ends (e.g., Sma I):
5’-CCC-3’ 5’-GGG-3’
3’-GGG-5’ 3’-CCC-5’
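The two properties shown here, that the recognition site reads the same on both strands and that the cut position determines sticky or blunt ends, can be checked with a short sketch. The cut offsets (1 for Eco RI, 3 for Sma I) are read off the cleavage products above; the test sequences and function names are made up for illustration, and only the top strand is tracked.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site equals its own reverse complement."""
    return site == reverse_complement(site)


def digest(seq, site, cut_offset):
    """Cut the top strand at each site, `cut_offset` bases into the site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


print(is_palindromic("GAATTC"), digest("AAGAATTCTT", "GAATTC", 1))
print(is_palindromic("CCCGGG"), digest("TTCCCGGGAA", "CCCGGG", 3))
```

Because the Eco RI cut is off-center in a palindromic site, the bottom strand is cut at the mirror position, leaving the 5'-AATT overhangs; the centered Sma I cut leaves blunt ends.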

RNA helices are short and are interrupted by bulges and loops

tRNA - the prototype structure

Protein-Nucleic Acids Interaction
• Perspective
• Non-specific interactions
• Specific interactions

What functions are DNA-protein interactions
involved in?
DNA replication, DNA repair,
DNA recombination, transcription etc.
Two effective techniques: X-ray crystallography
and NMR spectroscopy (<25 kDa).
Both are equally valid but neither is sufficient
without detailed kinetic, thermodynamic, and site-directed
mutagenesis studies.

One of the functions: the need for
packaging
The fundamental building block of chromatin in
eukaryotes is the nucleosome, a protein-DNA
complex.
The nucleosome core particle consists of 146 bp
of DNA and eight small, highly basic histone
proteins. The DNA wraps around the histone
octamer to form a negative supercoil.
Bacteria also use small basic proteins to package
DNA, such as the dimeric HU protein from E.
coli.

Nucleosome
The Nucleosome -
DNA (146 bp) wrapped around
octamer of core histone proteins (+
linker DNA = ~200 bp)

Viruses are highly symmetric particles that can pack their nucleic
acid genome efficiently inside the protein capsid.
Protein subunits containing many basic amino acids interact with
the viral nucleic acid in a non-sequence-specific manner.
In the helical TMV, some sequence-specific contacts are involved
in directing assembly of the virus.

History of structure determination
Structure of DNA is regular: a list of the
positions of the atoms in the double helix.
Proteins are much less regular, so they are
more difficult to understand, e.g.,
repressors, polymerases.

Aaron Klug
"for his development of
crystallographic electron microscopy
and his structural elucidation of
biologically important
nucleic acid-protein complexes" (1982)
Alex Rich
Ss nucleic acid-binding protein
Roger Kornberg
"for his studies of the
molecular basis of
eukaryotic transcription"
(2006)

The forces between proteins and
nucleic acids
There are four major forces that occur when proteins and NA interact,
but it is very difficult to ascribe precise changes in free energy of
association to specific interactions between protein and NA.
• Electrostatic forces: salt bridges
• Dipolar forces: hydrogen bonds
• Entropic forces: the hydrophobic effect
• Dispersion forces: base stacking

Electrostatic forces: salt bridges
Electrostatic forces are long range, not very
structure-specific, and contribute substantially to
the overall free energy of association.
Salt bridges are electrostatic interactions between
groups of opposite charge. They typically provide
~40 kJ/mol of stabilization per salt bridge.
In protein-NA complexes, they occur between the
ionized phosphates of the NA and either the
ε-ammonium group of lysine, the guanidinium
group of arginine, or the protonated imidazole of
histidine.

Dipolar forces: hydrogen bonds
Hydrogen bonds are dipolar,
short-range interactions that
contribute little to the
stability of the complex but
much to its specificity.
Hydrogen bonds occur
between the amino acid side
chains, the backbone amides
and carbonyls of the protein,
and the bases and backbone
sugar-phosphate oxygens of
the NA.

When protein and nucleic acid
molecules are not
complexed, all their
exposed hydrogen bond
donors and acceptors form
hydrogen bonds to water.
Hydrogen bonds are very
important in making
sequence-specific protein-nucleic
acid interactions.

Entropic forces: the hydrophobic effect
Hydrophobic forces are short range, sensitive
to structure, and proportional to the size of the
macromolecular interface.
Molecules of water leave the interface
between a protein and a nucleic acid.
Consequently, the surfaces of the protein and
nucleic acid tend to be exactly
complementary so that no unnecessary water
molecules remain when the complex forms.

Dispersion forces: base stacking
van der Waals forces
Dispersion forces have the shortest range but are very
important in base stacking in double-stranded nucleic
acid and in the interaction of protein with ss nucleic acid.
Base stacking is caused by two kinds of interaction: the
hydrophobic effect and dispersion forces.
For ds nucleic acid, dispersion forces are clearly
important in maintaining the structure by base stacking.
For ss nucleic acid, they also help it to bind proteins
because aromatic side chains can intercalate between
the bases of a ss nucleic acid.

Geometric constraints imposed
by the nucleic acid
All NA have repeating polyanionic backbones and
so all proteins that bind them have strategically
placed arginines and lysines that create an
electrostatic field to neutralize the negative charge.
Contacts to the bases are called "direct readout"
because what contacts form depends directly on
the sequence of the nucleic acid; distinguishing
sequences by how the sequence affects the
distortability or conformation of the nucleic acid is
called "indirect readout".

Double-stranded B-DNA
Simple model-building predicted two of the many ways in which
proteins interact with B-DNA by hydrogen-bonding:
1) an antiparallel β-sheet interacting with the phosphate backbone in
the minor groove,
2) an α-helix interacting with bases in the major groove.
Thus, to distinguish the cognate sequence from all others by direct readout
alone, protein must form more than one hydrogen bond to some of the
base-pairs in the major groove.
In specific protein B-DNA complexes, about 1/2 of the hydrogen bonds
are to the bases and the other 1/2 to the phosphate backbone.

Single-stranded nucleic acid
Hydrophobic bases in ss nucleic acid are more exposed. ss
nucleic acid-binding proteins have more hydrophobic binding
surface than ds nucleic acid-binding proteins.
The hydrophobic surface often contains aromatic groups
which interact more effectively with the nucleic acid
bases, and also an electrostatic field that neutralizes the
charge of the phosphate backbone.
Possibly because the structure of RNA varies more than
that of DNA, proteins seem to recognize RNAs in more
ways than they recognize DNAs.
RNAs, even more than DNAs, may be recognized by
indirect readout.

The kinetics of forming protein-nucleic
acid complex
Two factors affect the rate of formation of all
protein-nucleic acid complexes: random thermal
diffusion and long-range, directional electrostatic
attraction.
A "one-dimensional random walk" can account for
the observed rate of formation of sequence-specific
protein-DNA complexes in a genome.
The protein first binds non-specifically to the
DNA and then diffuses or jumps along the DNA
until it finds the appropriate sequence.

Thus, all sequence-specific DNA binding
proteins may bind DNA in two ways: one
for tight, sequence-specific binding and the
other for looser, non-sequence specific
binding.
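The sliding search can be pictured with a toy simulation: a protein bound non-specifically at one site takes unbiased single-base steps along the DNA until it reaches the target site. This is only a sketch under strong assumptions that are not in the slides: a linear DNA of `n_sites` positions, equal step probabilities, ends that simply stop the protein, and no jumping or dissociation. All names and parameters are hypothetical.

```python
import random


def sliding_search(n_sites, start, target, max_steps=1_000_000, rng=None):
    """Unbiased 1D random walk along DNA; returns steps taken to reach target.

    Returns None if the target is not reached within max_steps.
    """
    rng = rng or random.Random()
    position = start
    for step in range(max_steps + 1):
        if position == target:
            return step
        position += rng.choice((-1, 1))
        # Keep the protein on the DNA: it cannot step past either end.
        position = max(0, min(n_sites - 1, position))
    return None


# Average search time over a few runs from 10 sites away (seeded for repeatability).
rng = random.Random(0)
runs = [sliding_search(200, 90, 100, rng=rng) for _ in range(20)]
finished = [r for r in runs if r is not None]
print(len(finished), sum(finished) / max(1, len(finished)))
```

For diffusive motion the number of steps grows roughly with the square of the distance, which is why sliding is efficient only over short stretches and is thought to be combined with jumps between segments.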

Non-specific interactions
• Single-stranded nucleic acid binding
proteins
• Non-sequence-specific nucleases
• Polynucleotide polymerases
• Topoisomerases

Single-stranded nucleic acid
binding proteins
ssDNA is formed during replication and most
organisms produce proteins to bind it. These
proteins form an important but diverse group.
A model has been suggested in which lysines and
arginines neutralize the DNA phosphate backbone
and the bases stack against aromatic amino acid
side chains.

Non-sequence-specific nucleases
All organisms must degrade nucleic acid during their life
cycle. There is no one enzyme designed for this purpose,
but rather a large number of enzymes with different
specificities. These include exo- and endonucleases and
enzymes specific for ss- and ds-nucleic acid and for base
sequences.
e.g., RNase and DNase
RNase and DNase have different reaction mechanisms
because RNase uses the ribose 2'-hydroxyl group, not
present in DNA, to attack the adjacent phosphodiester linkage.

Ribonuclease A, barnase,
and binase
RNase A is not sequence specific because it only interacts
with the base at the active site;
all other contacts are electrostatic ones to the sugar-phosphate
backbone.
Deoxyribonuclease I
DNase I cleaves different sequences with different
rates because of sequence-dependent steric hindrance
at the active site.
G-C tracts accommodate the catalytic loop better
because they have wider minor grooves than A-T
tracts.

Polynucleotide polymerases
There are four classes of template-directed
polynucleotide polymerases: DNA- or RNA-dependent
and DNA- or RNA-polymerizing.
All add nucleotides to the 3'-end of a growing
polynucleotide chain but they differ widely in how
accurately they replicate the nucleic acid (their
fidelity) and how many nucleotides they add
before dissociating (their processivity).

e.g., Pol I and RTase.
They have the same overall architecture for
gripping a nucleic acid during polymerization. It is
a domain that looks like a right hand, with palm,
fingers, and thumb subdomains.
Part of the palm subdomain and the direction from
which the nucleic acid approaches the active site are
conserved in these two polymerases. The polymerases, their 3'-5'
exonucleases, and RNase Hs may all use the same
mechanism, which requires two divalent cations.

DNA-dependent DNA polymerases:
E. coli DNA polymerase I (Pol I) and III
All cellular DNA-dependent DNA polymerases have
a 3'-5' proof-reading exonuclease, require a primer to
begin synthesis, and replicate their own nucleic acid
the most faithfully.
The Klenow fragment of Pol I contains two widely-separated
domains, one carrying the polymerase
activity, and the other the 3'-5' proofreading
exonuclease activity.

The DNA approaches the polymerase from the exonuclease side
and bends by 90° to enter the polymerase site.
The protein does not read the DNA sequence at all. Instead,
when an incorrect base is added, the DNA strands separate
and the daughter strand is therefore more likely to reach over
to the exonuclease, which then removes the incorrect base.

RNA-dependent DNA polymerases:
HIV-1 reverse transcriptase (RTase)
RTase is a unique heterodimer. Its two subunits share the
same sequence yet fold differently. The p66 subunit folds
into a polymerase domain and an RNase H domain.
RNase H is an endoribonuclease that specifically hydrolyzes
the phosphodiester bonds of RNA which is hybridized to
DNA.

Topoisomerases
• Type I
• Type II

Positive and Negative
Supercoiling
positive supercoil =
left-handed =
overwound DNA
negative supercoil =
right-handed =
underwound DNA

L = T + W
• L or Lk = linking number (number of times
one strand crosses the other)
• T = twist (number of helical turns; for B-DNA,
T = # bp divided by ~10.5 bp/turn)
• W = writhe (number of supercoils)
(L0 = linking number of relaxed molecule = T,
since W = 0 in relaxed molecule)
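A worked example of L = T + W. The sketch assumes, for simplicity, that the twist stays at its relaxed B-DNA value so that any linking difference appears entirely as writhe; in a real molecule the difference is shared between twist and writhe. The plasmid size and linking number are made-up illustration values.

```python
def relaxed_linking_number(n_bp, bp_per_turn=10.5):
    """L0 = T for a relaxed molecule: number of bp divided by bp per turn."""
    return n_bp / bp_per_turn


def writhe(linking_number, n_bp, bp_per_turn=10.5):
    """W = L - T, assuming twist keeps its relaxed B-DNA value."""
    return linking_number - n_bp / bp_per_turn


n_bp = 2100                      # hypothetical closed circular plasmid
L0 = relaxed_linking_number(n_bp)
print(L0)                        # 200.0
print(writhe(190, n_bp))         # -10.0, i.e. ten negative supercoils
```

A molecule with L below L0 is underwound and negatively supercoiled; one with L above L0 is overwound and positively supercoiled.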

Type I Topoisomerases
•ΔL = ±1 per cycle
•Cleaves a single
strand
•Passes broken single
strand around the other,
then rejoins strands
•Does not require ATP
•Relaxes supercoiled
DNA

Structure of a Type I
Topoisomerase

Type II Topoisomerases
•ΔL = ±2 per cycle
•Cleaves both strands
•Passes unbroken part of
duplex through double-strand
break, then rejoins
strands
•Requires ATP
•Relaxes supercoiled DNA
•Some type II enzymes (like
DNA gyrase) can add
negative supercoils
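Since each catalytic cycle changes L by a fixed step (1 for type I, 2 for type II), the number of cycles needed to relax a molecule follows from its linking difference ΔL = L − L0. A minimal sketch with integer ΔL; the function name is illustrative, and enzyme preferences for positive or negative supercoils are not modeled.

```python
def relax(delta_lk, step):
    """Step a linking difference toward zero; return (cycles, residual)."""
    cycles = 0
    while abs(delta_lk) >= step:
        # Move toward zero by one enzyme step.
        delta_lk += step if delta_lk < 0 else -step
        cycles += 1
    return cycles, delta_lk


print(relax(-10, 1))  # type I:  (10, 0)
print(relax(-10, 2))  # type II: (5, 0)
print(relax(-7, 2))   # type II: (3, -1), an odd difference leaves a residual
```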

Topological Interconversions
Catalyzed by Type II Topoisomerase
Relaxation
Catenation and
Decatenation
Knotting and
Unknotting

X-Ray Crystal Structure of a
Type II Topoisomerase

Specific interactions
For a cell to function at all, proteins must distinguish one
nucleic acid from another very accurately.
Proteins that bind specific nucleic acid sequences also bind
non-specific ones.
The placement of an α-helix in the major groove appears to
be the most common way of recognizing a specific DNA
sequence.
Other parts of the protein, which form hydrogen bonds and
salt bridges to the DNA backbone, position the element on
the DNA so that it can achieve recognition.

Direct readout of the DNA sequence, most often in
the major groove, is an important part of sequence-specific
binding but is by no means the only
component.
The direct readout can involve hydrogen bonds (1)
directly to side chains, (2) to the polypeptide
backbone, or (3) through water molecules, or depend
on hydrophobic interactions.
Indirect readout is also important: the correct
DNA sequence may differ from canonical B-DNA
in a way that increases the surface area buried, the
electrostatic attraction, or the number of hydrogen
bonds formed.

Oligomerization upon binding the correct sequence
often increases affinity and specificity.

Transcriptional regulators:
the helix-turn-helix motif
• The prokaryotic complexes
• Eukaryotic complexes: the homeodomain

Exclusively eukaryotic transcriptional regulators:
the zinc finger and leucine zipper
• The zinc finger proteins
The Cys2His2 zinc finger
The Cys4 nuclear receptors
The GAL4 zinc finger
• The leucine zipper

zinc finger proteins
• A zinc finger is a small protein structural
motif
• Sequence-specific DNA-binding proteins

zinc finger proteins
• Individual zinc finger domains
typically occur as tandem repeats
with two, three, or more fingers
comprising the DNA-binding
domain of the protein.
• These tandem arrays can bind in
the major groove of DNA.
• The α-helix of each domain can
make sequence-specific contacts
to DNA bases; residues from a
single recognition helix can
contact 4 or more bases.

β-Sheet binding motifs
• The met repressor family
• The TFIID TATA-box binding protein
a general transcription factor

Restriction endonucleases:
EcoRI and EcoRV
EcoRI and EcoRV have very different structures and
interact with DNA differently: the former only in the major
groove; the latter in both grooves.
However, both employ the same enzyme mechanism and
catalytic residues and both achieve their high degree of
sequence specificity similarly.
In the complex with cognate DNA, much of the free energy
of binding has been used to drive the cognate DNA into an
unfavorable conformation that places the scissile
phosphodiester bond in the active site and completes the
binding site for the essential Mg2+.
