Protein structure is hierarchical, proceeding from primary to quaternary structure. Primary structure refers to the linear sequence of amino acids. Secondary structure involves folding into alpha helices and beta sheets. Tertiary structure describes the overall three-dimensional shape of a polypeptide. Quaternary structure refers to the arrangement of multiple protein subunits. Several methods can determine protein structure at high resolution, including X-ray crystallography, NMR spectroscopy, cryo-electron microscopy, and X-ray free electron lasers.
2. Protein Structure
The arrangement and linking of amino acids to form a
functional protein is viewed in a stepwise fashion
Primary structure – linear order of amino acid residues
in a protein
Secondary structure – local folded forms (e.g. α-helix, β-pleated
sheet) within a polypeptide chain
Tertiary structure – three dimensional shape of a protein
Quaternary structure – arrangement of multiple protein
subunits in a multimeric protein complex
3. Primary Protein Structure
The linear order of amino acid residues along the
polypeptide chain
Amino acids can be abbreviated with a three-letter or single-letter code
For example: Alanine = Ala or A; Lysine = Lys or K
Example - Chymotrypsin
• Enzyme that degrades
other proteins
• 263 amino acids
• 27,713 Da
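The abbreviation scheme and the chymotrypsin figures above can be sketched in a few lines. This is a minimal illustration only: the function names and the tripeptide are hypothetical, and the mass and length are the values quoted on the slide.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(residues):
    """Convert a list of three-letter codes (any case) to a one-letter string."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)

def mean_residue_mass(total_mass_da, n_residues):
    """Average mass per residue, in daltons."""
    return total_mass_da / n_residues

# Hypothetical tripeptide, written N-terminus to C-terminus.
peptide = to_one_letter(["ala", "lys", "Gly"])

# Chymotrypsin values quoted above: 27,713 Da over 263 residues.
avg_mass = mean_residue_mass(27713, 263)

print(peptide, round(avg_mass, 1))
```

The average comes out a little above 100 Da per residue, which is why protein mass is often estimated roughly from sequence length.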
4. Primary Protein Structure
Insulin is a small protein that consists of two
polypeptide chains that are covalently bonded
The A chain is 21 amino acids long while the B chain is 30
amino acids long
The two polypeptide chains are linked via two –S–S– (disulfide)
bonds between cysteine residues (each linked pair is called cystine)
5. Secondary Protein Structure
The primary structure leads to the Secondary
Structure
The secondary structure refers to the folded structures
that form within the polypeptide chain due to
interactions between atoms of the backbone
Held in shape by hydrogen bonds and more or less
independent of the R-groups
The most common types of secondary structure are the α-helix
and the β-pleated sheet
6. Secondary Protein Structure
α-Helix structure
The carbonyl (C=O) group of one amino acid is hydrogen bonded to the
amino hydrogen (N-H) of an amino acid that is four residues down the chain
This pulls the polypeptide chain into a helical structure that resembles a
curled ribbon with each turn of the helix containing 3.6 amino acids
The R-groups of the amino acids stick outward from the α-helix, where they
are free to interact
β-Pleated sheet
Two or more segments of a polypeptide chain line up next to each other
and form a sheet-like structure held together by hydrogen bonds
The hydrogen bonds form between carbonyl and amino groups of the
backbone, while the R-groups extend above and below the plane of the
sheet
Strands of the β-pleated sheet may be parallel, pointing in the same
direction (such that the N- and C-termini match up), or antiparallel,
pointing in opposite directions (such that the N-terminus of one strand is
positioned next to the C-terminus of the other)
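The helix geometry described above (a hydrogen bond from residue i to residue i + 4, and 3.6 residues per turn) can be illustrated with a small sketch. It assumes an idealized helix with no end effects; the function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6   # value quoted on the slide
HBOND_OFFSET = 4          # C=O of residue i bonds to N-H of residue i + 4

def helix_hbond_pairs(n_residues):
    """Return 1-indexed (i, i + 4) backbone hydrogen-bond pairs for an ideal helix."""
    return [(i, i + HBOND_OFFSET) for i in range(1, n_residues - HBOND_OFFSET + 1)]

def helix_turns(n_residues):
    """Number of turns spanned by an ideal helix of the given length."""
    return n_residues / RESIDUES_PER_TURN

pairs = helix_hbond_pairs(10)   # (1, 5), (2, 6), ... (6, 10)
turns = helix_turns(18)         # 18 residues span five turns

print(pairs)
print(turns)
```

Note that the last four residues of the helix have no partner four positions further along, so a helix of n residues has n - 4 such bonds in this idealized picture.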
10. Secondary Protein Structure:
Additional Information
Certain amino acids are more or less likely to be found
in α-helices or β-pleated sheets
Proline is known as a “helix breaker” due to its unusual R
group, which creates a bend in the peptide backbone
that is not compatible with helix formation
The aromatic amino acids (Trp, Tyr, Phe) are often found
in β-pleated sheets
Many proteins contain both α-helices and β-pleated sheets;
some contain just one type, while others form neither
12. Tertiary Protein Structure
The overall three-dimensional structure of a polypeptide is
called its tertiary structure
The tertiary structure is primarily due to interactions between
the R groups of the amino acids that make up the protein
This includes hydrogen bonding, ionic bonding, dipole-dipole
interactions, and van der Waals forces
Hydrophobic interactions are a critical component of tertiary
structure: amino acids with nonpolar, hydrophobic
R groups cluster together on the inside of the protein, leaving
hydrophilic amino acids on the outside to interact with
surrounding water molecules
Disulfide bonds can also contribute to tertiary structure
They can be either inter-strand (between two polypeptide strands) or
intra-strand (within the same polypeptide)
13. Tertiary Protein Structure - Interactions
An example of the various interactions that can lead to
a protein’s tertiary structure
[Figure labels: polypeptide backbone, ionic bond, hydrophobic
interactions, disulfide linkage, hydrogen bond]
14. Tertiary Protein Structure of Chitinase
Structure of the barley chitinase
• Chitinase is an enzyme that cleaves chitin, a polysaccharide found
in fungi and in arthropods such as insects and crustaceans
• The side chains of the catalytic acids are shown in green; side chains of
several residues that are (putatively) involved in substrate binding and
catalysis are shown in red and purple
[Figure labels: chitin, chitinase]
15. Tertiary Protein Structure of Triose
Phosphate Isomerase
[Figure: triose phosphate isomerase, which interconverts
dihydroxyacetone phosphate and D-glyceraldehyde 3-phosphate;
the active site is labeled and the protein forms a β-barrel]
16. Quaternary Protein Structure
For proteins that have only a single polypeptide
chain, the tertiary structure is the highest level of
structure
However, for proteins that are made up of multiple
polypeptide chains (also known as subunits), the
combination of all of these subunits is called the
quaternary structure
The same types of interactions that contribute to tertiary
structure also hold the subunits together to give
quaternary structure
An example of a protein with quaternary structure is
hemoglobin
17. Hemoglobin Structure
Hemoglobin is an iron-containing oxygen transport protein
found in erythrocytes (red blood cells)
Composed of four polypeptide chains (tetramer), consisting
of two α and two β subunits (α2β2)
Each subunit has a MW of 16 kDa for a total MW of 64 kDa
Each subunit contains a tightly associated heme group that is
bound to iron
Oxygen binds to the heme component of the tetramer in a
cooperative fashion for a total of 4 oxygen molecules per
tetramer
As the first oxygen molecule binds, the tetramer’s conformation
changes to promote the binding of the remaining three oxygen
molecules
18. Quaternary Protein Structure
Structure of human hemoglobin. α and β subunits are in red and blue,
respectively, and the iron-containing heme groups in green
Hemoglobin
heterotetramer – α2β2
19. Myoglobin Structure
Myoglobin is a heme-containing protein that is found in
muscle tissue, where it binds oxygen, and helps provide
extra oxygen to release energy to power muscles
Is a monomeric protein with 153 amino acid residues
MW of 16.7 kDa
Contains a tightly associated heme group that is bound to iron
• Oxygen binds to the
heme component of the
protein
• The iron-containing heme
group is responsible for
the red color of muscle
and blood; oxygen binds
to iron in the ferrous
(Fe2+) state
20. Hemoglobin Binding to Oxygen is
Cooperative
[Figure: % O2 saturation versus PO2 (mm Hg), the amount of O2 dissolved in
the blood; the hemoglobin curve is sigmoidal and the myoglobin curve is
hyperbolic; the PO2 ranges of tissues and lungs are marked]
• Hemoglobin is primarily
responsible for the transport
of oxygen to tissues
• Myoglobin is responsible for
oxygen storage
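The sigmoidal versus hyperbolic binding curves can be sketched with the Hill equation, a common empirical model of cooperative binding. The P50 values, Hill coefficients and tissue/lung pressures below are assumed, approximate textbook figures and do not come from the slides.

```python
def fractional_saturation(po2, p50, n):
    """Hill equation: Y = pO2^n / (P50^n + pO2^n)."""
    return po2 ** n / (p50 ** n + po2 ** n)

# Assumed, approximate textbook values (not from the slides).
HB_P50, HB_N = 26.0, 2.8   # hemoglobin: n > 1 gives a sigmoidal curve
MB_P50, MB_N = 2.8, 1.0    # myoglobin: n = 1 gives a hyperbolic curve

# Assumed illustrative oxygen pressures, in mm Hg.
TISSUE_PO2, LUNG_PO2 = 30.0, 100.0

def fraction_released(p50, n):
    """Saturation lost on moving from lung pO2 to tissue pO2."""
    return (fractional_saturation(LUNG_PO2, p50, n)
            - fractional_saturation(TISSUE_PO2, p50, n))

for label, p50, n in (("hemoglobin", HB_P50, HB_N), ("myoglobin", MB_P50, MB_N)):
    lungs = fractional_saturation(LUNG_PO2, p50, n)
    tissues = fractional_saturation(TISSUE_PO2, p50, n)
    print(f"{label}: lungs {lungs:.2f}, tissues {tissues:.2f}, "
          f"released {fraction_released(p50, n):.2f}")
```

Under these assumptions hemoglobin gives up a much larger share of its oxygen between lungs and tissues than myoglobin does, which fits the transport versus storage roles described above.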
21. Protein Folding
In order for proteins to achieve their tertiary (or quaternary)
structure, the protein must form the appropriate conformation – this
is called protein folding
Protein folding is a spontaneous process that is primarily guided
by hydrophobic interactions (the hydrophobic effect), hydrogen
bonds, ionic bonds and van der Waals forces
The folded conformation must be thermodynamically stable
Chaperones are a class of proteins that aid in the correct folding of
other proteins
Chaperones are critical to protein folding in vivo because they
help the protein assume its proper alignments and conformations
efficiently enough to become
"biologically relevant"
22. Protein Denaturation
When a protein loses its 3-dimensional structure and reverts into
an unstructured string of amino acids, this is called protein
denaturation
Denatured proteins are usually non-functional
In some cases denaturation can be reversed; in others it
cannot
Proteins can be denatured when heated or exposed to concentrated
solutions of chaotropic agents such as urea (6 M) or guanidine HCl
An example of a denatured protein is egg white (egg albumin);
once heated or vigorously stirred, it becomes denatured and will
not return to its original state
23. Protein Denaturation – Egg Whites
Egg whites consist primarily of water and egg albumin; albumin
consists of a number of proteins
These proteins can be denatured by agitation or heat
[Figure: folded protein → (agitation) → unfolded protein]
24. Protein Structure Determination
There are several methods currently used to determine
the structure of a protein; these are:
X-ray crystallography
NMR
Three dimensional electron microscopy (CryoEM)
X-ray free electron lasers (XFEL)
25. X-Ray Crystallography Overview
X-ray crystallography can provide a detailed “picture” of a
protein’s structure, including atomic details such as ligands,
inhibitors, ions, etc.
A protein must be purified and crystallized, then subjected to an
intense beam of X-rays
The protein in the crystal diffracts the X-ray beam into a
characteristic pattern of spots, which are analyzed to
determine the distribution of electrons in the protein
The resulting map of the electron density is then interpreted to
determine the location of each atom
Two types of data result: the first are coordinate files,
which include atomic positions for the final model of the
structure; the second are data files, which include the structure
factors (the intensity and phase of the X-ray spots in the
diffraction pattern)
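As an illustration of what a coordinate file holds, the sketch below reads one ATOM record in the fixed-column PDB text format. The sample line uses made-up values and the function name is hypothetical; real files also contain other record types and may be distributed in mmCIF format instead.

```python
# One ATOM record in PDB fixed-column format (illustrative values only).
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

def parse_atom_record(line):
    """Extract a few fields from a PDB ATOM/HETATM record using column slices."""
    return {
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        # columns 31-54: x, y, z coordinates in angstroms
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

atom = parse_atom_record(SAMPLE)
print(atom)
```

Each atom in the final model has one such record, so the coordinate file is in effect a table of atomic positions.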
26. X-Ray Crystallography Process
Workflow consists of three basic steps
Step 1: produce an adequate protein crystal
Step 2: place in an intense beam of X-rays (single or variable
wavelength) to produce a regular reflection pattern
Step 3: the collected data are combined with chemical
information to obtain and refine a model of the arrangement
of atoms – this is called a crystal structure
27. X-Ray Crystallography Process
Crystallization
Generation of a diffraction-quality crystal is the biggest concern
Need a pure crystal of high regularity
Many methods available to grow crystals, such as gas diffusion,
liquid phase diffusion, temperature gradient, vacuum sublimation,
convection, etc.
Data Collection
The crystal diffracts the incident X-rays, and the
diffraction data are recorded
Data Analysis
Two-dimensional diffraction patterns, each corresponding to a different
crystal orientation, are converted into a three-dimensional model of
the electron density by Fourier transform
analysis
Initial phasing, model building and phase refinement are the final
steps in finalizing a protein structure; in some cases this may
require additional studies such as molecular replacement or heavy
atom methods
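A toy one-dimensional example may help show why the Fourier transform step needs phases as well as intensities. It is a sketch under strong simplifying assumptions: a hypothetical 16-point "unit cell" with two atoms and a plain discrete Fourier transform in place of real crystallographic software.

```python
import cmath
import math

def structure_factors(density):
    """Discrete Fourier transform of a sampled 1D density."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]

def fourier_synthesis(factors):
    """Inverse transform: rebuild the density from complex structure factors."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]

# Hypothetical 1D "unit cell" holding two atoms of different electron density.
true_density = [0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0]
factors = structure_factors(true_density)

# With both amplitudes and phases, the density is recovered.
recovered = fourier_synthesis(factors)

# A diffraction experiment records spot intensities only, so the phases are lost.
amplitudes_only = [abs(f) for f in factors]
without_phases = fourier_synthesis(amplitudes_only)

print([round(v, 2) for v in recovered])
print([round(v, 2) for v in without_phases])
```

The second map no longer places the atoms correctly, which is why phasing methods such as molecular replacement or heavy atom methods are needed.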
28. X-Ray Crystallography – Diffraction
Pattern
Diffraction pattern of myoglobin, a heme-containing
protein that carries and stores oxygen in
muscle
Myoglobin was the first
protein structure solved by
X-ray crystallography; this
led to a Nobel prize for
John Kendrew and Max
Perutz
29. X-Ray Crystallography
Good atomic resolution (e.g. 1 or 2 Angstroms) provides
an outstanding picture of the protein, including locations
of each atom and how it relates to the protein
30. X-Ray Crystallography Facility
An X-ray crystallography facility consists of an
electron/beam source, sample and detector
Sample prep (i.e. crystal formation) can be partially
automated with
31. NMR Spectroscopy
Nuclear Magnetic Resonance (NMR) spectroscopy is another
method that can be used to determine the structure of a protein
The protein is purified and placed in a strong magnetic field, and
then probed with radio waves
A distinctive set of observed resonances may be analyzed to give
a list of atomic nuclei that are close to one another, and to
characterize the local conformation of atoms that are bonded
together
This list of restraints is then used to build a model of the protein
that shows the location of each atom
The technique is currently limited to small or medium proteins (<35
kDa), since large proteins present problems with overlapping
peaks in the NMR spectra.
32. NMR Spectroscopy
A major advantage of NMR spectroscopy is that it provides
information on proteins in solution, as opposed to those
locked in a crystal or bound to a microscope grid – thus,
NMR spectroscopy is the premier method for studying the
atomic structures of flexible proteins
Analysis is far more complex than with simple small organic
molecules
Multidimensional techniques, such as nuclear Overhauser
effect (NOE) experiments, must be used; these require
labeling the protein with 13C and 15N
NOE experiments measure distances between atoms within the
protein; these distances allow generation of a 3-dimensional
structure of the protein
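The idea of building a model against a list of distance restraints can be sketched as a simple check of a candidate model. The atom names, coordinates and the 5 Å upper bound are all assumptions made for illustration; real structure calculation uses many more restraints and an optimization procedure.

```python
import math

# Hypothetical atom coordinates in angstroms (not from a real structure).
coords = {
    "HA_5": (0.0, 0.0, 0.0),
    "HN_9": (3.0, 2.0, 1.0),
    "HB_20": (9.0, 4.0, 2.0),
}

# Each restraint: (atom 1, atom 2, upper distance bound in angstroms).
# The 5 A bound is an assumed, typical limit for an observable NOE.
restraints = [("HA_5", "HN_9", 5.0), ("HA_5", "HB_20", 5.0)]

def violations(coords, restraints):
    """Return the restraints whose model distance exceeds the upper bound."""
    violated = []
    for atom_a, atom_b, upper in restraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        if distance > upper:
            violated.append((atom_a, atom_b, round(distance, 2)))
    return violated

violated = violations(coords, restraints)
print(violated)
```

A model that leaves few or no restraints violated is consistent with the measured data; in practice the coordinates are adjusted until that is the case.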
34. NMR Spectroscopy
Structure of a monomeric hemoglobin (MW = 16 kDa)
determined using NMR spectroscopy – protein is shown in green
and restraints in yellow
35. 3-Dimensional Electron Microscopy
Three dimensional electron microscopy (3D EM) works by
using electron lenses to focus a beam of electrons on the
protein and image it directly
The most commonly used technique involves imaging of
many thousands of different single particles preserved in a
thin layer of non-crystalline ice (cryo-EM)
Assuming each image captures the protein in a different
orientation, a computational approach (similar to that used for
CAT scans) will yield a 3D mass density map
With a sufficient number of single particles, the 3D EM maps
can then be interpreted by fitting an atomic model of the
macromolecule into the map
Recent advances in computer power have led to molecular
and atomic detail approaching X-ray crystallography
resolution (for 3D EM); cryo-EM has slightly lower resolution,
showing protein domains and secondary structure
36. 3-Dimensional Electron Microscopy
As with NMR, a main advantage is avoiding the need to
grow crystals
Sample preparation involves preservation in vitreous
ice and then placing in the microscope (cryo-EM)
Used primarily on very large macromolecular structures
where lower resolution is the norm
Combining with X-ray crystallography, NMR, mass
spectrometry, fluorescence resonance energy transfer
and computational techniques provides a way to view
large structures in exquisite detail
38. Cryo-Electron Microscopy Facility
The JEM-3200FS Field Emission
Electron Microscope is equipped
with a field emission electron
gun of 300 kV accelerating
voltage and an in-column energy
filter
Equipment is made by a high-
end speciality equipment
company (JEOL)
Requires full time staff to run and
maintain
39. Cryo-EM Structure of SARS-CoV-2
Spike (S) Protein
(A) Schematic of SARS-CoV-2 S protein primary structure colored by domain. The receptor-binding domain (RBD)
is shown in green. Arrows denote protease cleavage sites. (B) Side and top views of the prefusion
structure of the SARS-CoV-2 protein with a single RBD in the up conformation. The two RBD down
protomers are shown as cryo-EM density in either white or gray and the RBD up protomer is shown in
ribbons colored corresponding to the schematic in (A).
40. Serial Femtosecond Crystallography
An X-ray free electron laser (XFEL) is used to create
pulses of radiation that are extremely short (lasting only
femtoseconds) and extremely bright
A stream of tiny crystals (nanometers to micrometers in
size) is passed through the beam, and each X-ray
pulse produces a diffraction pattern from a crystal,
often burning it up in the process
A full data set is compiled from as many as tens of
thousands of these individual diffraction patterns
Allows scientists to study molecular processes that
occur over very short time scales, such as the
absorption of light by biological chromophores
41. Growth of Structures in Protein Data
Bank
[Figure: number of PDB entries by year. Yellow bars show the total number of
X-ray, NMR, electron microscopy and modelled structures in the PDB; blue bars
show the number deposited per year]
42. Protein Structure and Drug Discovery
The understanding of the structural and chemical
binding properties of important drug targets in
biologically relevant pathways can provide a unique
advantage in discovering new drugs
Both empirical and
computational methods are
used to design and develop
these drugs
Small molecule synthesis
and testing
Antibody selection
43. Impacting Drug Discovery
Structural Biology is the application of protein structure
technologies (e.g. X-ray crystallography, NMR, CryoEM) in
identifying new drug therapies
This process is known as structure-based drug design
(SBDD)
Chemical space: screening of chemical libraries
Biological space: finding new targets linked to disease
44. Importance of Computational Methods
in SBDD
Computational chemistry and biology are critically important in
integrating theory and modelling with experimental observations
This is achieved by using algorithms, statistics and large databases
Simulates physical processes and uses statistics and data analysis
to extract useful information from large bodies of data
Includes genomic and protein networks on the biology side and
chemical/biochemical interactions and biophysical forces on the
chemistry side
Of significant value to the biopharma industry as it helps (1) identify
new disease targets, (2) understand the biology and what is
needed to impact the disease and (3) create new molecular entities
(small molecule drugs, protein therapeutics, etc.) that we can
discover and develop to treat unmet medical needs
Combining computational information and guidance with
experimental data helps make the drug discovery process more
efficient
45. Artificial Intelligence (AI) and Machine
Learning (ML) in SBDD
Biology: Target identification
within the protein network?
What is the link to disease?
Experimental: Can I produce a
structure? Can I produce a
chemical library?
Chemistry: Can I optimize my
compound to achieve the
proper potency? Can I
achieve the proper safety and
selectivity?
46. Game Changer: From Primary
Structure to 3D Structure
DeepMind (UK-based AI company) has developed an algorithm that
can predict the 3-dimensional shape of a protein (i.e. its tertiary
structure) from its primary structure (i.e. amino acid sequence)
The algorithm, called AlphaFold, incorporates deep learning in which
the software is trained on large data sets of sequences and structures
to identify patterns that help determine the tertiary structure
AlphaFold was tested in the CASP (Critical Assessment of protein Structure
Prediction) competition and was able to predict structures that
matched experimental results
[Figure: global distance test (%) versus difficulty of protein structure
prediction (easy to difficult); AlphaFold (2020)]
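The global distance test score on the figure's vertical axis can be approximated as follows. This is a simplified sketch: it assumes the predicted and experimental structures have already been superimposed once and that per-residue Cα deviations are available, whereas the full score searches for the best superposition at each cutoff. The deviations shown are hypothetical.

```python
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms

def gdt_ts(distances):
    """Simplified GDT_TS from per-residue C-alpha deviations (angstroms).

    For each cutoff, take the fraction of residues whose deviation is
    within the cutoff, then average the four fractions and express the
    result as a percentage.
    """
    n = len(distances)
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in GDT_TS_CUTOFFS]
    return 100.0 * sum(fractions) / len(fractions)

# Hypothetical deviations for a five-residue model.
score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
print(score)
```

A score of 100 means every residue lies within 1 Å of its experimental position, so higher values indicate a closer match to experiment.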
47. Concepts Covered
Protein structure
Primary
Secondary
Tertiary
Quaternary
Protein folding
Protein structure determination
X-ray crystallography
NMR
Electron microscopy
Use of structure to design and develop new drug therapies