2. CONTENTS
Introduction
Supported and funded by
History
PDB Holdings list
Member organizations
Task forces
PDB ID
PDB File format
Browse to WWW.RCSB.ORG/PDB/
3. PROTEIN DATA BANK
PDB
A single worldwide database; hundreds of secondary databases categorize
its data differently.
A key resource in structural biology that stores 3D structural data of
large biological molecules such as proteins and nucleic acids.
Data is submitted by biologists and biochemists from around the world,
is freely accessible on the internet via the member organizations’ websites,
and is updated weekly.
The mission is to maintain a single Protein Data Bank archive of
macromolecular structural data.
4. SUPPORTED AND FUNDED
The Protein Data Bank (PDB) is operated by the Research Collaboratory for
Structural Bioinformatics (RCSB), a consortium of:
Rutgers, The State University of New Jersey.
The San Diego Supercomputer Center at the University of California, San
Diego.
The Center for Advanced Research in Biotechnology of the National
Institute of Standards and Technology.
The PDB is supported by funds from the National Science Foundation, the
Department of Energy, and the National Institutes of Health.
5. PDB HISTORY
Two forces led to the creation of the PDB:
A growing collection of protein structural data sets obtained by X-ray diffraction.
The Brookhaven Raster Display (BRAD), a molecular graphics display for
visualizing protein structures in 3D, which emerged in 1968.
In 1969, Dr Edgar Meyer began to write software to store atomic
coordinate files in a common format, making them available for geometric
and graphical evaluation (with the sponsorship of Dr Walter Hamilton at
Brookhaven National Laboratory).
In 1971, one of Dr Meyer’s programs, SEARCH, enabled remote access:
researchers could retrieve information from the database to study protein
structures offline.
6. In 1973, upon Hamilton’s death, Dr Tom Koetzle took over direction of the PDB
for the next 20 years.
In the 1980s, IUCr guidelines were established, the number of deposited structures
increased, and independent biological databases such as the NDB were established.
In the 1990s, the mmCIF project was completed and structural genomics began.
In October 1998, the PDB was transferred to the Research Collaboratory for Structural
Bioinformatics (RCSB); the transfer was complete by 1999. Dr Helen M. Berman of
Rutgers University became the new director.
In 2003, with the formation of the wwPDB, the PDB became an international
organization with three member organizations.
In 2006, the BMRB joined the wwPDB.
7. PDB HOLDINGS LIST (AS OF 25 OCT 2011)
Experimental method    Proteins  Nucleic acids  Protein/NA complexes  Other  Total
X-ray diffraction         62750           1323                  3050      2  67125
NMR                        7962            960                   179      7   9108
Electron microscopy         262             22                    96      0    380
Hybrid                       41              3                     1      1     46
Other                       133              4                     5     13    155
Total                     71148           2312                  3331     23  76814
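The row and column totals in the holdings table can be re-derived from the individual counts. The following is a minimal sketch that copies the 2011 figures above into a dictionary (the names `HOLDINGS`, `row_totals`, `column_totals` and `method_shares` are illustrative, not part of any PDB tool) and computes each method's share of all entries.

```python
# Counts copied from the PDB holdings table (as of 25 Oct 2011).
COLUMNS = ["Proteins", "Nucleic acids", "Protein/NA complexes", "Other"]
HOLDINGS = {
    "X-ray diffraction":   [62750, 1323, 3050, 2],
    "NMR":                 [7962, 960, 179, 7],
    "Electron microscopy": [262, 22, 96, 0],
    "Hybrid":              [41, 3, 1, 1],
    "Other":               [133, 4, 5, 13],
}

def row_totals(table):
    """Total number of entries per experimental method."""
    return {method: sum(counts) for method, counts in table.items()}

def column_totals(table):
    """Total number of entries per molecule type, summed over all methods."""
    return [sum(column) for column in zip(*table.values())]

def method_shares(table):
    """Percentage of all entries contributed by each experimental method."""
    totals = row_totals(table)
    grand_total = sum(totals.values())
    return {method: 100.0 * n / grand_total for method, n in totals.items()}

if __name__ == "__main__":
    print(row_totals(HOLDINGS))
    print(dict(zip(COLUMNS, column_totals(HOLDINGS))))
    for method, pct in method_shares(HOLDINGS).items():
        print(f"{method}: {pct:.1f}%")
```

With these counts, X-ray diffraction accounts for about 87.4% of the 76,814 entries and NMR for about 11.9%.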
8. MEMBER ORGANIZATIONS
The member organizations act as data deposition, data processing and distribution centers for PDB data.
Three are founding member organizations:
PDBe…Protein Data Bank in Europe.
PDBj…Protein Data Bank in Japan.
RCSB…Research Collaboratory for Structural Bioinformatics.
The Biological Magnetic Resonance Data Bank (BMRB) joined in 2006.
The Worldwide Protein Data Bank (wwPDB), formed by these member organizations, oversees the PDB.
Each submitted entry is reviewed and annotated by the wwPDB and then automatically
checked for plausibility (the source code of the validation software is publicly available).
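To give a feel for what an automated plausibility check can look like, here is a toy sketch. It is not the wwPDB validation software, which checks far more; it only tests whether the backbone N–CA distance of one residue is close to the typical value of about 1.46 Å. The tolerance of 0.10 Å and the function name `n_ca_plausible` are hypothetical choices for this illustration, and the coordinates are the N and CA atoms of ALA 1 from the ATOM records sample in this document.

```python
import math

# Typical backbone N-CA bond length in angstroms (approximate reference value).
N_CA_IDEAL = 1.46
# Hypothetical tolerance chosen only for this toy example.
TOLERANCE = 0.10

def n_ca_plausible(n_xyz, ca_xyz, ideal=N_CA_IDEAL, tol=TOLERANCE):
    """Return (ok, d): d is the N-CA distance, ok is True if |d - ideal| <= tol."""
    d = math.dist(n_xyz, ca_xyz)  # Euclidean distance between the two atoms
    return abs(d - ideal) <= tol, d

if __name__ == "__main__":
    # N and CA of residue ALA 1 from the sample ATOM records.
    ok, d = n_ca_plausible((11.104, 6.134, -6.504), (11.639, 6.071, -5.147))
    print(f"N-CA = {d:.3f} A, plausible: {ok}")  # N-CA = 1.460 A, plausible: True
```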
14. TASK FORCES
X-ray diffraction (most of the structures)…yields approximate coordinates
of the atoms of the protein, e.g., lysozyme.
NMR (about 15%, e.g., haemoglobin)…yields estimates of the distances between
pairs of atoms of the protein. The final conformation is obtained by solving
a distance geometry problem.
Cryo-electron microscopy (very few proteins, e.g., crystallin).
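The NMR step of turning inter-atomic distances into a conformation can be illustrated with a minimal distance geometry sketch. It assumes a complete and exact 4×4 distance matrix for four atoms, which real NMR data never provides (NMR gives sparse, uncertain distance bounds), and uses a simple sequential "build-up" embedding: place atom 0 at the origin, atom 1 on the x-axis, atom 2 in the xy-plane, and atom 3 by trilateration. The names `embed_four` and `pairwise` are hypothetical.

```python
import math

def pairwise(points):
    """Full matrix of Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in points] for a in points]

def embed_four(d):
    """Place four points in 3D from an exact 4x4 distance matrix (build-up method).

    Assumes points 0, 1, 2 are not collinear and all distances are exact.
    The result is unique only up to rotation, translation and reflection.
    """
    d01, d02, d03 = d[0][1], d[0][2], d[0][3]
    d12, d13, d23 = d[1][2], d[1][3], d[2][3]
    p0 = (0.0, 0.0, 0.0)                 # point 0 at the origin
    p1 = (d01, 0.0, 0.0)                 # point 1 on the x-axis
    # Point 2 in the xy-plane (law of cosines).
    x2 = (d01**2 + d02**2 - d12**2) / (2 * d01)
    y2 = math.sqrt(max(d02**2 - x2**2, 0.0))
    p2 = (x2, y2, 0.0)
    # Point 3 by intersecting spheres around points 0, 1 and 2; take z >= 0 (mirror choice).
    x3 = (d01**2 + d03**2 - d13**2) / (2 * d01)
    y3 = (d03**2 - d23**2 - 2 * x3 * x2 + x2**2 + y2**2) / (2 * y2)
    z3 = math.sqrt(max(d03**2 - x3**2 - y3**2, 0.0))
    return [p0, p1, p2, (x3, y3, z3)]

if __name__ == "__main__":
    # Hypothetical "true" positions, used only to generate exact distances.
    true_xyz = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0), (1.0, 1.0, 1.2)]
    print(embed_four(pairwise(true_xyz)))
```

The rebuilt coordinates reproduce every input distance; the choice of z ≥ 0 for the last atom reflects the mirror-image ambiguity that distances alone cannot resolve.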
18. PDB IDENTIFIER (PDB ID)
Each structure published in the PDB receives a four-character alphanumeric
identifier, or accession number, such as 1ANG or 4hhb.
However, a PDB ID cannot be used as an identifier for a biomolecule, because
several structures of the same molecule, in different environments or
conformations, are contained in the PDB under different PDB IDs.
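A small helper can check whether a string looks like a PDB ID and normalize its case, since 4hhb and 4HHB name the same entry. This sketch assumes the classic four-character format (a digit 1–9 followed by three letters or digits); `normalize_pdb_id` is a hypothetical name, and the check says nothing about whether the entry actually exists.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three letters or digits.
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

def normalize_pdb_id(text):
    """Return the ID in upper case if it has the classic PDB ID shape, else None."""
    candidate = text.strip()
    if _PDB_ID.fullmatch(candidate):
        return candidate.upper()
    return None

if __name__ == "__main__":
    for s in ["1ANG", "4hhb", "hemoglobin"]:
        print(s, "->", normalize_pdb_id(s))
```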
19. PDB FILE FORMAT
Standard data representation…encoded in a data dictionary. The metadata
model supporting this representation is used by all PDB data processing and
database software tools.
The PDB file format was initially restricted to 80 characters per line.
In 1996, the macromolecular Crystallographic Information File (mmCIF) format
began to be phased in.
In 2005, an XML version of the format, called PDBML, was described.
The structure files can be downloaded in any of these three formats.
The files can also be loaded directly into graphics packages using web
services.
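Downloading a file programmatically mostly comes down to building the right URL. The sketch below assumes the RCSB file service pattern `https://files.rcsb.org/download/<ID>.<ext>` with `.pdb` for the legacy format and `.cif` for mmCIF; `download_url` and `fetch_text` are hypothetical names, and `fetch_text` needs network access.

```python
import urllib.request

# Assumed URL pattern of the RCSB PDB file download service.
BASE_URL = "https://files.rcsb.org/download"
EXTENSIONS = {"pdb": ".pdb", "mmcif": ".cif"}

def download_url(pdb_id, fmt="mmcif"):
    """Build the download URL for one entry in the requested format."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4:
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"{BASE_URL}/{pdb_id}{EXTENSIONS[fmt]}"

def fetch_text(pdb_id, fmt="mmcif", timeout=30):
    """Download one entry and return its contents as text (requires network access)."""
    with urllib.request.urlopen(download_url(pdb_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    print(download_url("4hhb", "pdb"))
    print(download_url("4hhb"))
```

Defaulting to mmCIF is a deliberate choice: very large structures are not distributed in the legacy 80-column PDB format.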
20. PDB 3D data file format
ASCII
column based: 80 columns per line
KEYWORD for the record type in cols. 1-6 (starting at col. 1)
Header records
Structure records
Atom records (containing coordinates)
ATOM, HETATM, ..., TER
21. PDB format
Coordinate Section
ATOM record: atom of a standard biopolymer residue
HETATM record: non-biopolymer atom (e.g., ligands, ions, water)
TER record: chain terminator
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM      1  N   ALA     1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA     1      11.639   6.071  -5.147  1.00  0.00           C
...
ATOM    293 1HG  GLU    18     -14.861  -4.847   0.361  1.00  0.00           H
ATOM    294 2HG  GLU    18     -13.518  -3.769   0.084  1.00  0.00           H
TER     295      GLU    18
HETATM 5555 CA                   0.000   0.000   0.000
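Because every field has a fixed column range, ATOM and HETATM records are best read by slicing columns rather than splitting on whitespace; a blank chain ID or two touching numbers would break a whitespace split. This minimal sketch uses the standard PDB column ranges (for example, x in columns 31-38, which is the Python slice `[30:38]`). The function name and the dictionary keys are hypothetical, and blank numeric fields are returned as `None`.

```python
def _int(field):
    """Parse an integer field, returning None if it is blank."""
    field = field.strip()
    return int(field) if field else None

def _float(field):
    """Parse a float field, returning None if it is blank."""
    field = field.strip()
    return float(field) if field else None

def parse_atom_record(line):
    """Parse one ATOM/HETATM record by fixed column positions.

    Columns a-b (1-based, inclusive) correspond to the Python slice line[a-1:b].
    """
    line = line.rstrip("\n").ljust(80)  # pad short lines so every slice exists
    record = line[0:6].strip()          # cols 1-6: record name
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {record!r}")
    return {
        "record": record,
        "serial": _int(line[6:11]),        # cols 7-11
        "name": line[12:16].strip(),       # cols 13-16
        "alt_loc": line[16].strip(),       # col 17
        "res_name": line[17:20].strip(),   # cols 18-20
        "chain_id": line[21].strip(),      # col 22
        "res_seq": _int(line[22:26]),      # cols 23-26
        "x": _float(line[30:38]),          # cols 31-38
        "y": _float(line[38:46]),          # cols 39-46
        "z": _float(line[46:54]),          # cols 47-54
        "occupancy": _float(line[54:60]),  # cols 55-60
        "temp_factor": _float(line[60:66]),# cols 61-66
        "element": line[76:78].strip(),    # cols 77-78
    }

if __name__ == "__main__":
    sample = "ATOM      2  CA  ALA     1      11.639   6.071  -5.147  1.00  0.00           C"
    print(parse_atom_record(sample))
```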
22. mmCIF
mmCIF is the acronym for the macromolecular Crystallographic
Information File.
mmCIF is based on a subset of the syntax rules of the Self-defining
Text Archive and Retrieval (STAR) file format.
A Dictionary Description Language (DDL) defines the structure of
mmCIF dictionaries. Dictionaries provide the metadata which define
the content of mmCIF data files.
mmCIF data files, dictionaries and DDLs are all expressed in a
common syntax.
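The common syntax is what makes mmCIF machine-readable: a `data_` block holds `_category.item value` pairs, and `loop_` introduces a table whose column names are listed first, followed by whitespace-separated rows. The sketch below parses only a simplified subset (no quoted strings and no semicolon-delimited multi-line values), so it is an illustration, not a full STAR/CIF parser. `_entry.id` and the `_atom_site.*` item names are real mmCIF dictionary items; the block name `example` and the function name are hypothetical, and the coordinates reuse the sample ATOM records.

```python
def parse_simple_cif(text):
    """Parse a simplified mmCIF subset into (block_name, items, loops).

    Handles one data_ block, single-line `_category.item value` pairs and
    loop_ tables with unquoted, whitespace-separated values.
    """
    block, items, loops = None, {}, []
    lines = [ln.strip() for ln in text.splitlines()]
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line or line.startswith("#"):      # blank lines and comments
            i += 1
        elif line.startswith("data_"):            # start of a data block
            block = line[len("data_"):]
            i += 1
        elif line == "loop_":                     # table: tags first, then values
            i += 1
            tags = []
            while i < len(lines) and lines[i].startswith("_"):
                tags.append(lines[i])
                i += 1
            values = []
            while i < len(lines) and lines[i] and not lines[i].startswith(("_", "#", "loop_", "data_")):
                values.extend(lines[i].split())
                i += 1
            if tags:
                width = len(tags)
                loops.append([dict(zip(tags, values[k:k + width]))
                              for k in range(0, len(values), width)])
        elif line.startswith("_"):                # single key-value pair
            parts = line.split(None, 1)
            items[parts[0]] = parts[1] if len(parts) > 1 else None
            i += 1
        else:
            i += 1
    return block, items, loops

SAMPLE = """data_example
_entry.id example
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N  ALA 11.104 6.134 -6.504
ATOM 2 CA ALA 11.639 6.071 -5.147
#
"""

if __name__ == "__main__":
    block, items, loops = parse_simple_cif(SAMPLE)
    print(block, items)
    for row in loops[0]:
        print(row["_atom_site.label_atom_id"], row["_atom_site.Cartn_x"])
```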
23. POINT YOUR BROWSER TO:
WWW.RCSB.ORG/PDB/
Enter either a search term (for example, a protein name) or a
PDB ID.
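The same kind of text search can also be run from a script. This sketch assumes the RCSB PDB Search API (version 2) endpoint `https://search.rcsb.org/rcsbsearch/v2/query`, its `full_text` service and the `result_set`/`identifier` fields of the JSON response; check these against the current API documentation before relying on them. The function names are hypothetical, and `search_entries` needs network access.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB PDB Search API (version 2).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"

def full_text_query(term):
    """Build a full-text search request that returns matching entry IDs."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": term},
        },
        "return_type": "entry",
    }

def search_entries(term, timeout=30):
    """POST the query and return the matching PDB IDs (requires network access)."""
    body = json.dumps(full_text_query(term)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        if resp.status == 204:  # assumed: an empty response means no hits
            return []
        result = json.load(resp)
    return [hit["identifier"] for hit in result.get("result_set", [])]
```

Building the request body in a separate function keeps the query easy to inspect and test without touching the network.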
36. PROTEIN ANNOTATION
If the contents of the PDB are thought of as primary
data, then there are hundreds of derived (i.e.,
secondary) databases that categorize the data
differently. For example,
both SCOP and CATH categorize structures
according to type of structure and assumed
evolutionary relations; GO categorizes structures
based on genes.
38. The Structural Classification of Proteins (SCOP) database is a largely
manual classification of protein structural domains based on similarities
of their structures and amino acid sequences.
39. CATH classification levels:
Class: the overall secondary-structure content of the domain.
Architecture: high structural similarity but no evidence of homology.
Topology: a large-scale grouping of topologies which share particular
structural features.
Homologous superfamily: indicative of a demonstrable evolutionary relationship.
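Each CATH domain classification is written as a dotted code with one number per level (Class.Architecture.Topology.Homologous superfamily), so two domains can be compared level by level. A minimal sketch, assuming well-formed four-part codes; the function names and the example codes used in the usage line are illustrative only.

```python
from typing import NamedTuple

class CathCode(NamedTuple):
    class_: int                  # C: secondary-structure content
    architecture: int            # A
    topology: int                # T
    homologous_superfamily: int  # H

def parse_cath_code(code):
    """Split a dotted CATH code such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))

def shares_levels(a, b, depth):
    """True if two codes agree on their first `depth` levels (1 = C ... 4 = H)."""
    return parse_cath_code(a)[:depth] == parse_cath_code(b)[:depth]

if __name__ == "__main__":
    print(parse_cath_code("3.40.50.300"))
    print(shares_levels("3.40.50.300", "3.40.50.720", 3))  # same C, A and T levels
```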
41. Pfam is a database of protein families that includes their annotations
and multiple sequence alignments generated using hidden Markov models.
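The core computation behind scoring a sequence with a hidden Markov model is the forward algorithm, which sums over all hidden-state paths. The sketch below is the generic forward recursion on a hypothetical two-state model over a two-letter alphabet ("H" for hydrophobic, "P" for polar residues); Pfam itself uses profile HMMs built with the HMMER software, with match, insert and delete states for every alignment column, so this only illustrates the underlying idea.

```python
from itertools import product

def forward(obs, states, start_p, trans_p, emit_p):
    """Probability of a non-empty observed sequence under an HMM (forward algorithm)."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][symbol]
            for s in states
        }
    return sum(alpha.values())

# Hypothetical two-state model: "H" = hydrophobic residue, "P" = polar residue.
STATES = ("core", "loop")
START = {"core": 0.5, "loop": 0.5}
TRANS = {"core": {"core": 0.8, "loop": 0.2}, "loop": {"core": 0.3, "loop": 0.7}}
EMIT = {"core": {"H": 0.7, "P": 0.3}, "loop": {"H": 0.2, "P": 0.8}}

if __name__ == "__main__":
    print(forward("HH", STATES, START, TRANS, EMIT))  # about 0.245
    # Probabilities of all sequences of a fixed length sum to 1.
    print(sum(forward(s, STATES, START, TRANS, EMIT) for s in product("HP", repeat=3)))
```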
57. VIEWING THE DATA
56,523 structures in the PDB have structure factor files.
6,410 structures in the PDB have NMR restraint files.
198 structures in the PDB have chemical shift files.
The files are plain text and can be viewed or modified in a text editor.
Structure files may be viewed using various free and commercial
visualization programs and web browser plug-ins, such as:
Open-source PDB software
Jmol
Molekel
MeshLab (able to import PDB data sets and build up surfaces from them)
QuteMol
Avogadro
Others that are open but not free, such as
PyMOL, RasMol, VisProt3DS and StarBiochem
58. HTTP://WWW.RCSB.ORG/PDB/STATIC.DO?P=SOFTWARE/SOFTWARE_LINKS/MOLECULAR_GRAPHICS.HTML
The RCSB PDB website contains
an extensive list of both free and
commercial molecule visualization
programs and web browser plug-ins.
61. LIMITATION
The Protein Data Bank (PDB) is the central archive of
experimentally solved biomolecular structures. However, the PDB
only allows data retrieval and does not provide functionality for
collaboration or user feedback.
In contrast, PDBWiki allows for sharing expert knowledge about
structures deposited in the PDB. It provides tools for discussing and
annotating proteins in a collaborative way. The goal is to create a
central and freely accessible repository of user-contributed
information that will be useful for anyone working with PDB
structures. As such, PDBWiki can be considered part of a wider
effort in community-based curation of biological databases.