Structural databases and their classification, by Abdul Qahar

Presented by Abdul Qahar (A Q)
Buner Campus
Edited, Prepared and shared By
Abdul Qahar

Basic concepts about databases
1. What is a database?
A database is a collection of data which can be used:
• alone, or
• combined / related to other data
to provide answers to the user’s question.
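
The difference between using data alone and relating it to other data can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, the identifiers P1 and P2, and the toy sequences are assumptions made up for the example and do not come from any real database.

```python
import sqlite3

# In-memory toy database; all table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequence  (protein_id TEXT PRIMARY KEY, residues TEXT);
    CREATE TABLE structure (protein_id TEXT, method TEXT);
    INSERT INTO sequence  VALUES ('P1', 'MALVS'), ('P2', 'MKV');
    INSERT INTO structure VALUES ('P1', 'X-ray'), ('P2', 'NMR');
""")

# Data used alone: answer a question from a single table.
alone = conn.execute(
    "SELECT residues FROM sequence WHERE protein_id = 'P1'"
).fetchone()

# Data combined with other data: relate the two tables on a shared key.
related = conn.execute("""
    SELECT s.protein_id, length(s.residues), t.method
    FROM sequence AS s
    JOIN structure AS t ON s.protein_id = t.protein_id
    ORDER BY s.protein_id
""").fetchall()

print(alone)
print(related)
```

The second query answers a question ("how long is each protein, and how was its structure determined?") that neither table could answer by itself.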

Data types
• Primary data: sequence (DNA, amino acid), e.g. DMPVERILEALAVE… → primary database
• Secondary data: secondary protein structure (e.g., alpha-helices, beta-strands); "motifs": regular expressions, blocks, profiles, fingerprints → secondary db
• Tertiary data: tertiary protein structure (domains, folding units, atomic co-ordinates) → tertiary db
• Interaction data: binary protein-protein interactions / networks; pathways and functional networks → interaction db

Primary biological databases
Nucleic acid databases
EMBL
GenBank
DDBJ (DNA Data Bank of
Japan)
Protein databases
PIR
MIPS
SWISS-PROT
TrEMBL
NRL-3D

Nucleotide Databases
• EMBL: Nucleotide sequence database
• Ensembl: Automatic annotation of eukaryotic genomes
• Genome Server: Overview of completed genomes at EBI
• Genome-MOT: Genome monitoring table
• EMBL-Align: Multiple sequence alignment database

Sequence data = strings of letters
• Nucleotides (bases): Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
• Triplet codons (the genetic code) map to the 20 amino acids (A, L, V, S etc.)
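
How triplet codons turn a string of bases into a string of amino-acid letters can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and reading frame 1; the function name translate and the example DNA string are made up for the sketch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: one amino-acid letter per codon, with codons
# enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical input: ATG GCC CTG GTG TCC TAA
print(translate("ATGGCCCTGGTGTCCTAA"))  # MALVS
```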

Three-dimensional protein structure =
atomic coordinates in 3D space
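
As a rough sketch of what "atomic coordinates" look like in practice, the code below reads x, y, z from fixed-width ATOM lines in the legacy PDB file format and measures the distance between two atoms. The helper make_atom_line and the coordinate values are hypothetical, invented for this example and not taken from a real entry.

```python
import math

def parse_atom_xyz(line):
    """Return (x, y, z) in angstroms from one fixed-width ATOM/HETATM line."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    # In the PDB format, columns 31-38, 39-46 and 47-54 hold x, y and z.
    return float(line[30:38]), float(line[38:46]), float(line[46:54])

def make_atom_line(serial, name, res, chain, resseq, x, y, z):
    """Hypothetical helper: build a minimal ATOM line with the right column widths."""
    return "ATOM  {:>5} {:<4} {:>3} {:1}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
        serial, name, res, chain, resseq, x, y, z)

# Two made-up backbone atoms of one residue (illustrative values only).
n_line = make_atom_line(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)
ca_line = make_atom_line(2, " CA", "MET", "A", 1, 26.266, 25.413, 2.842)

n_xyz = parse_atom_xyz(n_line)
ca_xyz = parse_atom_xyz(ca_line)
bond_length = math.dist(n_xyz, ca_xyz)  # Euclidean distance in angstroms
print(n_line)
print(round(bond_length, 2))
```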

EMBL/GenBank/DDBJ
• These three databases contain essentially the same information (with minor
differences in format and syntax)
• Serve as archives containing all sequences (single genes, ESTs,
complete genomes, etc.) derived from:
– Genome projects and sequencing centers
– Individual scientists
– Patent offices (e.g. USPTO, EPO)
• Non-confidential data are exchanged daily.

Databases related to Genomics
• Contain information on genes, gene location (mapping),
gene nomenclature and links to sequence databases;
• Exist for most organisms important for life science research;
• Examples: MIM, GDB (human), MGD (mouse), FlyBase
(Drosophila), SGD (yeast), MaizeDB (maize), SubtiList
(B. subtilis), etc.

Swiss-Prot
• Annotated protein sequence database established in 1986 and
maintained collaboratively since 1987, by the Department of
Medical Biochemistry of the University of Geneva and EBI
• Complete, curated, non-redundant and highly cross-referenced (linked to 34
other databases)
• Available from a variety of servers and through sequence analysis
software tools
• More than 8,000 different species
• First 20 species represent about 42% of all sequences in the
database
• More than 129,000 entries with about 4.7 × 10^7 amino acids

PDB: Protein Data Bank
• Holds 3D models of biological macromolecules (protein, RNA,
DNA).
• All data are available to the public.
• Structures are obtained by X-ray crystallography (84%) or NMR
spectroscopy (16%).
• Submitted by biologists and biochemists from around the
world.

EMBL Nucleotide Sequence Database
• An annotated collection of all publicly available nucleotide
and protein sequences
• Created in 1980 at the European Molecular Biology
Laboratory in Heidelberg.
• Maintained since 1994 by EBI, Cambridge.

DDBJ: DNA Data Bank of Japan
• An annotated collection of all publicly available
nucleotide and protein sequences
• Started in 1984 at the National Institute of Genetics (NIG)
in Mishima.
• Still maintained at this institute by a team led by Takashi
Gojobori.

Why Protein Structure?
Proteins are fundamental components of all living
cells, performing a variety of biological tasks.
Each protein has a particular 3D structure that determines its
function.
Protein structure is more conserved than protein sequence, and
more closely related to function.

Supersecondary structures
Assemblies of secondary structure elements that recur in many
protein structures. Examples:
Beta hairpin
Beta-alpha-beta unit
Helix hairpin

Structural Databases
SCOP: Structural Classification of Proteins
Current release: 686 folds; 1073 superfamilies; 1827 families
representing 15,979 PDB entries
CATH: Class, Architecture, Topology, Homologous superfamily

Levels in SCOP
1. Class
2. Folds
3. Superfamilies
4. Families

Major classes in SCOP
• Classes
– All alpha proteins
– All beta proteins
– Alpha and beta proteins (a/b)
– Alpha and beta proteins (a+b)
– Multi-domain proteins
– Membrane and cell surface proteins
– Small proteins

Folds
• Each class may be divided into one or more folds
• Proteins that have the same secondary structure elements
arranged in the same order in the protein chain and in three
dimensions are classified as having the same fold

Superfamilies
• Superfamilies are subdivisions of folds
• A superfamily contains proteins which are thought to be
evolutionarily related due to
– Sequence
– Function
– Special structural features
• Relationships between members of a superfamily may not be
readily recognizable from the sequence alone

Families
• Subdivisions of superfamilies
• Contains members whose relationship is readily recognizable
from the sequence
• Families are further subdivided into Proteins
• Proteins are divided into Species
– The same protein may be found in several species
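
The SCOP levels described above (class, fold, superfamily, family, protein, species) form a simple hierarchy, which can be sketched as a data structure. This is only an illustration: the ScopEntry type, the function deepest_shared_level, and labels such as "fold-1" or "species-A" are placeholders invented for the example, not names taken from a SCOP release.

```python
from dataclasses import dataclass

# Levels from most general to most specific, as listed in the slides.
SCOP_LEVELS = ("class", "fold", "superfamily", "family", "protein", "species")

@dataclass(frozen=True)
class ScopEntry:
    scop_class: str
    fold: str
    superfamily: str
    family: str
    protein: str
    species: str

    def lineage(self):
        """Return the labels ordered from class down to species."""
        return (self.scop_class, self.fold, self.superfamily,
                self.family, self.protein, self.species)

def deepest_shared_level(a, b):
    """Name the most specific level at which two entries still agree, or None."""
    shared = None
    for level, x, y in zip(SCOP_LEVELS, a.lineage(), b.lineage()):
        if x != y:
            break
        shared = level
    return shared

# Placeholder entries: the same protein in two species, and a relative
# from a different family of the same superfamily.
e1 = ScopEntry("a/b", "fold-1", "superfamily-1", "family-1", "protein-1", "species-A")
e2 = ScopEntry("a/b", "fold-1", "superfamily-1", "family-1", "protein-1", "species-B")
e3 = ScopEntry("a/b", "fold-1", "superfamily-1", "family-2", "protein-9", "species-A")

print(deepest_shared_level(e1, e2))  # protein
print(deepest_shared_level(e1, e3))  # superfamily
```

Two entries that agree only down to the superfamily level correspond to the case where the relationship may not be recognizable from sequence alone.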

[Figure: All beta: Immunoglobulin (8fab)]

[Figure: Alpha/beta: Triosephosphate isomerase]

CATH
• Levels
• Class
• Architecture
– This level is unique to CATH
• Topology
– Roughly equivalent to Fold (or Superfamily) in SCOP
• Homologous superfamily
– Roughly equivalent to Superfamily (or Family) in SCOP
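
The approximate correspondence between CATH and SCOP levels given above can be written down as a small lookup table. This is a sketch only: the mapping is approximate, as the slide itself indicates, and the names CATH_TO_SCOP and scop_equivalent are invented for the example.

```python
# Approximate CATH -> SCOP level correspondence, following the slide.
# None means the level has no SCOP counterpart.
CATH_TO_SCOP = {
    "Class": "Class",
    "Architecture": None,          # unique to CATH
    "Topology": "Fold",            # sometimes closer to Superfamily
    "Homologous superfamily": "Superfamily",  # sometimes closer to Family
}

def scop_equivalent(cath_level):
    """Return the closest SCOP level for a CATH level, or a note if none exists."""
    if cath_level not in CATH_TO_SCOP:
        raise KeyError(f"unknown CATH level: {cath_level}")
    scop_level = CATH_TO_SCOP[cath_level]
    return scop_level if scop_level is not None else "no SCOP equivalent"

for level in CATH_TO_SCOP:
    print(level, "->", scop_equivalent(level))
```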

Architecture
• Proteins with the same overall arrangement of secondary structures
– Example: the architecture "two-layer beta sheet proteins"
contains different folds, each with a distinct number and
connectivity of strands

Abdul Qahar Buneri abdulqahar045@gmail.com
www.slideshare.net/abdulqahar045
