SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
BIOINFORMATICS FOR BETTER TOMORROW
B. Jayaram, N. Latha, Pooja Narang, Pankaj Sharma, Surojit Bose, Tarun Jain, Kumkum
Bhushan, Saher Afshan Shaikh, Poonam Singhal, Gandhimathi and Vidhu Pandey
Department of Chemistry &
Supercomputing Facility for Bioinformatics & Computational Biology,
Indian Institute of Technology, Hauz Khas, New Delhi - 110016, India.
Email: bjayaram@chemistry.iitd.ac.in
Web site: www.scfbio-iitd.org
I. What is Bioinformatics?
Bioinformatics is an emerging interdisciplinary area of Science & Technology
encompassing a systematic development and application of IT solutions to handle
biological information by addressing biological data collection and warehousing, data
mining, database searches, analyses and interpretation, modeling and product design.
Being an interface between modern biology and informatics it involves discovery,
development and implementation of computational algorithms and software tools that
facilitate an understanding of the biological processes with the goal to serve primarily
agriculture and healthcare sectors with several spin-offs. In a developing country like
India, bioinformatics has a key role to play in areas like agriculture where it can be used
for increasing the nutritional content, increasing the volume of the agricultural produce
and implanting disease resistance etc.. In the pharmaceutical sector, it can be used to
reduce the time and cost involved in drug discovery process particularly for third world
diseases, to custom design drugs and to develop personalized medicine (Fig. 1).
Gene finding Protein structure prediction Drug design
Figure 1 : Some major areas of research in bioinformatics and computational biology.
Figure 2 : The tree of life depicting evolutionary relationships among organisms from the
major biological kingdoms. A possible evolutionary path from a common ancestral cell to
the diverse species present in the modern world can be deduced from DNA sequence
analysis. The branches of the evolutionary tree show paths of descent. The length of paths
does not indicate the passage of time and the vertical axis shows only major categories of
organisms, not evolutionary age. Dotted lines indicate the supposed incorporation of
some cell types into others, transferring all of their genes and giving the tree some web-
like features. (Source : A Alberts, D Bray, J Lewis, M Raff, K Roberts & J D Watson, Molecular
Biology of the Cell, p38, Garland, New York (1994)).
It is assumed that life originated from a common ancestor and all the higher
organisms evolved form a common unicellular prokaryotic organism. Subsequent
division of different forms of life from this makes the diversity in the morphological and
genetic characters (Fig. 2).
Figure 3 : (A) An animal cell. The figure represents a rat liver cell, a typical higher
animal cell in which features of animal cells are evident such as nucleus, nucleolus,
mitochondria, Golgi bodies, lysosomes and endoplasmic reticulum (ER). (Source:
www.probes.com/handbook/ figures/0908.html) (B). A plant cell (cell in the leaf of a higher
plant). Plant cells in addition to plasma membrane have another layer called cell wall,
which is made up of cellulose and other polymers where as animal cells have plasma
membrane only. The cell wall, membrane, nucleus chloroplasts, mitochondria, vacuole,
ER and other organelles that make up a plant cell are featured in the figure (Source:
http://www.sparknotes.com/biology/cellstructure/celldifferences/section1.html).
(A). Animal cell (B). Plant cell
The common basis to all these diverse organisms is the basic unit known as the
cell (Fig. 3). All cells whether they belong to a simple unicellular organism or a complex
multicellular organism (human adults comprise ~ 30 trillion cells), possess a nucleus
which carries the genetic material consisting of polymeric chains of DNA (deoxyribo
nucleic acid), holding the hereditary information and controlling the functioning. Several
challenges lie ahead in deciphering how DNA, the genetic material in these cells
eventually leads to the formation of organisms (Fig. 4).
CELL
TISSUE
ORGAN
ORGANISM
Figure 4 : Levels of organization. The entire DNA content in a cell is called the genome.
In spite of the complex organization, cells of all organisms possess different
The entire protein content in a cell is called the proteome. Cellome is the entire
complement of molecules, including genome and proteome within a cell. Tissues are
made of collections of cells. Tissue collections make organs. An organism is a collection
of several organ systems.
molecules of life for the maintenance of living state. Theses molecules include nucleic
acids, proteins, carbohydrates and lipids (Fig. 5).
Nucleic Acids
Carbohydrates
Proteins
Lipids
Figure 5 : Biom
All organisms self re genetic material DNA, the
olecules of life
plicate due to the presence of
polynucleotide consisting of four bases Adenine (A), Thymine (T), Guanine (G) and
Cytosine (C) (Fig. 6). The entire DNA content of the cell is what is known as the genome.
The segment of genome that is transcribed into RNA is called gene. So we can say that
hereditary information is transferred in the form of genes contained on the four bases.
Understanding these genes is one of the modern day challenges. Why only five percent of
the entire DNA is in the form of genes and what is the rest of the DNA responsible for,
under what conditions genes are expressed, where, when and how to regulate gene
expression are some unsolved puzzles.
Figure 6 : DNA and its alphabets A, G, C & T the nucleic acid bases. (Source: Molecular
rmation present in DNA is expressed via RNA molecules into proteins
biology of the cell, Garland Publishing. Inc. New York, 1994 )
The info
which are responsible for carrying out various activities. This information flow is called
the central dogma of molecular biology (Fig. 7). Potential drugs can bind to DNA, RNA
or proteins to suppress or enhance the action at any stage in the pathway.
Figure 7 : Central dogma of modern biology
enome Analysis
f genome called genes determine the sequence of amino acids in
otein
n order to identify all
average of just 2% i.e. just about 160 enzymes.
G
Segments o
pr s. The mechanism is simple for the prokaryotic cell where all the genes are
converted into the corresponding mRNA (messenger ribonucleic acid) and then into
proteins. The process is more complex for eukaryotic cells where rather than full DNA
sequence, some parts of genes called exons are expressed in the form of mRNA
interrupted at places by random DNA sequences called introns. Of the several questions
posed here, one is that how some parts of the genome are expressed as proteins and yet
other parts (introns as well as intergenic regions) are not expressed.
Scores of genome projects are being carried out world wide i
the genes and ascertain their functions in a specified organism. Human genome project
is one such global effort to identify all the alphabets on the human genome, initiated in
1990 by the US government. A comparison of the genome sizes of different organisms
(Table 1) raises questions like what types of genetic modifications are responsible for the
four times large genome size of wheat plant and seven times small size of the rice plant
as compared to that of humans. Mice and humans contain roughly the same number of
genes – about 28K protein coding regions. The chimp and human genomes vary by an
RNA
Genome
Gene = DNA
DNA binding drugs Protein
Gene therapy Primary Sequence
RPRTAFSSEQLARLK
REFNENRYLTERRR
QQLSSELGLNEAQIK
IWFQNKRAKIKKS
RNA binding drugs Drugs
Inhibitors/activators
Table 1 : Genome sizes of some organisms
Organism Genome size (Mb)
(Mb=Mega base)
Eschericia coli 4.64
M t suberculosi 4.4
H.Influenza 1.83
Homo sapiens (humans) 3300
Mouse 3000
Rice 430
Wheat 13500
(source : http://users m/jkimball.ma.ultranet/BiologyPages nomeSizes.html)
Table 2 : Specific genetic disorders
.rcn.co /G/Ge
Genetic Disorder Reason
Huntington’s Disease Excessive repeats of a three-base sequence, "CAG" on
chromosome 4.
Parkinson’s Disease Variations in genes on chromosomes 4,6.
Sickle Cell Disease Mutation in hemoglobin-b gene on chromosome 11
Tay-Sachs Disease Co 15ntrolled by a pair of genes on chromosome
Cystic Fibrosis Mutations in a single (CFTR) gene
Breast Cancer Mutation on genes found on chromosomes 13 & 17
Leukemia Exchange arms ofof genetic material between the long
chromosome 6 & 22.
Colon cancer Proteins MSH2, MSH6 on chromosome 2 & MLH1 on
chromosome 3 are mutated.
Asthma Disfunctioning of genes on chromosome 5,6,11,14&12.
Rett Syndrome Disfunctioning of a gene on the X chromosome.
Bruk omaitt lymph Translocations on chromosome 8
Alzheimer disease Mutat 9 &ions on four genes located on chromosome 1, 14, 1
21.
Werner Syndrome Mutat me 8.ions on genes located on chromoso
Angelman Syndrome Deletion of a segment on maternally derived chromosome 15.
Best disease Mutat e 11.ion in one copy of a gene located on chromosom
Pendred Syndrome Defective gene on chromosome 7.
Dias siatrophic dyspla Mutation in a gene on chromosome 5
(Source:ht h.gov/)tp://www.ncbi.nlm.ni
Several genetic disorders like Huntington’s disease, Parkinson’s disease, sickle
cell anemia etc. are caused due to mutations in the genes or a set of genes inherited from
one generation to another (Table 2). There is a need to understand the cause for such
disorders. Why nature carries such disorders and how to prevent these are some of the
areas where extensive research endeavor has to be invested.
An understanding of the genome organization can lead to concomitant progresses
in drug-target identification. Comparative genomics has become a very important
emerging branch with tremendous scope, for the above mentioned reasons. If the genome
for humans and a pathogen, a virus causing harm is identified, comparative genomics can
predict possible drug-targets for the invader without causing side effects to humans (Fig.
8).
Potential Areas for Drug Targets = Hc
∩ I
H = Human Genome / Proteome (Healthy Individual)
I = Genome / Proteome of the Invader / Pathogen
Figure 8 : Comparative genomics for drug target identification – an application of
Bioinformatics
Over the past two decades genetic modification has enabled plant breeders to
develop new varieties of crops like cereals, soya, maize at a faster rate. Some of these
called as transgenic varieties have been engineered to possess special characteristics that
make them better. Recently efforts are on in the area of utilizing GM (genetically
modified) crops to produce therapeutic plants.
Another area where bioinformatics techniques can play an important role is in
SNP (Single nucleotide polymorphisms) discovery and analysis. SNPs are common DNA
sequence variations that occur when a single nucleotide in the genome sequence is
changed. SNP’s occur every 100 to 300 bases along the human genome. The SNP
variants promise to significantly advance our ability to understand and treat human
diseases.
Given the whole genome of an organism, finding the genes (gene annotation) is a
challenging task. Various approaches have been utilized by different groups for this.
These approaches are typically database driven which rely on the already known
forma
r which attempts to capture forces responsible for DNA structure and protein-DNA
interactions walks along the length of the genome distinguishing genes from non-genes.
As of now, the physico-chemical model (ChemGene 1.0) is able to distinguish
successfully genes from non-genes in 120 prokaryotic genomes with fairly good
specificities and sensitivities.
Protein Folding
Proteins are polymers of amino acids with unlimited potential variety of structures
and metabolic activiti e dimensional shape
called its confo its shape and
genetic code. Substitution of a single amino acid can cause a major
tist made some significant contributions
energy which is about 10 to 15 kcal/mol relative to the unfolded state. How does a given
sequence of amino acids fold into a specific conformation as soon as it is conceived on
in tion. The Markov model based methods utilize short range correlations between
bases along the genome. Other methods based on Fourier transform techniques
emphasize global correlations.
At IIT Delhi, we are attempting to develop a hypothesis driven physico-chemical
model for genome analysis and for distinguishing genes from non-genes. In this model, a
vecto
es. Each protein possesses a characteristic thre
rmation. Sequence of amino acids of protein determines
function. This in turn is genetically controlled by the sequences of bases in DNA of the
cell through the
alteration in function. Study of homologous proteins from different species is of interest
in constructing taxonomic relationships. Proteins may be classified into structural
proteins, enzymes, hormones, transport proteins, protective proteins, contractile proteins,
storage proteins, toxins etc..
Prof G N Ramachandran, an Indian scien
towards an understanding of the secondary structure of proteins. His fundamental work in
this area is remembered in the form of Ramachandran maps.
Protein folding can be considered to involve changes in the polypeptide chain
conformation to attain a stable conformation corresponding to the global minimum in free
the ribosomal machinery using the information on mRNA in millisecond timescales, is
the problem pending a resolution for over five decades. For a 200 amino acid protein with
just two conformations per amino acid, a systematic search for this minimum among all
possible 2200
conformations, will take approximately 3x1054
years which is much longer
than the present age of the universe. Despite this innumerable number of conformations
to search, nature does it in milliseconds to seconds (Fig. 9).
Figure 9 : The protein folding problem (Source: press2.nci.nih.gov/.../ snps_cancer21.htm)
Solution to the protein folding problem has an immense immediate impact on
society. In biotech industry, this can be helpful in the design of nanobiomachines and
sed drug discovery wherein
the stru
biocatalysts to carry out the required function. Pulp, paper and textile industry, food and
beverages industry, leather and detergent industry are among the several potential
beneficiaries. Other important implications are in structure ba
cture of the drug target is vital (Fig. 10). Structures of receptors – a major of class
of drug targets - are refractory to experimental techniques thus leaving the fold open to
computer modeling.
Figure 10 : Drug targets (Source: Drews J., Drug Discovery: A historical perspective, Science, 2000,
287: 1960-1964)
There is an urgent demand for faster and better algorithms for protein structure
prediction. There are three major ways in which protein structure prediction attempts are
currently progressing viz. homology modeling, fold recognition and ab initio approaches.
Whereas the first two approaches are database dependent relying on known structures and
folds of proteins, the third one is independent of the databases and starts from the
physical principles. Although the ab initio techniques till date are able to predict only
small proteins, because of their first principle approach, they have the potential to predict
new / novel folds and structures. On account of the complexity of the problem,
computational requirement for such a prediction is a major issue which can easily be
appreciated by the estimates of time required to fold a 200 amino acid protein which
evolves ~10-11
sec per day per processor according to Newton’s laws of motion. This will
require approximately a million years to fold a single protein. If one can envision a
million proc tein can be folded in one ear computer
e. In this Blue Gene
essors working together, a single pro y
tim spirit, the IBM has launched a five year project in the year 1999
to address the complex biomolecular phenomena such as protein folding.
At IIT Delhi, we are developing a computational protocol for modeling and
predicting protein structures of small globular proteins. Here a combination of
bioinformatics tools, physicochemical properties of proteins and ab initio approaches are
used. Starting with the sequence of amino acids, for ten small alpha helical proteins,
structures to within 3-5Å of the native are predicted within a day on a 50 ultra sparc III
Cu 900 MHz processor cluster. Attempts are on to further bring the structures to within
<3Å of the native structures via molecular dynamics and Metropolis Monte Carlo
simulations.
Drug Design
As structures of more and more protein targets become available through
crystallography, NMR and bioinformatics methods, there is an increasing demand for
computational tools that can identify and analyze active sites and suggest potential drug
molecules that can bind to these sites specifically (Fig. 11).
Figure 11 : Active-site directed drug-design
Also to combat life-threatening diseases such as AIDS, Tuberculosis, Malaria etc.,
a global push is essential (Table 3). Millions for Viagra and pennies for the diseases of
the poor is the current situation of investment in Pharma R&D.
Active Site
Table 3 : WHO Calls for Global Push Against AIDS & Tuberculosis & Malaria
(Source : www.globalhealth.org, www.thenation.com )
Time and cost required for designing a new drug are immense and at an
unacceptable level. According to some estimates it costs about $880 million and 14 years
of research to develop a new drug before it is introduced in the m
Lead Generation & Lead Optimization
3.0 yrs 15%
Preclinical Development
1.0 yrs 10%
Phase I, II & III Clinical Trials
6.0 yrs 68%
FDA Review & Approval
1.5 yrs 3%
Drug to the Market
14 yrs $880million
arket (Table 4).
Table 4 : Cost and time involved in drug discovery
Target Discovery
2.5 yrs 4%
(Source : PAREXEL, PAREXEL’s Pharmaceutical R&D Stastical Sourcebook, 2001, p96)
illion die each year due to diseases like diarrhoea, small pox, polio,
river blindness, leprosy.
cost $ 12 billion to fight the disease of poverty
(AIDS medication about $15K per annum)
Millions for Viagra, Pennies for the Diseases of the Poor
Of all new medications brought to the market (1223) by Multinationals from
1975 only 1% (13) are for tropical diseases plaguing the third world.
Life style drugs dominate Pharma R&D
(1) Toe nail Fungus (2) Obesity
(3) Baldness (4) Face Wrinkle
(5) Erectile Dysfunction (6) Separation anxiety of dogs etc.
Nearly 6 m
Estimated
A new economic analysis
Healthy people can get themselves out of poverty.
Relieving a population of burden of the diseases for 15 to 20 years will give a
huge boost to economic development.
Nearly 6 million die each year due to diseases like diarrhoea, small pox, polio,
river blindness, leprosy.
cost $ 12 billion to fight the disease of poverty
(AIDS medication about $15K per annum)
Millions for Viagra, Pennies for the Diseases of the Poor
Of all new medications brought to the market (1223) by Multinationals from
1975 only 1% (13) are for tropical diseases plaguing the third world.
Life style drugs dominate Pharma R&D
(1) Toe nail Fungus (2) Obesity
(3) Baldness (4) Face Wrinkle
(5) Erectile Dysfunction (6) Separation anxiety of dogs etc.
Estimated
A new economic analysis
Healthy people can get themselves out of poverty.
Relieving a population of burden of the diseases for 15 to 20 years will give a
huge boost to economic development.
Intervention of computers at some plausible steps is imperative to bring down the cost
an
Fig re 12
E
for a given drug
d time required in the drug discovery process (Fig. 12, Table 5).
u : Potential areas for In silico intervention in drug-discovery process
Table 5 : High End Computing Needs for In Silico Drug Design
stimates of current computational requirements to complete a binding affinity calculation
(Source : http://cbcg.lbl.gov)
In silico methods can help in identifying drug targets via bioinformatics tools.
They can also b g / active sites,
generate candidate m
Modeling complexity Method Size of Required computing
library time
Molecular Mechanics SPECITOPE 140,000 ~1 hour
Rigid ligand/target LUDI 30,000 1-4 hours
CLIX 30,000 33 hours
Molecular Mechanics Hammerhead 80,000 3-4 days
Partially flexible ligand DOCK 17,000 3-4 days
Rigid target DOCK 53,000 14 days
e used to analyze the target structures for possible bindin
olecules, check for their drug-likeness, dock these molecules with
Molecular Mechanics
Fully flexible ligand
Rigid target
ICM 100,000 ~1 year
(extrapolated)
Molecular Mechanics
Free energy
perturbation
AMBER
CHARMM
1 ~several days
QM Active site and
MM protein
Gaussian,
Q-Chem
1 >several weeks
the target, rank them according to their binding affinities, further optimize the molecules
to improve binding characteristics etc..
-50
0
-30
-20
-10
0
10
20
30
BindingFreeEnergy(kcal/mol)
Pursuing the dream that once the gene target is identified and validated drug
discovery protocols could be automated using Bioinformatics & Computational Biology
tools, at IIT Delhi we have developed a computational protocol for active site directed
drug design. The suite of programs (christened “Sanjeevini”) has the potential to evaluate
and /or generate lead-like molecules for any biological target. The various modules of
this suite are designed to ensure reliability and generality. The software is currently being
optimized on Linux and Sun clusters for faster and better results. Making a drug is more
like designing an adaptable key for a dynamic lock. The Sanjeevini methodology consists
of design of a library of templates, generation of candidate inhibitors, screening
candidates via drug-like filters, parameter derivation via quantum mechanical
calculations for energy evaluations, Monte Carlo docking and binding affinity estimates
e protocols
tested on Cyclooxygenase-2 as a target could successfully distinguish NSAIDs
(Nonsteroid on other
(red).
based on post facto analyses of all atom molecular dynamics trajectories. Th
al Anti-inflammatory drugs) from non-drugs (Fig 13). Validation
targets is in progress.
-4
Figure 13 : The active site directed lead design protocols developed from first principles
and implemented on a supercomputer could segregate NSAIDS (blue) from Non-drugs
Bioinformatics and Biodiversity
Biodiversity informatics harnesses the power of computational and information
technologies to organize and analyze data on plants and animals at the macro and at
genome levels. India ranks among the top twelve nations of the world in terms of
biological diversity (Fig. 14 and Fig15).
Himalayas - This majestic range of
mountains is the home of a diverse range
of flora and fauna. Eastern Himalayas is one
iversity hotspots in India.
Chilika - This wetland area is protected
under the Ramsar convention.
of the two biod
Sunderbans - The largest mangrove forest in Western Ghats - One of the two
India. biodiversity hotspots in India.
Thar desert ntrast to the Himalayan region.
(Source: http://edugreen.teri.res.in/explore/maps/biodivin.htm)
- The climate and vegetation in this area is a co
Figure 14 : Biodiversity in India
Figure 15: Biodiversity bioinformatics is essential to preserve the natural balance of flora
and fauna on the planet and to prevent extinction of species (Source: www.hku.hk/
ecology/envsci.htm).
Bioinformatics endeavors in India
Owing to the well acknowledged IT skills and a spate of upcoming software,
biotech and pharma industries and active support from Government organizations, the
field of Bioinformatics appears promising. The projections of the growth potential in
India in a global scenario however belie these expectations (Fig. 16).
Figure 16 : Growth potential for Bioinformatics based business opportunities in India
ccording to IDC (International Data Corporation, India.)a
0
0.2
0.8
1
1.2
1.4
1.6
1.8
1
0.4
0.6
2
Year2001 2005
Global
India
Creating the necessary infrastructure around specialists to meet the high
omputational power required for these activities (Table 6, Fig. 17) is one step in the
ght direction.
able 6 : Geno ate of computational
quirements
. Gene Prediction
c
ri
T me to drug discovery research: A rough estim
re
1
Homology/string comparison. 300 Giga flop
~ 3*109
bp
2. Protein Structure Prediction
- Threading 100 Giga flop
- Statistical Models
- Filters to reduce plausible structures
Molecular Dynamics
s per simulation 25000
Active site directed drug design
100 structures 30 Peta flop
1-ns simulation for structure refinement
Total Compute Time 5000ns
Number of atom
3.
Scan 1000 drug molecules/protein 18 Peta flop
Total Computational requirement to design one lead compound from genome
16
o design ten lead compounds per day (on a dedicated machine)
e requirement is 5.8 tera flops capacity.
ut of every 100 lead compounds, only one may become a drug, which further increases
e computer requirements)
3ns simulation per drug molecule
(Active site searches, docking, rate and affinity determinations etc.)
Total Compute Time 3000ns
25000 atoms per simulation
Summary
~ 50 Peta flop (5.x10 floating point operations)
T
th
(O
th
for bioinformatics
and scientific community from across the country,
l Private
iotechnology, Govt of India). It is envisioned that the
span all the 60+ Bioinformatics Centres of the Department of Biotechnology
e 18).
Figure 17 : Challenges
To enable access by the student
the Supercomputing Facility at IIT Delhi is connected to Biogrid-India (a Virtua
Network of the Department of B
Biogrid will
with GBPS bandwidth and at least 10 teraflops of compute capacity soon (Figur
Transcription
Model
Education RNA splicing
model
Signal
transduction
pathways
Protein:DNA A/RN
protein:protein
Protein structure
prediction
Drug Design
Protein
Evolution
Gene ontologies
Speciation
Challenges for
Bioinformatics
Figure 18 : Biogrid-India
Pro-active policies and conducive environment to researchers, entrepreneurs and
Industry will go a long way in steering India towards leadership in the area of
Bioinformatics and Computational Biology.
Acknowledgements.
BJ is thankful to the Department of Biotechnology, the Department of Science &
Technology, Council of Scientific & Industrial Research, Indo-French Centre for the
Promotion of Advanced Research and IIT Delhi administration for support over the years.

Más contenido relacionado

La actualidad más candente

Nuclear Genomes(Short Answers and questions)
Nuclear Genomes(Short Answers and questions)Nuclear Genomes(Short Answers and questions)
Nuclear Genomes(Short Answers and questions)
Zohaib HUSSAIN
 
3 genetics syllabus statements
3 genetics syllabus statements3 genetics syllabus statements
3 genetics syllabus statements
cartlidge
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
Eli Rosenthal
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
hemantbreeder
 

La actualidad más candente (20)

Human genome project
Human genome projectHuman genome project
Human genome project
 
Mitochondrial dna
Mitochondrial dnaMitochondrial dna
Mitochondrial dna
 
Comparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organellesComparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organelles
 
Human Genome
Human Genome Human Genome
Human Genome
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Molecular evolution
Molecular evolutionMolecular evolution
Molecular evolution
 
MITOCHONDRIAL GENETICS
MITOCHONDRIAL GENETICSMITOCHONDRIAL GENETICS
MITOCHONDRIAL GENETICS
 
Hgp
HgpHgp
Hgp
 
Nuclear Genomes(Short Answers and questions)
Nuclear Genomes(Short Answers and questions)Nuclear Genomes(Short Answers and questions)
Nuclear Genomes(Short Answers and questions)
 
3 genetics syllabus statements
3 genetics syllabus statements3 genetics syllabus statements
3 genetics syllabus statements
 
Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010Comparative Genomics and Visualisation BS32010
Comparative Genomics and Visualisation BS32010
 
Human genome
Human genomeHuman genome
Human genome
 
Genetics and genomic
Genetics and genomicGenetics and genomic
Genetics and genomic
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomics
 
Comparative genomics 2
Comparative genomics 2Comparative genomics 2
Comparative genomics 2
 
Comparative Genomics and Visualisation - Part 1
Comparative Genomics and Visualisation - Part 1Comparative Genomics and Visualisation - Part 1
Comparative Genomics and Visualisation - Part 1
 
10.2 inheritance
10.2 inheritance10.2 inheritance
10.2 inheritance
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
 
Extra nuclear genome- mitchondrial dna
Extra nuclear genome- mitchondrial dnaExtra nuclear genome- mitchondrial dna
Extra nuclear genome- mitchondrial dna
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 

Destacado

Destacado (12)

trial
trialtrial
trial
 
Motibazio teoriak
Motibazio teoriakMotibazio teoriak
Motibazio teoriak
 
اسماء
اسماءاسماء
اسماء
 
Truama management only for Community level
Truama management only for Community levelTruama management only for Community level
Truama management only for Community level
 
Web 2
Web 2Web 2
Web 2
 
Snake bite
Snake biteSnake bite
Snake bite
 
MBO
MBOMBO
MBO
 
кардинал ришелье
кардинал ришельекардинал ришелье
кардинал ришелье
 
AranasPJD_CV
AranasPJD_CVAranasPJD_CV
AranasPJD_CV
 
My work for r.elghamry s.s 2015 collection
My work for r.elghamry s.s 2015 collectionMy work for r.elghamry s.s 2015 collection
My work for r.elghamry s.s 2015 collection
 
SOCIAL MEDIA MARKETING - CONNECTING THE WORLD
SOCIAL MEDIA MARKETING - CONNECTING THE WORLDSOCIAL MEDIA MARKETING - CONNECTING THE WORLD
SOCIAL MEDIA MARKETING - CONNECTING THE WORLD
 
Philadelphia Folk Festival 2015
Philadelphia Folk Festival 2015Philadelphia Folk Festival 2015
Philadelphia Folk Festival 2015
 

Similar a Bio

Tree Ganatic And Iprovment Ppt C1 &2.pptx
Tree Ganatic And Iprovment Ppt C1 &2.pptxTree Ganatic And Iprovment Ppt C1 &2.pptx
Tree Ganatic And Iprovment Ppt C1 &2.pptx
KemalDesalegn
 
Current Trends in Molecular Biology and Biotechnology
Current Trends in Molecular Biology and BiotechnologyCurrent Trends in Molecular Biology and Biotechnology
Current Trends in Molecular Biology and Biotechnology
Perez Eric
 
Chapter 20 ppt
Chapter 20 pptChapter 20 ppt
Chapter 20 ppt
rehman2009
 
Human genome project(ibri)
Human genome project(ibri)Human genome project(ibri)
Human genome project(ibri)
ajay vishwakrma
 
YASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptx
YASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptxYASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptx
YASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptx
YashPandya81
 

Similar a Bio (20)

Complete assignment on human Genome Project
Complete assignment on human Genome ProjectComplete assignment on human Genome Project
Complete assignment on human Genome Project
 
Genomics
GenomicsGenomics
Genomics
 
Genetic fine structure
Genetic fine structureGenetic fine structure
Genetic fine structure
 
Tree Ganatic And Iprovment Ppt C1 &2.pptx
Tree Ganatic And Iprovment Ppt C1 &2.pptxTree Ganatic And Iprovment Ppt C1 &2.pptx
Tree Ganatic And Iprovment Ppt C1 &2.pptx
 
DNA Methylation & C Value.pdf
DNA Methylation & C Value.pdfDNA Methylation & C Value.pdf
DNA Methylation & C Value.pdf
 
Current Trends in Molecular Biology and Biotechnology
Current Trends in Molecular Biology and BiotechnologyCurrent Trends in Molecular Biology and Biotechnology
Current Trends in Molecular Biology and Biotechnology
 
The Human Genome Project
The Human Genome Project The Human Genome Project
The Human Genome Project
 
Impact of the human genome project on medical advancement in India.
Impact of the human genome project on medical advancement in India.Impact of the human genome project on medical advancement in India.
Impact of the human genome project on medical advancement in India.
 
PAPER 3.1 ~ HUMAN GENOME PROJECT
PAPER 3.1 ~  HUMAN GENOME PROJECTPAPER 3.1 ~  HUMAN GENOME PROJECT
PAPER 3.1 ~ HUMAN GENOME PROJECT
 
Epigeneticsand methylation
Epigeneticsand methylationEpigeneticsand methylation
Epigeneticsand methylation
 
Human genome project by kk sahu
Human genome project by kk sahuHuman genome project by kk sahu
Human genome project by kk sahu
 
Genomics
GenomicsGenomics
Genomics
 
Chapter 20 ppt
Chapter 20 pptChapter 20 ppt
Chapter 20 ppt
 
An Introduction to Genomics
An Introduction to GenomicsAn Introduction to Genomics
An Introduction to Genomics
 
Human genome project(ibri)
Human genome project(ibri)Human genome project(ibri)
Human genome project(ibri)
 
Bioinformatics manual
Bioinformatics manualBioinformatics manual
Bioinformatics manual
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICS
 
Human genome project (2) converted
Human genome project (2) convertedHuman genome project (2) converted
Human genome project (2) converted
 
Genetics in psychobiology
Genetics in psychobiologyGenetics in psychobiology
Genetics in psychobiology
 
YASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptx
YASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptxYASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptx
YASH PANDYA BIOLOGY INVESTIGATORY PROJECT.pptx
 

Último

🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
mahaiklolahd
 

Último (20)

Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service AvailableCall Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
 
Call Girls Madurai Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Madurai Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Madurai Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Madurai Just Call 9630942363 Top Class Call Girl Service Available
 
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
 
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
 
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
 
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
 
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
 
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
 
Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
Premium Call Girls In Jaipur {8445551418} ❤️VVIP SEEMA Call Girl in Jaipur Ra...
 
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls  * UPA...
Call Girl in Indore 8827247818 {LowPrice} ❤️ (ahana) Indore Call Girls * UPA...
 
Top Rated Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
Top Rated  Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...Top Rated  Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
Top Rated Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
 
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
Jogeshwari ! Call Girls Service Mumbai - 450+ Call Girl Cash Payment 90042684...
 
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
 
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
Saket * Call Girls in Delhi - Phone 9711199012 Escorts Service at 6k to 50k a...
 
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
 
Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510
 

Bio

  • 1. BIOINFORMATICS FOR BETTER TOMORROW B. Jayaram, N. Latha, Pooja Narang, Pankaj Sharma, Surojit Bose, Tarun Jain, Kumkum Bhushan, Saher Afshan Shaikh, Poonam Singhal, Gandhimathi and Vidhu Pandey Department of Chemistry & Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi - 110016, India. Email: bjayaram@chemistry.iitd.ac.in Web site: www.scfbio-iitd.org I. What is Bioinformatics? Bioinformatics is an emerging interdisciplinary area of Science & Technology encompassing a systematic development and application of IT solutions to handle biological information by addressing biological data collection and warehousing, data mining, database searches, analyses and interpretation, modeling and product design. Being an interface between modern biology and informatics it involves discovery, development and implementation of computational algorithms and software tools that facilitate an understanding of the biological processes with the goal to serve primarily agriculture and healthcare sectors with several spin-offs. In a developing country like India, bioinformatics has a key role to play in areas like agriculture where it can be used for increasing the nutritional content, increasing the volume of the agricultural produce and implanting disease resistance etc.. In the pharmaceutical sector, it can be used to reduce the time and cost involved in drug discovery process particularly for third world diseases, to custom design drugs and to develop personalized medicine (Fig. 1). Gene finding Protein structure prediction Drug design Figure 1 : Some major areas of research in bioinformatics and computational biology.
  • 2. Figure 2 : The tree of life depicting evolutionary relationships among organisms from the major biological kingdoms. A possible evolutionary path from a common ancestral cell to the diverse species present in the modern world can be deduced from DNA sequence analysis. The branches of the evolutionary tree show paths of descent. The length of paths does not indicate the passage of time and the vertical axis shows only major categories of organisms, not evolutionary age. Dotted lines indicate the supposed incorporation of some cell types into others, transferring all of their genes and giving the tree some web- like features. (Source : A Alberts, D Bray, J Lewis, M Raff, K Roberts & J D Watson, Molecular Biology of the Cell, p38, Garland, New York (1994)).
  • 3. It is assumed that life originated from a common ancestor and all the higher organisms evolved form a common unicellular prokaryotic organism. Subsequent division of different forms of life from this makes the diversity in the morphological and genetic characters (Fig. 2). Figure 3 : (A) An animal cell. The figure represents a rat liver cell, a typical higher animal cell in which features of animal cells are evident such as nucleus, nucleolus, mitochondria, Golgi bodies, lysosomes and endoplasmic reticulum (ER). (Source: www.probes.com/handbook/ figures/0908.html) (B). A plant cell (cell in the leaf of a higher plant). Plant cells in addition to plasma membrane have another layer called cell wall, which is made up of cellulose and other polymers where as animal cells have plasma membrane only. The cell wall, membrane, nucleus chloroplasts, mitochondria, vacuole, ER and other organelles that make up a plant cell are featured in the figure (Source: http://www.sparknotes.com/biology/cellstructure/celldifferences/section1.html). (A). Animal cell (B). Plant cell The common basis to all these diverse organisms is the basic unit known as the cell (Fig. 3). All cells whether they belong to a simple unicellular organism or a complex multicellular organism (human adults comprise ~ 30 trillion cells), possess a nucleus which carries the genetic material consisting of polymeric chains of DNA (deoxyribo nucleic acid), holding the hereditary information and controlling the functioning. Several challenges lie ahead in deciphering how DNA, the genetic material in these cells eventually leads to the formation of organisms (Fig. 4).
  • 4. CELL TISSUE ORGAN ORGANISM Figure 4 : Levels of organization. The entire DNA content in a cell is called the genome. In spite of the complex organization, cells of all organisms possess different The entire protein content in a cell is called the proteome. Cellome is the entire complement of molecules, including genome and proteome within a cell. Tissues are made of collections of cells. Tissue collections make organs. An organism is a collection of several organ systems. molecules of life for the maintenance of living state. Theses molecules include nucleic acids, proteins, carbohydrates and lipids (Fig. 5).
  • 5. Nucleic Acids Carbohydrates Proteins Lipids Figure 5 : Biom All organisms self re genetic material DNA, the olecules of life plicate due to the presence of polynucleotide consisting of four bases Adenine (A), Thymine (T), Guanine (G) and Cytosine (C) (Fig. 6). The entire DNA content of the cell is what is known as the genome. The segment of genome that is transcribed into RNA is called gene. So we can say that hereditary information is transferred in the form of genes contained on the four bases. Understanding these genes is one of the modern day challenges. Why only five percent of the entire DNA is in the form of genes and what is the rest of the DNA responsible for, under what conditions genes are expressed, where, when and how to regulate gene expression are some unsolved puzzles.
  • 6. Figure 6 : DNA and its alphabets A, G, C & T the nucleic acid bases. (Source: Molecular rmation present in DNA is expressed via RNA molecules into proteins biology of the cell, Garland Publishing. Inc. New York, 1994 ) The info which are responsible for carrying out various activities. This information flow is called the central dogma of molecular biology (Fig. 7). Potential drugs can bind to DNA, RNA or proteins to suppress or enhance the action at any stage in the pathway.
  • 7. Figure 7 : Central dogma of modern biology enome Analysis f genome called genes determine the sequence of amino acids in otein n order to identify all average of just 2% i.e. just about 160 enzymes. G Segments o pr s. The mechanism is simple for the prokaryotic cell where all the genes are converted into the corresponding mRNA (messenger ribonucleic acid) and then into proteins. The process is more complex for eukaryotic cells where rather than full DNA sequence, some parts of genes called exons are expressed in the form of mRNA interrupted at places by random DNA sequences called introns. Of the several questions posed here, one is that how some parts of the genome are expressed as proteins and yet other parts (introns as well as intergenic regions) are not expressed. Scores of genome projects are being carried out world wide i the genes and ascertain their functions in a specified organism. Human genome project is one such global effort to identify all the alphabets on the human genome, initiated in 1990 by the US government. A comparison of the genome sizes of different organisms (Table 1) raises questions like what types of genetic modifications are responsible for the four times large genome size of wheat plant and seven times small size of the rice plant as compared to that of humans. Mice and humans contain roughly the same number of genes – about 28K protein coding regions. The chimp and human genomes vary by an RNA Genome Gene = DNA DNA binding drugs Protein Gene therapy Primary Sequence RPRTAFSSEQLARLK REFNENRYLTERRR QQLSSELGLNEAQIK IWFQNKRAKIKKS RNA binding drugs Drugs Inhibitors/activators
  • 8. Table 1 : Genome sizes of some organisms Organism Genome size (Mb) (Mb=Mega base) Eschericia coli 4.64 M t suberculosi 4.4 H.Influenza 1.83 Homo sapiens (humans) 3300 Mouse 3000 Rice 430 Wheat 13500 (source : http://users m/jkimball.ma.ultranet/BiologyPages nomeSizes.html) Table 2 : Specific genetic disorders .rcn.co /G/Ge Genetic Disorder Reason Huntington’s Disease Excessive repeats of a three-base sequence, "CAG" on chromosome 4. Parkinson’s Disease Variations in genes on chromosomes 4,6. Sickle Cell Disease Mutation in hemoglobin-b gene on chromosome 11 Tay-Sachs Disease Co 15ntrolled by a pair of genes on chromosome Cystic Fibrosis Mutations in a single (CFTR) gene Breast Cancer Mutation on genes found on chromosomes 13 & 17 Leukemia Exchange arms ofof genetic material between the long chromosome 6 & 22. Colon cancer Proteins MSH2, MSH6 on chromosome 2 & MLH1 on chromosome 3 are mutated. Asthma Disfunctioning of genes on chromosome 5,6,11,14&12. Rett Syndrome Disfunctioning of a gene on the X chromosome. Bruk omaitt lymph Translocations on chromosome 8 Alzheimer disease Mutat 9 &ions on four genes located on chromosome 1, 14, 1 21. Werner Syndrome Mutat me 8.ions on genes located on chromoso Angelman Syndrome Deletion of a segment on maternally derived chromosome 15. Best disease Mutat e 11.ion in one copy of a gene located on chromosom Pendred Syndrome Defective gene on chromosome 7. Dias siatrophic dyspla Mutation in a gene on chromosome 5 (Source:ht h.gov/)tp://www.ncbi.nlm.ni
  • 9. Several genetic disorders like Huntington’s disease, Parkinson’s disease, sickle cell anemia etc. are caused due to mutations in the genes or a set of genes inherited from one generation to another (Table 2). There is a need to understand the cause for such disorders. Why nature carries such disorders and how to prevent these are some of the areas where extensive research endeavor has to be invested. An understanding of the genome organization can lead to concomitant progresses in drug-target identification. Comparative genomics has become a very important emerging branch with tremendous scope, for the above mentioned reasons. If the genome for humans and a pathogen, a virus causing harm is identified, comparative genomics can predict possible drug-targets for the invader without causing side effects to humans (Fig. 8). Potential Areas for Drug Targets = Hc ∩ I H = Human Genome / Proteome (Healthy Individual) I = Genome / Proteome of the Invader / Pathogen Figure 8 : Comparative genomics for drug target identification – an application of Bioinformatics Over the past two decades genetic modification has enabled plant breeders to develop new varieties of crops like cereals, soya, maize at a faster rate. Some of these called as transgenic varieties have been engineered to possess special characteristics that make them better. Recently efforts are on in the area of utilizing GM (genetically modified) crops to produce therapeutic plants. Another area where bioinformatics techniques can play an important role is in SNP (Single nucleotide polymorphisms) discovery and analysis. SNPs are common DNA sequence variations that occur when a single nucleotide in the genome sequence is changed. SNP’s occur every 100 to 300 bases along the human genome. The SNP
  • 10. variants promise to significantly advance our ability to understand and treat human diseases. Given the whole genome of an organism, finding the genes (gene annotation) is a challenging task. Various approaches have been utilized by different groups for this. These approaches are typically database driven which rely on the already known forma r which attempts to capture forces responsible for DNA structure and protein-DNA interactions walks along the length of the genome distinguishing genes from non-genes. As of now, the physico-chemical model (ChemGene 1.0) is able to distinguish successfully genes from non-genes in 120 prokaryotic genomes with fairly good specificities and sensitivities. Protein Folding Proteins are polymers of amino acids with unlimited potential variety of structures and metabolic activiti e dimensional shape called its confo its shape and genetic code. Substitution of a single amino acid can cause a major tist made some significant contributions energy which is about 10 to 15 kcal/mol relative to the unfolded state. How does a given sequence of amino acids fold into a specific conformation as soon as it is conceived on in tion. The Markov model based methods utilize short range correlations between bases along the genome. Other methods based on Fourier transform techniques emphasize global correlations. At IIT Delhi, we are attempting to develop a hypothesis driven physico-chemical model for genome analysis and for distinguishing genes from non-genes. In this model, a vecto es. Each protein possesses a characteristic thre rmation. Sequence of amino acids of protein determines function. This in turn is genetically controlled by the sequences of bases in DNA of the cell through the alteration in function. Study of homologous proteins from different species is of interest in constructing taxonomic relationships. Proteins may be classified into structural proteins, enzymes, hormones, transport proteins, protective proteins, contractile proteins, storage proteins, toxins etc.. Prof G N Ramachandran, an Indian scien towards an understanding of the secondary structure of proteins. His fundamental work in this area is remembered in the form of Ramachandran maps. Protein folding can be considered to involve changes in the polypeptide chain conformation to attain a stable conformation corresponding to the global minimum in free
  • 11. the ribosomal machinery using the information on mRNA in millisecond timescales, is the problem pending a resolution for over five decades. For a 200 amino acid protein with just two conformations per amino acid, a systematic search for this minimum among all possible 2200 conformations, will take approximately 3x1054 years which is much longer than the present age of the universe. Despite this innumerable number of conformations to search, nature does it in milliseconds to seconds (Fig. 9). Figure 9 : The protein folding problem (Source: press2.nci.nih.gov/.../ snps_cancer21.htm) Solution to the protein folding problem has an immense immediate impact on society. In biotech industry, this can be helpful in the design of nanobiomachines and sed drug discovery wherein the stru biocatalysts to carry out the required function. Pulp, paper and textile industry, food and beverages industry, leather and detergent industry are among the several potential beneficiaries. Other important implications are in structure ba cture of the drug target is vital (Fig. 10). Structures of receptors – a major of class of drug targets - are refractory to experimental techniques thus leaving the fold open to computer modeling.
  • 12. Figure 10 : Drug targets (Source: Drews J., Drug Discovery: A historical perspective, Science, 2000, 287: 1960-1964) There is an urgent demand for faster and better algorithms for protein structure prediction. There are three major ways in which protein structure prediction attempts are currently progressing viz. homology modeling, fold recognition and ab initio approaches. Whereas the first two approaches are database dependent relying on known structures and folds of proteins, the third one is independent of the databases and starts from the physical principles. Although the ab initio techniques till date are able to predict only small proteins, because of their first principle approach, they have the potential to predict new / novel folds and structures. On account of the complexity of the problem, computational requirement for such a prediction is a major issue which can easily be appreciated by the estimates of time required to fold a 200 amino acid protein which evolves ~10-11 sec per day per processor according to Newton’s laws of motion. This will require approximately a million years to fold a single protein. If one can envision a million proc tein can be folded in one ear computer e. In this Blue Gene essors working together, a single pro y tim spirit, the IBM has launched a five year project in the year 1999 to address the complex biomolecular phenomena such as protein folding. At IIT Delhi, we are developing a computational protocol for modeling and predicting protein structures of small globular proteins. Here a combination of bioinformatics tools, physicochemical properties of proteins and ab initio approaches are used. Starting with the sequence of amino acids, for ten small alpha helical proteins, structures to within 3-5Å of the native are predicted within a day on a 50 ultra sparc III Cu 900 MHz processor cluster. Attempts are on to further bring the structures to within
  • 13. <3Å of the native structures via molecular dynamics and Metropolis Monte Carlo simulations. Drug Design As structures of more and more protein targets become available through crystallography, NMR and bioinformatics methods, there is an increasing demand for computational tools that can identify and analyze active sites and suggest potential drug molecules that can bind to these sites specifically (Fig. 11). Figure 11 : Active-site directed drug-design Also to combat life-threatening diseases such as AIDS, Tuberculosis, Malaria etc., a global push is essential (Table 3). Millions for Viagra and pennies for the diseases of the poor is the current situation of investment in Pharma R&D. Active Site
  • 14. Table 3 : WHO Calls for Global Push Against AIDS & Tuberculosis & Malaria (Source : www.globalhealth.org, www.thenation.com ) Time and cost required for designing a new drug are immense and at an unacceptable level. According to some estimates it costs about $880 million and 14 years of research to develop a new drug before it is introduced in the m Lead Generation & Lead Optimization 3.0 yrs 15% Preclinical Development 1.0 yrs 10% Phase I, II & III Clinical Trials 6.0 yrs 68% FDA Review & Approval 1.5 yrs 3% Drug to the Market 14 yrs $880million arket (Table 4). Table 4 : Cost and time involved in drug discovery Target Discovery 2.5 yrs 4% (Source : PAREXEL, PAREXEL’s Pharmaceutical R&D Stastical Sourcebook, 2001, p96) illion die each year due to diseases like diarrhoea, small pox, polio, river blindness, leprosy. cost $ 12 billion to fight the disease of poverty (AIDS medication about $15K per annum) Millions for Viagra, Pennies for the Diseases of the Poor Of all new medications brought to the market (1223) by Multinationals from 1975 only 1% (13) are for tropical diseases plaguing the third world. Life style drugs dominate Pharma R&D (1) Toe nail Fungus (2) Obesity (3) Baldness (4) Face Wrinkle (5) Erectile Dysfunction (6) Separation anxiety of dogs etc. Nearly 6 m Estimated A new economic analysis Healthy people can get themselves out of poverty. Relieving a population of burden of the diseases for 15 to 20 years will give a huge boost to economic development. Nearly 6 million die each year due to diseases like diarrhoea, small pox, polio, river blindness, leprosy. cost $ 12 billion to fight the disease of poverty (AIDS medication about $15K per annum) Millions for Viagra, Pennies for the Diseases of the Poor Of all new medications brought to the market (1223) by Multinationals from 1975 only 1% (13) are for tropical diseases plaguing the third world. Life style drugs dominate Pharma R&D (1) Toe nail Fungus (2) Obesity (3) Baldness (4) Face Wrinkle (5) Erectile Dysfunction (6) Separation anxiety of dogs etc. Estimated A new economic analysis Healthy people can get themselves out of poverty. Relieving a population of burden of the diseases for 15 to 20 years will give a huge boost to economic development.
  • 15. Intervention of computers at some plausible steps is imperative to bring down the cost an Fig re 12 E for a given drug d time required in the drug discovery process (Fig. 12, Table 5). u : Potential areas for In silico intervention in drug-discovery process Table 5 : High End Computing Needs for In Silico Drug Design stimates of current computational requirements to complete a binding affinity calculation (Source : http://cbcg.lbl.gov) In silico methods can help in identifying drug targets via bioinformatics tools. They can also b g / active sites, generate candidate m Modeling complexity Method Size of Required computing library time Molecular Mechanics SPECITOPE 140,000 ~1 hour Rigid ligand/target LUDI 30,000 1-4 hours CLIX 30,000 33 hours Molecular Mechanics Hammerhead 80,000 3-4 days Partially flexible ligand DOCK 17,000 3-4 days Rigid target DOCK 53,000 14 days e used to analyze the target structures for possible bindin olecules, check for their drug-likeness, dock these molecules with Molecular Mechanics Fully flexible ligand Rigid target ICM 100,000 ~1 year (extrapolated) Molecular Mechanics Free energy perturbation AMBER CHARMM 1 ~several days QM Active site and MM protein Gaussian, Q-Chem 1 >several weeks
  • 16. the target, rank them according to their binding affinities, further optimize the molecules to improve binding characteristics etc.. -50 0 -30 -20 -10 0 10 20 30 BindingFreeEnergy(kcal/mol) Pursuing the dream that once the gene target is identified and validated drug discovery protocols could be automated using Bioinformatics & Computational Biology tools, at IIT Delhi we have developed a computational protocol for active site directed drug design. The suite of programs (christened “Sanjeevini”) has the potential to evaluate and /or generate lead-like molecules for any biological target. The various modules of this suite are designed to ensure reliability and generality. The software is currently being optimized on Linux and Sun clusters for faster and better results. Making a drug is more like designing an adaptable key for a dynamic lock. The Sanjeevini methodology consists of design of a library of templates, generation of candidate inhibitors, screening candidates via drug-like filters, parameter derivation via quantum mechanical calculations for energy evaluations, Monte Carlo docking and binding affinity estimates e protocols tested on Cyclooxygenase-2 as a target could successfully distinguish NSAIDs (Nonsteroid on other (red). based on post facto analyses of all atom molecular dynamics trajectories. Th al Anti-inflammatory drugs) from non-drugs (Fig 13). Validation targets is in progress. -4 Figure 13 : The active site directed lead design protocols developed from first principles and implemented on a supercomputer could segregate NSAIDS (blue) from Non-drugs
  • 17. Bioinformatics and Biodiversity Biodiversity informatics harnesses the power of computational and information technologies to organize and analyze data on plants and animals at the macro and at genome levels. India ranks among the top twelve nations of the world in terms of biological diversity (Fig. 14 and Fig15). Himalayas - This majestic range of mountains is the home of a diverse range of flora and fauna. Eastern Himalayas is one iversity hotspots in India. Chilika - This wetland area is protected under the Ramsar convention. of the two biod Sunderbans - The largest mangrove forest in Western Ghats - One of the two India. biodiversity hotspots in India. Thar desert ntrast to the Himalayan region. (Source: http://edugreen.teri.res.in/explore/maps/biodivin.htm) - The climate and vegetation in this area is a co Figure 14 : Biodiversity in India
  • 18. Figure 15: Biodiversity bioinformatics is essential to preserve the natural balance of flora and fauna on the planet and to prevent extinction of species (Source: www.hku.hk/ ecology/envsci.htm). Bioinformatics endeavors in India Owing to the well acknowledged IT skills and a spate of upcoming software, biotech and pharma industries and active support from Government organizations, the field of Bioinformatics appears promising. The projections of the growth potential in India in a global scenario however belie these expectations (Fig. 16). Figure 16 : Growth potential for Bioinformatics based business opportunities in India ccording to IDC (International Data Corporation, India.)a 0 0.2 0.8 1 1.2 1.4 1.6 1.8 1 0.4 0.6 2 Year2001 2005 Global India
  • 19. Creating the necessary infrastructure around specialists to meet the high omputational power required for these activities (Table 6, Fig. 17) is one step in the ght direction. able 6 : Geno ate of computational quirements . Gene Prediction c ri T me to drug discovery research: A rough estim re 1 Homology/string comparison. 300 Giga flop ~ 3*109 bp 2. Protein Structure Prediction - Threading 100 Giga flop - Statistical Models - Filters to reduce plausible structures Molecular Dynamics s per simulation 25000 Active site directed drug design 100 structures 30 Peta flop 1-ns simulation for structure refinement Total Compute Time 5000ns Number of atom 3. Scan 1000 drug molecules/protein 18 Peta flop Total Computational requirement to design one lead compound from genome 16 o design ten lead compounds per day (on a dedicated machine) e requirement is 5.8 tera flops capacity. ut of every 100 lead compounds, only one may become a drug, which further increases e computer requirements) 3ns simulation per drug molecule (Active site searches, docking, rate and affinity determinations etc.) Total Compute Time 3000ns 25000 atoms per simulation Summary ~ 50 Peta flop (5.x10 floating point operations) T th (O th
  • 20. for bioinformatics and scientific community from across the country, l Private iotechnology, Govt of India). It is envisioned that the span all the 60+ Bioinformatics Centres of the Department of Biotechnology e 18). Figure 17 : Challenges To enable access by the student the Supercomputing Facility at IIT Delhi is connected to Biogrid-India (a Virtua Network of the Department of B Biogrid will with GBPS bandwidth and at least 10 teraflops of compute capacity soon (Figur Transcription Model Education RNA splicing model Signal transduction pathways Protein:DNA A/RN protein:protein Protein structure prediction Drug Design Protein Evolution Gene ontologies Speciation Challenges for Bioinformatics
  • 21. Figure 18 : Biogrid-India Pro-active policies and conducive environment to researchers, entrepreneurs and Industry will go a long way in steering India towards leadership in the area of Bioinformatics and Computational Biology. Acknowledgements. BJ is thankful to the Department of Biotechnology, the Department of Science & Technology, Council of Scientific & Industrial Research, Indo-French Centre for the Promotion of Advanced Research and IIT Delhi administration for support over the years.