USE OF BIO-INFORMATIC TOOLS TO
STUDY IMPLICATIONS OF G-C
CONTENT OF DNA ON THE PROTEIN.




DEBTANU CHAKRABORTY
Index
  1) Note of Acknowledgement
  2) Bio-informatics
  3) G-C content
  4) Classification tree of Bacteria
  5) List of low G-C bacteria
  6) List of high G-C bacteria
  7) Introduction to Carbonic Anhydrase
  8) Peptide Sequence and their analysis
  9) Gene Sequences and their analysis
  10) Codon usage plot
  11) Conclusion
  12) Future work-scope
Note of Acknowledgement
The project would have been incomplete without the help of a number of people. First, I
would like to thank my mentor and guide, Prof. Chanchal K. Das Gupta, who gave me the
idea and inspiration for the project and helped me at every step whenever I was in
trouble. I would like to thank Prof. Punyasloke Bhadury, who introduced me to the
NCBI website and showed me how to perform tasks such as alignment and BLAST online.

I would be remiss if I did not mention my superiors Papri di, Amit da and
Shimonti di, who also helped me with the project.

In my work I have made extensive use of the NCBI and UniProt websites.
Bioinformatics is the application of statistics and computer science to the field
of molecular biology.
The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of
informatic processes in biotic systems. Its primary use since at least the late 1980s has
been in genomics and genetics, particularly in those areas of genomics involving large-
scale DNA sequencing.

Bioinformatics now entails the creation and advancement of databases, algorithms,
computational and statistical techniques and theory to solve formal and practical
problems arising from the management and analysis of biological data.

Over the past few decades, rapid developments in genomic and other molecular
research technologies, together with developments in information technologies, have
produced a tremendous amount of information related to molecular biology.
Bioinformatics is the name given to the mathematical and computing approaches used
to glean understanding of biological processes from this information.

Common activities in bioinformatics include mapping and analyzing DNA and protein
sequences, aligning different DNA and protein sequences to compare them and
creating and viewing 3-D models of protein structures.

The primary goal of bioinformatics is to increase the understanding of biological
processes. What sets it apart from other approaches, however, is its focus on
developing and applying computationally intensive techniques (e.g., pattern
recognition, data mining, machine learning algorithms, and visualization) to achieve this
goal. Major research efforts in the field include sequence alignment, gene
finding, genome assembly, drug design, drug discovery, protein structure
alignment, protein structure prediction, prediction of gene expression and protein-
protein interactions, genome-wide association studies and the modeling of evolution.
GC-content (or guanine-cytosine content), in molecular biology, is the percentage
of nitrogenous bases in a DNA or RNA molecule that are either guanine or cytosine (out
of four possible bases, the others being adenine and thymine). This may refer to
a specific fragment of DNA or RNA, or to the whole genome. When it refers to a
fragment of the genetic material, it may denote the GC-content of part of a gene
(domain), a single gene, a group of genes (or gene cluster) or even a non-coding region.
G (guanine) and C (cytosine) pair through specific hydrogen bonding, whereas A (adenine)
pairs specifically with T (thymine).
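
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in this project; the function name gc_content is an assumption, and ambiguous symbols such as 'n' are simply left out of the count.

```python
def gc_content(seq: str) -> float:
    """Return the GC-content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T, U) are counted, so placeholder
    symbols such as 'n' do not distort the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # First 30 bases of the Methanococcus voltae sequence listed in this report.
    fragment = "ttaaattaactttttaatctcgccagtgtt"
    print(f"GC-content: {gc_content(fragment):.1f}%")
```

The same function can be applied to a whole gene or genome by passing the full sequence as a single string.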

A GC pair is bound by three hydrogen bonds, while an AT pair is bound by two
hydrogen bonds. DNA with high GC-content is more stable than DNA with low GC-content,
but contrary to popular belief, the hydrogen bonds do not stabilize the DNA
significantly; stabilization is mainly due to stacking interactions. In spite of the
higher stability conferred on the genetic material, it is envisaged that cells with
high-GC DNA undergo autolysis, thereby reducing the longevity of the cell per se.
Because of the robustness endowed to the genetic material in high-GC organisms, it was
commonly believed that GC-content played a vital part in adaptation to temperature, a
hypothesis which has recently been refuted.

In PCR experiments, the GC-content of primers is used to predict their annealing
temperature to the template DNA. A higher GC-content indicates a higher melting
temperature.
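
One common rule of thumb for this relationship is the Wallace rule, Tm = 2(A+T) + 4(G+C) degrees Celsius, which is only a rough guide for short primers. The sketch below is an illustration under that assumption; it is not taken from this project, and the function name and the two example primers are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate a primer melting temperature in degrees Celsius.

    Uses the Wallace rule, Tm = 2*(A+T) + 4*(G+C), a rough
    approximation intended for short oligonucleotides only.
    """
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Two hypothetical primers of equal length but different GC-content.
    for primer in ("ATATATATATATATAT", "GCGCGCGCGCGCGCGC"):
        print(primer, wallace_tm(primer), "deg C")
```

Each G or C contributes twice as much as each A or T, so of two primers of equal length the GC-rich one gets the higher estimate.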
THE EVOLUTIONARY TREE OF BACTERIA, WHERE G-C CONTENT STUDY IS AN
ANALYTICAL TOOL.



The guanine plus cytosine (GC) content in bacteria ranges from ~20% to 75%, whereas
eukaryotic genomes have GC contents that often fall in a more restricted range of
~35-50% (about 40%-45% in vertebrates).
Some Bacteria with low G-C content -
Some Bacteria with high G-C content-
For our convenience, we chose Carbonic Anhydrase because it is present in all bacteria
across the G-C content spectrum-

The carbonic anhydrases (or carbonate dehydratases) form a family of enzymes that
catalyze the rapid conversion of carbon dioxide and water to bicarbonate and protons, a
reaction that occurs rather slowly in the absence of a catalyst.[1] The active site of most
carbonic anhydrases contains a zinc ion; they are therefore classified
as metalloenzymes.

THE CARBONIC ANHYDRASE PROTEIN-
In our analysis, we chose the following bacteria-

   1)   Methanococcus voltae A3 (UI-A8TF20) (G-Cc=27%)
   2)   Staphylococcus carnosus (UI-B9DMU8_STACT) (G-Cc=34%)
   3)   Vibrio cholerae (UI-Q9KMP6_VIBCH) (G-Cc=47%)
   4)   Escherichia coli (UI-P61517) (G-Cc=50%)
   5)   Truepera radiovictrix DSM1703 (UI-ADI14363) (G-Cc=68.2%)
   6)   Salinispora arenicola (UI-A8MOD8) (G-Cc=69.2%)
   7)   Frankia CcI (UI-Q2JF50) (G-Cc=71%)

        *UI stands for the Uniprot Accession number of the Carbonic Anhydrase protein
        of the respective bacteria.

We begin analyzing the protein Carbonic Anhydrase from these bacteria-

The peptide sequences are as follows-

>Methanococcus voltae Carbonic Anhydrase Protein
LN*LFNLASVNVNHKPFNFHIFRNCRVIFD*FDTFQHVFFFVIHFTHPSFKVWRKVWIYS
SFNHFFSYLFNICSCHSTVGMTYDSYLFNI*TVYCNY*RP*YIVCNNITCVFDDFCVASF
*THFFR*EIYESCIHTSYYC*FLFRFGFCSDSFTYTQ



>Staphylococcus carnosus Carbonic Anhydrase Protein
YPXXXMTLLESILAYNKDFVGNKEFENYTTSKKPDKKAVLFTCMDTRLQDLGTKALGFNN
GDLKVVKNAGAIITHPYGSTIKSLLVGIYALGAEEIIIMAHKDCGMGCLDVSTVKDAMKE
RGVTEETFKIIEHSGVDVDSFLQGFKDAEENVRRNIDMVYNHPLFDKSVPIHGLVIDPHT
GELDLIQDGYELAAQNK*



>Vibrio cholerae Carbonic Anhydrase Protein
MKKTTWVLAMVASMSFGVQASEWGYEGEHAPEHWGKVAPLCAEGKNQSPIDVAQSVEADL
QPFTLNYQGQVVGLLNNGHTLQAIVRGNNPLQIDGKTFQLKQFHFHTPSENLLKGKQFPL
EAHFVHADEQGNLAVVAVMYQVGSENPLLKVLTADMPTKGNSTQLTQGIPLADWIPESKH
YYRFNGSLTTPPCSEGVRWIVLKEPAHLSNQQEQQLSAVMGHNNRPVQPHNARLVLQAD*




>1st Escherichia coli Carbonic Anhydrase Protein
LFVVGVFQLEVGDPVTVTLLKGFAVSRCDIQITQQAVVNAVGPAVNGDFLPAFPR*LHNS
GVAQVIHLFHDVQFTQGIQTALLRHFAEQ*AMFEPDIADMQQPVVDKPQFRVFNCGLYAA
ATVV
>2nd Escherichia coli Carbonic Anhydrase Final rip
MKDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGEL
FVHRNVANLVIHINNWLLHIRDIWFKHSSLLGEMPQERRLDTLCELNVMEQVYNLGHSTI
MQSAWKRGQKVTIHGWAYGIHDGLLRDLDVTATNRETLEQRYRHGISNLKLKHANHK*




>3rd Escherichia coli Carbonic Anhydrase Final rip2
VKEIIDGFLKFQREAFPKREALFKQLATQQSPRTLFISCSDSRLVPELVTQREPGDLFVI
RNAGNIVPSYGPEPGGVSASVEYAVAALRVSDIVICGHSNCGAMTAIASCQCMDHMPAVS
HWLRYADSARVVNEARPHSDLPSKAAAMVRENVIAQLANLQTHPSVRLALEEGRIALHGW
VYDIESGSIAAFDGATRQFVPLAANPRVCAIPLRQPTAA*



>Truepera radiovictrix DSM1703 Carbonic Anhydrase Protein
S*PFQKRAVSGRAG*KGCRQQLEPARLEVVHGADDGERALGDARL*GRVRGDEANGRLDV
LPHGPLERTPRPPLSRVAATSGAPQSGLERPHEGRQRGVGAPLLEGCGGGRDRAAAGVPQ
HHDERHAEHRDAVGEARQNRVVDDVAGDPVGKEVAQALVEDDLRRHARVGAAEHRREGVL
LARQGRAPARVLVRVRHAPLEVALVPGQQALERPLGGQGRLGGGH



>1 Salinispora arenicola Carbonic Anhydrase 1st Protein
MNCPGTPDTQPGSHPVSSSGIGGSRSGPVGPEQALAELYDGNRRFAVGVPIRPHQDIDRR
VALADGQQPFAVIVGCSDSRLAAEIIFDRGLGDLFVVRTAGHTVGPEVLGSVEYAVTVLG
APLVVVLGHDSCGAVQAARTADATGAPASGHLRAVVDGVVPSVRRAGARGVTEIDQIVDI
HIEQTVEAVLGRSEAVAAAVAGGRCAVVGMSYRLTAGEVHTVTAVGLAAPTTPPAAPETR
PSAGPA*


>2 Salinispora arenicola Carbonic Anhydrase 2nd Protein
XXTXXESGRVAESESTAFRWAGGRCGRACGVFVDEGALVGDQRITDSVAHHAHRRIREAD
GGQPAVGAWRPSTTQPGSSSASSRGPRTEALWGGPRMH*LAGAA*TPLRHRPG*SFRDTY
GR*GDRPSGHWFCRVWTSDQWHSAHRGPWASALRRRQGGVHLPS*GQAAARQPTGDRYGP
PAGV*TGSLSGERRPDRRHGPSPGRADRKRPALQPGTSPTRGEAGPCRGQCLLFPRYRRG
GSPQWQTLL


>1 Frankia CcI Carbonic Anhydrase 1st Protein
CPSPTTT*PTTPPTRRPSPGRFRCRRPSTSPPSPAWTHGSTSTRSLAWATARLTSSATPA
ASSPTTRSVPSRSASACSAPARSS*STTPTAAC*PSPTTILNARSRTRPGSNQNGPWSRL
PTWPKTYASRLRGSRRARSSRIPTPSAASSSMLPPDCSPKSR
>2 Frankia CcI3 Carbonic Anhydrase 2nd Protein
VDTDDHTAVDPVADVHADDVHADTVRPADTVSPVSGAATATELLLSYAAGHPARRREAGL
PALPGARPRLGVAVVACMDVRIQVEALLGLVEGDAHILRNAGGVITPDVVRSLAVSQHVL
GTTEIILLHHTGCGLERITDDGFRDQLECKTGVRPEWAVYSFPDVEEDVRKSVRVLRSSP
FLQSTTSVRGFVYQVETGALVEVLP*


We have 3 protein sequences for E. coli and 2 sequences each for Salinispora and
Frankia. We now compare them amongst themselves.


For E. coli-




The sequence marked Escherishsia is the 1st sequence.

The sequence Ecoli is the 2nd sequence.

The sequence Final is the 3rd sequence.
For Salinispora-





For Frankia-




After viewing the alignments of the suspected Carbonic Anhydrases within the same
species, we align the proteins from all the sources; all proteins from the same
species are also included.
The alignment of the sequences from all the bacteria is as follows-
Analysis- we can see two things from the above.

   1) Bacteria with high G-C have two genes for Carbonic Anhydrase and
      consequently 2 proteins suspected to be Carbonic Anhydrase.
   2) Bacteria with high G-C content compensate by incorporating, in their proteins,
      synonymous amino acids that require G-C rich codons.



We will elaborate on the 2nd point later using the codon-usage plot, which shows the
corresponding codons in the DNA of the Carbonic Anhydrase gene of these bacteria.
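
One simple way to quantify the 2nd point is the GC-content at the third codon position (often called GC3), since most synonymous codons differ only at that position. The sketch below is an illustration under that assumption and is not the analysis performed in this project; the function name gc3 is hypothetical.

```python
def gc3(cds: str) -> float:
    """Percentage of G or C at the third position of each complete codon.

    The sequence is read in frame from its first base; codons whose third
    base is ambiguous (for example 'n') are skipped.
    """
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    thirds = [b for b in thirds if b in "ACGT"]
    if not thirds:
        raise ValueError("no complete, unambiguous codons found")
    return 100.0 * sum(b in "GC" for b in thirds) / len(thirds)


if __name__ == "__main__":
    # Both fragments encode Ala-Ala, but with different synonymous codons.
    print(gc3("GCGGCC"))  # G-C rich codon choice
    print(gc3("GCAGCT"))  # A-T rich codon choice
```

Both fragments in the usage example encode the same two alanines, so any difference in GC3 comes purely from codon choice.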

Now we move to analyzing the DNA of the genes of Carbonic Anhydrase-

The DNA sequences are as follows-

>Methanococcus voltae Carbonic Anhydrase of 471 bases
ttaaattaactttttaatctcgccagtgttaatgtcaatcataagcccttcaacttccac
atctttaggaattgcagggtgatttttgattaattcgacacctttcaacacgttttcttc
ttcgttatccattttacccatccaagcttcaaagtctggcgtaaagtatggatttactcc
tcttttaatcatttcttttcttatctcttcaatatctgctcctgccattccacagtcggt
atgacctacgatagctatcttttcaacatctaaacagtatattgcaactactaacgacct
taatacatcgtctgtaataatattacctgcgtttttgatgacttttgcgtcgcctctttc
taaacccattttttcaggtaagaaatttacgagtcttgtatccatacaagttattactgc
taatttctttttcggtttggcttctgctccgatagtttcacctatactcaa

>Staphylococcus carnosus Carbonic Anhydrase of 594 bases
taccccancancanaatgacgttattagaaagcattttagcttataataaagattttgtc
ggcaacaaagaatttgaaaactatacaacaagtaaaaaaccagataaaaaagcagtgtta
tttacatgtatggatacacgtttgcaagatttaggtacaaaagcactcggttttaataat
ggtgacttgaaagttgttaaaaatgcaggtgcaattatcacgcacccatatggttcaact
ataaaaagcttactagtaggtatttatgcattaggtgctgaagaaattattattatggca
cataaagattgcggaatgggttgtcttgatgtcagcactgttaaagacgcaatgaaagaa
cgtggcgtaacagaagaaacatttaaaatcatcgaacattctggtgtagatgtagacagc
tttttacaaggtttcaaagatgctgaagaaaatgtccgcagaaatatcgatatggtatat
aatcatcccttatttgataaatccgtacctattcacggcttagtcatcgatcctcatacg
ggggaattagatttaattcaagacggctatgaattagctgctcaaaataaataa
>Vibrio cholerae Carbonic Anhydrase of 720 bases
atgaaaaagacaacgtgggtattagcgatggtagccagtatgagcttcggcgtacaggct
tccgagtgggggtatgaaggagagcatgctccggagcattggggcaaagttgcccctctt
tgcgcagagggtaaaaatcaaagcccgattgatgtcgcgcaaagcgtagaagcggatcta
cagcctttcacgctcaattatcaagggcaagtggttgggctgctcaataacgggcacact
ttacaagcgatagtccgtggtaataacccactgcagatcgatggcaaaacgtttcagctt
aagcagtttcattttcataccccttctgaaaatttgctaaaaggaaaacaattcccactg
gaagcgcattttgttcatgccgacgagcaaggcaatctggcggttgttgcggtgatgtac
caagtggggtcggaaaatccgctgcttaaggttctcacggcggatatgccgaccaaaggg
aattcgactcagctcacgcaagggatccctttggctgattggatcccagaatcgaagcac
tactatcgtttcaatggttcattgactacgccgccttgcagtgaaggtgtacgttggatt
gtgttaaaagagccagcacatttgtcgaatcaacaagagcagcagcttagtgccgtgatg
ggacacaataatcgacccgtacaaccgcataatgctcgtcttgtcttgcaagccgactaa


>Escherichia coli Carbonic Anhydrase of 372 bases
ttatttgtggttggcgtgtttcagcttgaggttggagatcccgtgacggtaacgttgctc
aagggtttcgcggttagtcgctgtgacatccagatcacgcagcaagccgtcgtgaatgcc
gtaggcccagccgtgaatggtgactttctgcccgcgtttccacgctgattgcataatagt
ggagtggcccaggttatacacctgttccatgacgttcagttcacacaaggtatccagacg
gcgctcttgcggcatttcgccgagcaatgagctatgtttgaaccagatatcgcggatatg
cagcagccagttgttgataagccccagttccgggttttcaactgcggcttgtacgccgcc
gcaaccgtagtg


>123 Escherichia coli carbonic Anhydrase Final
aagccccagttccgggttttcaactgcggcttgtacgccgccgcaaccgtagtggccaca
gataataatgtgttcaacttcgagtacatccactgcatactgaaccacggaaaggcagtt
caggtcagtttatttgtggttggcgtgtttcagcttgaggttggaaatcccgtgacggta
acgttgctcaagggtttcgcggttggtggcggtaacatccagatcacgcagcaagccgtc
gtgaatgccgtaggcccagccgtgaatggtaactttctgcccgcgtttccacgctgattg
cataatggtggagtggcccaggttatacacctgttccatgacgttcagttcacacaaggt
atccagacggcgctcttgcggcatttcgccgagcaatgagctatgtttgaaccagatatc
gcggatatgcagcagccagttgttgatgtgaatgaccaggttagcaacattacggtgaac
aaagagttcgcccggctcaagaccggttaaacgttctgcaggaacgcgactgtcggaaca
tccaatccatagaaagcgcggtttttgcgcttgtgccagtttctcaaaaaacccgggatc
ctcttccaccagcatttttgaccatagtgcattgttgctgatgagtgtatctatgtctttcat




>456 Escherichia coli Carbonic Anhydrase Final 2
gtgaaagagattattgatggattccttaaattccagcgcgaggcatttccgaagcgggaagcct
tgtttaaacagctggcgacacagcaaagcccgcgcacactttttatctcctgctccgacagccg
tctggtccctgagctggtgacgcaacgtgagcctggcgatctgttcgttattcgcaacgcgggc
aatatcgtcccttcctacgggccggaacccggtggcgtttctgcttcggtggagtatgccgtcg
ctgcgcttcgggtatctgacattgtgatttgtggtcattccaactgtggcgcgatgaccgccat
tgccagctgtcagtgcatggaccatatgcctgccgtctcccactggctgcgttatgccgattca
gcccgcgtcgttaatgaggcgcgcccgcattccgatttaccgtcaaaagctgcggcgatggtac
gtgaaaacgtcattgctcagttggctaatttgcaaactcatccatcggtgcgcctggcgctcga
agaggggcggatcgccctgcacggctgggtctacgacattgaaagcggcagcatcgcagctttt
gacggcgcaacccgccagtttgtgccactggccgctaatcctcgcgtttgtgccataccgctac
gccaaccgaccgcagcgtaa


>Truepera radiovictrix DSM1703 Carbonic Anhydrase consisting of 675 bases
tcataaccgttccaaaagcgggccgtgagcgggcgcgctgggtaaaaggggtgtcggcag
cagctcgagcccgcccgtctcgaggtcgtacacggcgccgacgacggcgagcgtgcgctg
ggcgatgcgcgcctttaggggcgggtgcgcggcgatgaggcgaacggacgcctcgacgtt
ctccctcacggcccccttgagcgtacaccccgtccccccctcagccgtgtcgcagcgacg
agcggcgcgccgcaaagcgggctcgagcgcccgcacgaggggcgtcagcgaggggtcggc
gcccccctcctcgagggctgcggcggcggccgcgaccgcgccgcagccggtgtgccccag
caccacgatgagcggcacgccgagcaccgagacgccgtaggtgaggctcgccaaaatcgc
gtcgtcgacgatgttgccggcgacccggttggtaaagaggtcgcccaagccctggtcgaa
gatgatctgcggcggcacgcgcgagtcggcgcagccgagcaccgccgcgaaggggtgctg
ctcgcgcgtcaaggacgcgcgccagcgcgcgtcttggtgcgggtgcgccatgcgcccctc
gaggtagcgctggtgcccggccaacaggcgctcgagcgccccttggggggtcaggggcgt
ctggggggcggtcat



>Salinispora arenicola Carbonic Anhydrase 1 of 741 bases.
atgaactgcccaggaacgcccgacacacagccgggctcgcacccggtgtcctccagtgga
atcggcggttcccggagcgggccggtcgggcccgagcaggcgcttgccgagttgtacgac
ggcaaccggcgattcgccgttggtgttccgatccgcccacaccaggacatcgaccgtcgg
gtcgccctggcggatggtcagcagcccttcgcggtgatcgtcggctgttccgactcccga
cttgctgctgagatcatctttgaccgtggtctcggtgacctgttcgtggtacgcaccgct
gggcacacggtcgggccagaggtgctgggcagcgtcgagtacgcggtcaccgtgctgggt
gcgccgctggtggtggtgctcggccacgactcctgtggagcggtacaggcggcccggacc
gccgacgccaccggcgcaccggcgtccgggcacctccgcgctgtggtggacggggtggtg
ccgagcgtgcgtcgggccggggcccgtggggttaccgagatcgaccagatcgtcgacatt
catatcgagcagaccgttgaggcggtgcttggccgttctgaggcggtcgcagccgcggtg
gccggcggacggtgtgcggtggtgggaatgtcgtaccggctcaccgcaggtgaggtgcac
acggttaccgcggttggcctcgcggcgccgaccacaccaccggccgcgcctgagacccgc
cccagcgccggaccggcgtaa


>abc Salinispora arenicola Carbonic Anhydrase 2 of 748 bases
naancacancanatgaatcgggccgtgtggccgagtcggagagcactgctttccggtgg
gctggtgggcgctgtggccgtgcttgcggggtgttcgtcgacgaaggcgcgctcgtcggc
gaccagcgcatcaccgacagcgtcgcccaccacgcccaccgccgcattcgagaggctgat
ggagggcaaccagcggtgggtgcgtggagaccttcaacaacccaaccgggatccagctcg
gcgtcaagtcgtggcccacgaacagaagccctttggggcggtcctcgcatgcattgactc
gcgggtgccgcctgaactcctcttcgacaccggcctgggtgatcttttcgtgacacgtac
gggaggtgaggcgatcggcccagtggtcactggttctgtcgagtttggacctctgaccag
tggcactccgctcatcgtggtccttgggcatcagcgttgcggcgccgtcaaggcggcgta
cacctcccttcgtgagggcaagccgctgcccggcaacctaccggcgatcgttacggccct
ccagccggcgtatgaacaggtagcctcagcggggagcgccgacccgatcgacgccatggc
ccgagcccaggccgagctgatcgcaaacgacctgcgctccaacccggaactagccccact
cgtggcgaagcgggaccttgccgtggtcagtgcctactattccctcgataccggcgcggt
ggaagtcctcagtggcagaccctcctga



>Frankia CcI Carbonic Anhydrase 1 of 488 bases
tgtccgtcaccgacgactacctgaccaacaacgccgcctacgcgaagaccttcgccgggc
cgcttccgctgccgccgtccaagcacatcgccgccgtcgcctgcatggacgcacggctca
acgtctacgcgatccttggcctgggcgacggcgaggctcacgtcatccgcaacgccggcg
gcgtcgtcaccgacgacgagatccgttccctcgcgatcagccagcgcctgctcggcaccc
gcgagatcatcctgatccaccacaccgactgcggcatgctgaccttcaccgacgacgatt
ttaaacgctcgatccaggacgagaccgggatcaaaccagaatgggccgtggagtcgttta
ccgacctggccgaagacatacgccagtcgattgcgcggatcaaggcgagcccgttcatcc
cgcataccgacgccatccgcggcttcatcttcgatgttgccaccggactgctcaccgaag
tcgcgtga

>xyz Frankia CcI3 Carbonic Anhydrase 2 of 618 bases
gtggacaccgatgaccacaccgctgtcgaccccgttgccgatgtccatgcagacgatgtc
catgcggacaccgtgcgccccgcggatacggtgagcccggtgagcggcgctgccacggcg
accgaactcctgctgagctacgctgcaggtcaccccgcccggcggcgggaggccgggcta
cctgccctgcccggcgcgcggccgcgcctgggcgtcgcggtggttgcgtgtatggacgtg
cggatccaggtggaggccttgctcggtcttgtcgaaggtgacgcccacatcctgcgcaac
gccggtggtgtcatcaccccggatgtggtccgctcgctcgccgtgagccagcacgtgctg
ggaacgacggagatcattcttttgcatcacaccgggtgtggtctcgaaaggatcaccgac
gacgggttccgggaccagttggagtgcaagacgggcgttcgtcccgaatgggccgtgtat
tcctttcccgatgtcgaggaggacgtgcgcaagtccgtcagggtgctgcgttcgtcgccg
ttcctgcagtccaccacctcggtacgcgggttcgtctaccaggtggagaccggggcactg
gtcgaggttctgccgtag

We will now compare the translation product of the ORF of each gene with the
original protein product. Methanococcus produces the protein in reading frame 1 of the
reverse strand of the DNA segment. It does not start with ATG: the first amino acid is L
in place of M. Staphylococcus and Vibrio produce the protein in frame 1 of the forward
direction. The same is observed in Frankia and Salinispora.
The gene product is typically labeled ‘orf’.
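
The reading-frame comparison described above can be sketched as a six-frame translation. This is a minimal illustration assuming the standard genetic code; it is not the web tool used in this project, and the function names are hypothetical. Codons containing ambiguous bases such as 'n' are reported as X, and stop codons as *.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (upper case)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA starting at offset `frame` (0, 1 or 2)."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna: str) -> dict:
    """Translate all three forward (+) and three reverse (-) frames."""
    rc = reverse_complement(dna)
    frames = {}
    for f in range(3):
        frames[f"+{f + 1}"] = translate(dna, f)
        frames[f"-{f + 1}"] = translate(rc, f)
    return frames


if __name__ == "__main__":
    # First 60 bases of the Methanococcus voltae sequence listed above.
    fragment = "ttaaattaactttttaatctcgccagtgttaatgtcaatcataagcccttcaacttccac"
    print(translate(fragment))  # LN*LFNLASVNVNHKPFNFH
    for name, peptide in six_frames(fragment).items():
        print(name, peptide)
```

Comparing each of the six outputs with the database protein shows which strand and frame carry the open reading frame.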


   1) Methanococcus voltae A3-



   2) Staphylococcus carnosus-



   3) Vibrio cholerae-

   4) The comparison of the E. coli gene product and protein is as follows-



For the rest, we will be comparing only 1 suspected protein and gene product
for consistency.

   5) For Truepera-

   6) For Salinispora-



   7) For Frankia-
Codon Analysis is as follows-

Results for 411 residue sequence "Methanococcus voltae Carbonic Anhydrase of 471 bases"
AmAcid   Codon     Number        /1000     Fraction
Ala      GCG         0.00         0.00         0.00
Ala      GCA         0.00         0.00         0.00
Ala      GCT         0.00         0.00         0.00
Ala      GCC         1.00         7.30         1.00

Cys        TGT             2.00   14.60       0.20
Cys        TGC             8.00   58.39       0.80

Asp        GAT             4.00   29.20       0.67
Asp        GAC             2.00   14.60       0.33

Glu        GAG             1.00    7.30       0.50
Glu        GAA             1.00    7.30       0.50

Phe        TTT            12.00   87.59       0.50
Phe        TTC            12.00   87.59       0.50

Gly        GGG             0.00    0.00       0.00
Gly        GGA             0.00    0.00       0.00
Gly        GGT             1.00    7.30       0.50
Gly        GGC             1.00    7.30       0.50

His        CAT             6.00   43.80       0.86
His        CAC             1.00    7.30       0.14

Ile        ATA             0.00    0.00       0.00
Ile        ATT             4.00   29.20       0.40
Ile        ATC             6.00   43.80       0.60

Lys        AAG             0.00    0.00       0.00
Lys        AAA             2.00   14.60       1.00

Leu        TTG             0.00    0.00       0.00
Leu        TTA             0.00    0.00       0.00
Leu        CTG             0.00    0.00       0.00
Leu        CTA             0.00    0.00       0.00
Leu        CTT             2.00   14.60       0.67
Leu        CTC             1.00    7.30       0.33

Met        ATG             1.00    7.30       1.00

Asn   AAT   5.00   36.50   0.71
Asn   AAC   2.00   14.60   0.29

Pro   CCG   0.00    0.00   0.00
Pro   CCA   1.00    7.30   0.50
Pro   CCT   1.00    7.30   0.50
Pro   CCC   0.00    0.00   0.00

Gln   CAG   0.00    0.00   0.00
Gln   CAA   2.00   14.60   1.00

Arg   AGG   3.00   21.90   0.50
Arg   AGA   0.00    0.00   0.00
Arg   CGG   1.00    7.30   0.17
Arg   CGA   1.00    7.30   0.17
Arg   CGT   1.00    7.30   0.17
Arg   CGC   0.00    0.00   0.00

Ser   AGT   2.00   14.60   0.17
Ser   AGC   2.00   14.60   0.17
Ser   TCG   0.00    0.00   0.00
Ser   TCA   0.00    0.00   0.00
Ser   TCT   4.00   29.20   0.33
Ser   TCC   4.00   29.20   0.33

Thr   ACG   0.00    0.00   0.00
Thr   ACA   3.00   21.90   0.30
Thr   ACT   1.00    7.30   0.10
Thr   ACC   6.00   43.80   0.60

Val   GTG   1.00    7.30   0.10
Val   GTA   2.00   14.60   0.20
Val   GTT   3.00   21.90   0.30
Val   GTC   4.00   29.20   0.40

Trp   TGG   2.00   14.60   1.00

Tyr   TAT   5.00   36.50   0.45
Tyr   TAC   6.00   43.80   0.55

End   TGA   0.00    0.00   0.00
End   TAG   0.00    0.00   0.00
End   TAA   7.00   51.09   1.00
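
The three numeric columns of a table like the one above can be reproduced roughly as follows: Number is the codon count, /1000 is the count per thousand codons, and Fraction is the codon's share among the codons for the same amino acid. This is a sketch under stated assumptions, not the program that produced these tables: it assumes the standard genetic code, reads the sequence in frame from its first base, and skips codons containing ambiguous bases. The function name codon_usage is hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = End).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(cds: str) -> dict:
    """Map each codon to (amino acid, number, per-1000, fraction)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    total = sum(counts.values())

    # Total count of all synonymous codons for each amino acid.
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n

    usage = {}
    for codon, aa in CODON_TABLE.items():
        n = counts[codon]
        per_thousand = 1000.0 * n / total if total else 0.0
        fraction = n / per_aa[aa] if per_aa[aa] else 0.0
        usage[codon] = (aa, n, per_thousand, fraction)
    return usage


if __name__ == "__main__":
    # A short hypothetical coding fragment: Ala, Ala, Ala, Lys.
    for codon, (aa, n, per_k, frac) in sorted(codon_usage("GCCGCCGCGAAA").items()):
        if n:
            print(f"{aa}  {codon}  {n:5.2f}  {per_k:7.2f}  {frac:5.2f}")
```

Comparing the Fraction column for G-C ending and A-T ending codons of the same amino acid is what allows the tables in this section to be read against the G-C content of each organism.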
Results for 594 residue sequence "Staphylococcus carnosus Carbonic Anhydrase of 594 bases"
AmAcid   Codon     Number          /1000     Fraction
Ala      GCG         0.00           0.00         0.00
Ala      GCA         7.00          35.90         0.58
Ala      GCT         5.00          25.64         0.42
Ala      GCC         0.00           0.00         0.00

Cys     TGT         2.00           10.26        0.67
Cys     TGC         1.00            5.13        0.33

Asp     GAT        12.00           61.54        0.75
Asp     GAC         4.00           20.51        0.25

Glu     GAG         0.00            0.00        0.00
Glu     GAA        13.00           66.67        1.00

Phe     TTT         7.00           35.90         0.88
Phe     TTC         1.00            5.13         0.13

Gly     GGG         1.00            5.13        0.06
Gly     GGA         1.00            5.13        0.06
Gly     GGT        10.00           51.28        0.63
Gly     GGC         4.00           20.51        0.25

His     CAT         4.00           20.51        0.67
His     CAC         2.00           10.26        0.33

Ile     ATA         1.00            5.13        0.07
Ile     ATT         8.00           41.03        0.57
Ile     ATC         5.00           25.64        0.36

Lys     AAG         0.00            0.00        0.00
Lys     AAA        17.00           87.18        1.00

Leu     TTG         2.00           10.26        0.11
Leu     TTA        13.00           66.67        0.72
Leu     CTG         0.00            0.00        0.00
Leu     CTA         1.00            5.13        0.06
Leu     CTT         1.00            5.13        0.06
Leu     CTC         1.00            5.13        0.06

Met     ATG         6.00           30.77        1.00

Asn     AAT         8.00           41.03        0.80
Asn     AAC         2.00           10.26        0.20

Pro   CCG   0.00    0.00   0.00
Pro   CCA   2.00   10.26   0.33
Pro   CCT   2.00   10.26   0.33
Pro   CCC   2.00   10.26   0.33

Gln   CAG   0.00    0.00   0.00
Gln   CAA   4.00   20.51   1.00

Arg   AGG   0.00    0.00   0.00
Arg   AGA   1.00    5.13   0.25
Arg   CGG   0.00    0.00   0.00
Arg   CGA   0.00    0.00   0.00
Arg   CGT   2.00   10.26   0.50
Arg   CGC   1.00    5.13   0.25

Ser   AGT   1.00    5.13   0.13
Ser   AGC   4.00   20.51   0.50
Ser   TCG   0.00    0.00   0.00
Ser   TCA   1.00    5.13   0.13
Ser   TCT   1.00    5.13   0.13
Ser   TCC   1.00    5.13   0.13

Thr   ACG   3.00   15.38   0.25
Thr   ACA   7.00   35.90   0.58
Thr   ACT   2.00   10.26   0.17
Thr   ACC   0.00    0.00   0.00

Val   GTG   1.00    5.13   0.07
Val   GTA   6.00   30.77   0.43
Val   GTT   3.00   15.38   0.21
Val   GTC   4.00   20.51   0.29

Trp   TGG   0.00   0.00    0.00

Tyr   TAT   6.00   30.77   0.86
Tyr   TAC   1.00    5.13   0.14

End   TGA   0.00   0.00    0.00
End   TAG   0.00   0.00    0.00
End   TAA   1.00   5.13    1.00
Results for 660 residue sequence "Vibrio cholerae Carbonic Anhydrase of 720 bases"

AmAcid   Codon    Number        /1000     Fraction
Ala      GCG        7.00        31.82         0.44
Ala      GCA        2.00         9.09         0.13
Ala      GCT        3.00        13.64         0.19
Ala      GCC        4.00        18.18         0.25

Cys      TGT        0.00         0.00         0.00
Cys      TGC        2.00         9.09         1.00

Asp      GAT        5.00        22.73         0.71
Asp      GAC        2.00         9.09         0.29

Glu      GAG        7.00        31.82         0.50
Glu      GAA        7.00        31.82         0.50

Phe      TTT        4.00        18.18         0.57
Phe      TTC        3.00        13.64         0.43

Gly      GGG        7.00        31.82         0.41
Gly      GGA        3.00        13.64         0.18
Gly      GGT        4.00        18.18         0.24
Gly      GGC        3.00        13.64         0.18

His      CAT        8.00        36.36         0.73
His      CAC        3.00        13.64         0.27

Ile      ATA        1.00         4.55         0.17
Ile      ATT        2.00         9.09         0.33
Ile      ATC        3.00        13.64         0.50

Lys      AAG        3.00        13.64         0.30
Lys      AAA        7.00        31.82         0.70

Leu      TTG        5.00        22.73         0.22
Leu      TTA        2.00         9.09         0.09
Leu      CTG        5.00        22.73         0.22
Leu      CTA        2.00         9.09         0.09
Leu      CTT        5.00        22.73         0.22
Leu      CTC        4.00        18.18         0.17

Met      ATG        3.00        13.64         1.00

Asn      AAT       13.00        59.09         0.87
Asn      AAC        2.00         9.09         0.13

Pro   CCG    6.00   27.27   0.38
Pro   CCA    4.00   18.18   0.25
Pro   CCT    5.00   22.73   0.31
Pro   CCC    1.00    4.55   0.06

Gln   CAG    7.00   31.82   0.35
Gln   CAA   13.00   59.09   0.65

Arg   AGG    0.00    0.00   0.00
Arg   AGA    0.00    0.00   0.00
Arg   CGG    0.00    0.00   0.00
Arg   CGA    1.00    4.55   0.20
Arg   CGT    4.00   18.18   0.80
Arg   CGC    0.00    0.00   0.00

Ser   AGT   2.00     9.09   0.18
Ser   AGC   2.00     9.09   0.18
Ser   TCG   4.00    18.18   0.36
Ser   TCA   1.00     4.55   0.09
Ser   TCT   1.00     4.55   0.09
Ser   TCC   1.00     4.55   0.09

Thr   ACG    5.00   22.73   0.50
Thr   ACA    0.00    0.00   0.00
Thr   ACT    3.00   13.64   0.30
Thr   ACC    2.00    9.09   0.20

Val   GTG    5.00   22.73   0.29
Val   GTA    3.00   13.64   0.18
Val   GTT    6.00   27.27   0.35
Val   GTC    3.00   13.64   0.18

Trp   TGG    4.00   18.18   1.00

Tyr   TAT    3.00   13.64   0.60
Tyr   TAC    2.00    9.09   0.40

End   TGA    0.00    0.00   0.00
End   TAG    0.00    0.00   0.00
End   TAA    1.00    4.55   1.00
Results for 372 residue sequence "Escherichia coli Carbonic Anhydrase of 372 bases"

AmAcid   Codon    Number        /1000     Fraction
Ala      GCG        4.00        32.26         0.31
Ala      GCA        1.00         8.06         0.08
Ala      GCT        1.00         8.06         0.08
Ala      GCC        7.00        56.45         0.54

Cys      TGT        1.00         8.06         0.50
Cys      TGC        1.00         8.06         0.50

Asp      GAT        4.00        32.26         0.57
Asp      GAC        3.00        24.19         0.43

Glu      GAG        2.00        16.13         0.67
Glu      GAA        1.00         8.06         0.33

Phe      TTT        5.00        40.32         0.45
Phe      TTC        6.00        48.39         0.55

Gly      GGG        0.00         0.00         0.00
Gly      GGA        2.00        16.13         0.25
Gly      GGT        3.00        24.19         0.38
Gly      GGC        3.00        24.19         0.38

His      CAT        3.00        24.19         0.75
His      CAC        1.00         8.06         0.25

Ile      ATA        1.00         8.06         0.20
Ile      ATT        0.00         0.00         0.00
Ile      ATC        4.00        32.26         0.80

Lys      AAG        2.00        16.13         1.00
Lys      AAA        0.00         0.00         0.00

Leu      TTG        4.00        32.26         0.40
Leu      TTA        1.00         8.06         0.10
Leu      CTG        2.00        16.13         0.20
Leu      CTA        0.00         0.00         0.00
Leu      CTT        1.00         8.06         0.10
Leu      CTC        2.00        16.13         0.20

Met      ATG        2.00        16.13         1.00

Asn   AAT   3.00   24.19   0.75
Asn   AAC   1.00    8.06   0.25

Pro   CCG   0.00    0.00   0.00
Pro   CCA   4.00   32.26   0.57
Pro   CCT   0.00    0.00   0.00
Pro   CCC   3.00   24.19   0.43

Gln   CAG   9.00   72.58   0.75
Gln   CAA   3.00   24.19   0.25

Arg   AGG   0.00    0.00   0.00
Arg   AGA   0.00    0.00   0.00
Arg   CGG   2.00   16.13   0.50
Arg   CGA   0.00    0.00   0.00
Arg   CGT   0.00    0.00   0.00
Arg   CGC   2.00   16.13   0.50

Ser   AGT   2.00   16.13   1.00
Ser   AGC   0.00    0.00   0.00
Ser   TCG   0.00    0.00   0.00
Ser   TCA   0.00    0.00   0.00
Ser   TCT   0.00    0.00   0.00
Ser   TCC   0.00    0.00   0.00

Thr   ACG   4.00   32.26   0.67
Thr   ACA   1.00    8.06   0.17
Thr   ACT   0.00    0.00   0.00
Thr   ACC   1.00    8.06   0.17

Val   GTG   7.00   56.45   0.37
Val   GTA   3.00   24.19   0.16
Val   GTT   8.00   64.52   0.42
Val   GTC   1.00    8.06   0.05

Trp   TGG   0.00   0.00    0.00

Tyr   TAT   0.00    0.00   0.00
Tyr   TAC   1.00    8.06   1.00

End   TGA   2.00   16.13   1.00
End   TAG   0.00    0.00   0.00
End   TAA   0.00    0.00   0.00
Results for 660 residue sequence "456 Ecoli Carbonic anhydrase Final 2"

AmAcid   Codon    Number        /1000     Fraction
Ala      GCG        9.00        40.91         0.30
Ala      GCA        4.00        18.18         0.13
Ala      GCT        7.00        31.82         0.23
Ala      GCC       10.00        45.45         0.33

Cys      TGT        4.00        18.18         0.67
Cys      TGC        2.00         9.09         0.33

Asp      GAT        4.00        18.18         0.44
Asp      GAC        5.00        22.73         0.56

Glu      GAG        7.00        31.82         0.58
Glu      GAA        5.00        22.73         0.42

Phe      TTT        5.00        22.73         0.63
Phe      TTC        3.00        13.64         0.38

Gly      GGG        2.00         9.09         0.17
Gly      GGA        1.00         4.55         0.08
Gly      GGT        2.00         9.09         0.17
Gly      GGC        7.00        31.82         0.58

His      CAT        4.00        18.18         0.67
His      CAC        2.00         9.09         0.33

Ile      ATA        1.00         4.55         0.08
Ile      ATT        8.00        36.36         0.62
Ile      ATC        4.00        18.18         0.31

Lys      AAG        1.00         4.55         0.20
Lys      AAA        4.00        18.18         0.80

Leu      TTG        3.00        13.64         0.18
Leu      TTA        1.00         4.55         0.06
Leu      CTG        8.00        36.36         0.47
Leu      CTA        1.00         4.55         0.06
Leu      CTT        3.00        13.64         0.18
Leu      CTC        1.00         4.55         0.06

Met      ATG        4.00        18.18         1.00

Asn   AAT   4.00   18.18   0.57
Asn   AAC   3.00   13.64   0.43

Pro   CCG   7.00   31.82   0.47
Pro   CCA   2.00    9.09   0.13
Pro   CCT   5.00   22.73   0.33
Pro   CCC   1.00    4.55   0.07

Gln   CAG   6.00   27.27   0.60
Gln   CAA   4.00   18.18   0.40

Arg   AGG   0.00    0.00   0.00
Arg   AGA   0.00    0.00   0.00
Arg   CGG   3.00   13.64   0.19
Arg   CGA   0.00    0.00   0.00
Arg   CGT   4.00   18.18   0.25
Arg   CGC   9.00   40.91   0.56

Ser   AGT   0.00    0.00   0.00
Ser   AGC   5.00   22.73   0.29
Ser   TCG   2.00    9.09   0.12
Ser   TCA   2.00    9.09   0.12
Ser   TCT   2.00    9.09   0.12
Ser   TCC   6.00   27.27   0.35

Thr   ACG   1.00    4.55   0.14
Thr   ACA   2.00    9.09   0.29
Thr   ACT   1.00    4.55   0.14
Thr   ACC   3.00   13.64   0.43

Val   GTG   6.00   27.27   0.32
Val   GTA   2.00    9.09   0.11
Val   GTT   4.00   18.18   0.21
Val   GTC   7.00   31.82   0.37

Trp   TGG   2.00    9.09   1.00

Tyr   TAT   2.00    9.09   0.50
Tyr   TAC   2.00    9.09   0.50

End   TGA   0.00    0.00   0.00
End   TAG   0.00    0.00   0.00
End   TAA   1.00    4.55   1.00

Results for 663 residue sequence "123 Ecoli carbonic Anhydrase Final"

AmAcid   Codon    Number        /1000     Fraction   ..
Ala      GCG        6.00        27.15         0.32
Ala      GCA        2.00         9.05         0.11
Ala      GCT        2.00         9.05         0.11
Ala      GCC        9.00        40.72         0.47

Cys      TGT        1.00         4.52         0.17
Cys      TGC        5.00        22.62         0.83

Asp      GAT        5.00        22.62         0.71
Asp      GAC        2.00         9.05         0.29

Glu      GAG        5.00        22.62         0.83
Glu      GAA        1.00         4.52         0.17

Phe      TTT        9.00        40.72         0.45
Phe      TTC       11.00        49.77         0.55

Gly      GGG        1.00         4.52         0.06
Gly      GGA        4.00        18.10         0.25
Gly      GGT        7.00        31.67         0.44
Gly      GGC        4.00        18.10         0.25

His      CAT        5.00        22.62         0.56
His      CAC        4.00        18.10         0.44

Ile      ATA        2.00         9.05         0.18
Ile      ATT        2.00         9.05         0.18
Ile      ATC        7.00        31.67         0.64

Lys      AAG        4.00        18.10         0.50
Lys      AAA        4.00        18.10         0.50

Leu      TTG        6.00        27.15         0.38
Leu      TTA        1.00         4.52         0.06
Leu      CTG        3.00        13.57         0.19
Leu      CTA        0.00         0.00         0.00
Leu      CTT        1.00         4.52         0.06
Leu      CTC        5.00        22.62         0.31

Met      ATG        2.00         9.05         1.00

Asn   AAT    8.00   36.20   0.50
Asn   AAC    8.00   36.20   0.50

Pro   CCG    0.00    0.00   0.00
Pro   CCA    6.00   27.15   0.60
Pro   CCT    0.00    0.00   0.00
Pro   CCC    4.00   18.10   0.40

Gln   CAG   13.00   58.82   0.81
Gln   CAA    3.00   13.57   0.19

Arg   AGG   1.00     4.52   0.14
Arg   AGA   0.00     0.00   0.00
Arg   CGG   4.00    18.10   0.57
Arg   CGA   0.00     0.00   0.00
Arg   CGT   0.00     0.00   0.00
Arg   CGC   2.00     9.05   0.29

Ser   AGT    1.00    4.52   0.33
Ser   AGC    1.00    4.52   0.33
Ser   TCG    0.00    0.00   0.00
Ser   TCA    0.00    0.00   0.00
Ser   TCT    0.00    0.00   0.00
Ser   TCC    1.00    4.52   0.33

Thr   ACG    6.00   27.15   0.50
Thr   ACA    3.00   13.57   0.25
Thr   ACT    1.00    4.52   0.08
Thr   ACC    2.00    9.05   0.17

Val   GTG   10.00   45.25   0.36
Val   GTA    3.00   13.57   0.11
Val   GTT   11.00   49.77   0.39
Val   GTC    4.00   18.10   0.14

Trp   TGG    0.00    0.00   0.00

Tyr   TAT    1.00    4.52   0.33
Tyr   TAC    2.00    9.05   0.67

End   TGA    3.00   13.57   0.50
End   TAG    2.00    9.05   0.33
End   TAA    1.00    4.52   0.17

Results for 675 residue sequence "Truepera radiovictrix DSM1703 Carbo Anhyd consisting of 675 bases"

AmAcid   Codon     Number        /1000     Fraction   ..
Ala      GCG        12.00        53.33         0.41
Ala      GCA         3.00        13.33         0.10
Ala      GCT         2.00         8.89         0.07
Ala      GCC        12.00        53.33         0.41

Cys     TGT         1.00         4.44         0.50
Cys     TGC         1.00         4.44         0.50

Asp     GAT         6.00        26.67         0.46
Asp     GAC         7.00        31.11         0.54

Glu     GAG        15.00         66.67        0.88
Glu     GAA         2.00          8.89        0.12

Phe     TTT         0.00         0.00         0.00
Phe     TTC         1.00         4.44         1.00

Gly     GGG        11.00        48.89         0.33
Gly     GGA         2.00         8.89         0.06
Gly     GGT         5.00        22.22         0.15
Gly     GGC        15.00        66.67         0.45

His     CAT         2.00          8.89        0.18
His     CAC         9.00         40.00        0.82

Ile     ATA         0.00         0.00         0.00
Ile     ATT         0.00         0.00         0.00
Ile     ATC         0.00         0.00         0.00

Lys     AAG         2.00         8.89         0.67
Lys     AAA         1.00         4.44         0.33

Leu     TTG         2.00         8.89         0.10
Leu     TTA         0.00         0.00         0.00
Leu     CTG         6.00        26.67         0.29
Leu     CTA         0.00         0.00         0.00
Leu     CTT         2.00         8.89         0.10
Leu     CTC        11.00        48.89         0.52

Met     ATG         0.00         0.00         0.00

Asn     AAT         1.00         4.44         0.50
Asn   AAC    1.00    4.44   0.50

Pro   CCG    4.00   17.78   0.25
Pro   CCA    1.00    4.44   0.06
Pro   CCT    1.00    4.44   0.06
Pro   CCC   10.00   44.44   0.63

Gln   CAG   6.00    26.67   0.50
Gln   CAA   6.00    26.67   0.50

Arg   AGG    0.00    0.00   0.00
Arg   AGA    0.00    0.00   0.00
Arg   CGG    7.00   31.11   0.21
Arg   CGA    3.00   13.33   0.09
Arg   CGT    8.00   35.56   0.24
Arg   CGC   15.00   66.67   0.45

Ser   AGT    0.00    0.00   0.00
Ser   AGC    4.00   17.78   0.80
Ser   TCG    0.00    0.00   0.00
Ser   TCA    1.00    4.44   0.20
Ser   TCT    0.00    0.00   0.00
Ser   TCC    0.00    0.00   0.00

Thr   ACG    1.00    4.44   0.50
Thr   ACA    1.00    4.44   0.50
Thr   ACT    0.00    0.00   0.00
Thr   ACC    0.00    0.00   0.00

Val   GTG    7.00   31.11   0.32
Val   GTA    3.00   13.33   0.14
Val   GTT    3.00   13.33   0.14
Val   GTC    9.00   40.00   0.41

Trp   TGG    0.00    0.00   0.00

Tyr   TAT    0.00    0.00   0.00
Tyr   TAC    0.00    0.00   0.00

End   TGA    0.00    0.00   0.00
End   TAG    1.00    4.44   0.33
End   TAA    2.00    8.89   0.67

Results for 765 residue sequence " Salinispora arenicola "

AmAcid   Codon    Number        /1000     Fraction   ..
Ala      GCG       17.00        68.55         0.47
Ala      GCA        3.00        12.10         0.08
Ala      GCT        4.00        16.13         0.11
Ala      GCC       12.00        48.39         0.33

Cys      TGT        3.00        12.10         0.75
Cys      TGC        1.00         4.03         0.25

Asp      GAT        1.00         4.03         0.08
Asp      GAC       12.00        48.39         0.92

Glu      GAG       11.00        44.35         1.00
Glu      GAA        0.00         0.00         0.00

Phe      TTT        1.00         4.03         0.25
Phe      TTC        3.00        12.10         0.75

Gly      GGG        8.00        32.26         0.26
Gly      GGA        6.00        24.19         0.19
Gly      GGT        7.00        28.23         0.23
Gly      GGC       10.00        40.32         0.32

His      CAT        1.00         4.03         0.13
His      CAC        7.00        28.23         0.88

Ile      ATA        0.00         0.00         0.00
Ile      ATT        1.00         4.03         0.10
Ile      ATC        9.00        36.29         0.90

Lys      AAG        0.00         0.00         0.00
Lys      AAA        0.00         0.00         0.00

Leu      TTG        1.00         4.03         0.07
Leu      TTA        0.00         0.00         0.00
Leu      CTG        5.00        20.16         0.36
Leu      CTA        0.00         0.00         0.00
Leu      CTT        3.00        12.10         0.21
Leu      CTC        5.00        20.16         0.36

Met      ATG        2.00         8.06         1.00

Asn      AAT        0.00         0.00         0.00
Asn      AAC        2.00         8.06         1.00

Pro   CCG   10.00   40.32   0.53
Pro   CCA    4.00   16.13   0.21
Pro   CCT    1.00    4.03   0.05
Pro   CCC    4.00   16.13   0.21

Gln   CAG    8.00   32.26   1.00
Gln   CAA    0.00    0.00   0.00

Arg   AGG    0.00    0.00   0.00
Arg   AGA    0.00    0.00   0.00
Arg   CGG    7.00   28.23   0.39
Arg   CGA    2.00    8.06   0.11
Arg   CGT    5.00   20.16   0.28
Arg   CGC    4.00   16.13   0.22

Ser   AGT    1.00    4.03   0.07
Ser   AGC    4.00   16.13   0.27
Ser   TCG    2.00    8.06   0.13
Ser   TCA    0.00    0.00   0.00
Ser   TCT    1.00    4.03   0.07
Ser   TCC    7.00   28.23   0.47

Thr   ACG    3.00   12.10   0.20
Thr   ACA    2.00    8.06   0.13
Thr   ACT    0.00    0.00   0.00
Thr   ACC   10.00   40.32   0.67

Val   GTG   18.00   72.58   0.53
Val   GTA    2.00    8.06   0.06
Val   GTT    6.00   24.19   0.18
Val   GTC    8.00   32.26   0.24

Trp   TGG    0.00   0.00    0.00

Tyr   TAT    0.00    0.00   0.00
Tyr   TAC    3.00   12.10   1.00

End   TGA    0.00   0.00    0.00
End   TAG    0.00   0.00    0.00
End   TAA    1.00   4.03    1.00

Results for 774 residue sequence " abc Salinispora " starting

AmAcid   Codon     Number       /1000     Fraction   ..
Ala      GCG        13.00       52.85         0.37
Ala      GCA         5.00       20.33         0.14
Ala      GCT         2.00        8.13         0.06
Ala      GCC        15.00       60.98         0.43

Cys      TGT        1.00         4.07         0.33
Cys      TGC        2.00         8.13         0.67

Asp      GAT        3.00        12.20         0.30
Asp      GAC        7.00        28.46         0.70

Glu      GAG        6.00        24.39         0.55
Glu      GAA        5.00        20.33         0.45

Phe      TTT        2.00          8.13        0.40
Phe      TTC        3.00         12.20        0.60

Gly      GGG        5.00        20.33         0.23
Gly      GGA        3.00        12.20         0.14
Gly      GGT        4.00        16.26         0.18
Gly      GGC       10.00        40.65         0.45

His      CAT        1.00         4.07         0.33
His      CAC        2.00         8.13         0.67

Ile      ATA        0.00          0.00        0.00
Ile      ATT        1.00          4.07        0.17
Ile      ATC        5.00         20.33        0.83

Lys      AAG        5.00         20.33        1.00
Lys      AAA        0.00          0.00        0.00

Leu      TTG        0.00         0.00         0.00
Leu      TTA        0.00         0.00         0.00
Leu      CTG        8.00        32.52         0.32
Leu      CTA        2.00         8.13         0.08
Leu      CTT        7.00        28.46         0.28
Leu      CTC        8.00        32.52         0.32

Met      ATG        3.00        12.20         1.00

Asn   AAT    1.00    4.07   0.17
Asn   AAC    5.00   20.33   0.83

Pro   CCG    9.00   36.59   0.45
Pro   CCA    3.00   12.20   0.15
Pro   CCT    2.00    8.13   0.10
Pro   CCC    6.00   24.39   0.30

Gln   CAG    6.00   24.39   0.67
Gln   CAA    3.00   12.20   0.33

Arg   AGG    1.00    4.07   0.06
Arg   AGA    2.00    8.13   0.11
Arg   CGG    7.00   28.46   0.39
Arg   CGA    1.00    4.07   0.06
Arg   CGT    5.00   20.33   0.28
Arg   CGC    2.00    8.13   0.11

Ser   AGT   4.00    16.26   0.20
Ser   AGC   2.00     8.13   0.10
Ser   TCG   6.00    24.39   0.30
Ser   TCA   2.00     8.13   0.10
Ser   TCT   1.00     4.07   0.05
Ser   TCC   5.00    20.33   0.25

Thr   ACG    4.00   16.26   0.27
Thr   ACA    2.00    8.13   0.13
Thr   ACT    2.00    8.13   0.13
Thr   ACC    7.00   28.46   0.47

Val   GTG   13.00   52.85   0.57
Val   GTA    1.00    4.07   0.04
Val   GTT    1.00    4.07   0.04
Val   GTC    8.00   32.52   0.35

Trp   TGG    2.00   8.13    1.00

Tyr   TAT    2.00   8.13    0.50
Tyr   TAC    2.00   8.13    0.50

End   TGA    1.00   4.07    1.00
End   TAG    0.00   0.00    0.00
End   TAA    0.00   0.00    0.00

Results for 488 residue sequence "Frankia CcI Carbonic Anhydrase 1 of 488 bases"



AmAcid   Codon    Number        /1000     Fraction   ..
Ala      GCG        7.00        43.21         0.39
Ala      GCA        4.00        24.69         0.22
Ala      GCT        2.00        12.35         0.11
Ala      GCC        5.00        30.86         0.28

Cys      TGT        1.00         6.17         0.20
Cys      TGC        4.00        24.69         0.80

Asp      GAT        0.00         0.00         0.00
Asp      GAC        1.00         6.17         1.00

Glu      GAG        0.00         0.00         0.00
Glu      GAA        0.00         0.00         0.00

Phe      TTT        0.00         0.00         0.00
Phe      TTC        1.00         6.17         1.00

Gly      GGG        1.00         6.17         0.20
Gly      GGA        2.00        12.35         0.40
Gly      GGT        0.00         0.00         0.00
Gly      GGC        2.00        12.35         0.40

His      CAT        0.00         0.00         0.00
His      CAC        1.00         6.17         1.00

Ile      ATA        1.00         6.17         0.50
Ile      ATT        1.00         6.17         0.50
Ile      ATC        0.00         0.00         0.00

Lys      AAG        2.00        12.35         1.00
Lys      AAA        0.00         0.00         0.00

Leu      TTG        3.00        18.52         0.50
Leu      TTA        2.00        12.35         0.33
Leu      CTG        0.00         0.00         0.00
Leu      CTA        0.00         0.00         0.00
Leu      CTT        0.00         0.00         0.00
Leu      CTC        1.00         6.17         0.17

Met      ATG        1.00         6.17         1.00

Asn   AAT    1.00     6.17   0.33
Asn   AAC    2.00    12.35   0.67

Pro   CCG   17.00   104.94   0.63
Pro   CCA    4.00    24.69   0.15
Pro   CCT    4.00    24.69   0.15
Pro   CCC    2.00    12.35   0.07

Gln   CAG    1.00    6.17    1.00
Gln   CAA    0.00    0.00    0.00

Arg   AGG    3.00   18.52    0.14
Arg   AGA    4.00   24.69    0.18
Arg   CGG    0.00    0.00    0.00
Arg   CGA    6.00   37.04    0.27
Arg   CGT    4.00   24.69    0.18
Arg   CGC    5.00   30.86    0.23

Ser   AGT    2.00   12.35    0.06
Ser   AGC    2.00   12.35    0.06
Ser   TCG    8.00   49.38    0.24
Ser   TCA   12.00   74.07    0.35
Ser   TCT    2.00   12.35    0.06
Ser   TCC    8.00   49.38    0.24

Thr   ACG   15.00   92.59    0.63
Thr   ACA    4.00   24.69    0.17
Thr   ACT    2.00   12.35    0.08
Thr   ACC    3.00   18.52    0.13

Val   GTG    0.00    0.00    0.00
Val   GTA   0.00     0.00    0.00
Val   GTT   1.00     6.17    1.00
Val   GTC   0.00     0.00    0.00

Trp   TGG    4.00   24.69    1.00

Tyr   TAT    0.00    0.00    0.00
Tyr   TAC    1.00    6.17    1.00

End   TGA    3.00    18.52   1.00
End   TAG    0.00     0.00   0.00
End   TAA    0.00     0.00   0.00

Results for 618 residue sequence "xyz Frankia CcI3 Carbonic Anhydrase 2 of 618 bases"


AmAcid   Codon     Number       /1000     Fraction   ..
Ala      GCG         6.00       29.13         0.27
Ala      GCA         3.00       14.56         0.14
Ala      GCT         3.00       14.56         0.14
Ala      GCC        10.00       48.54         0.45

Cys      TGT        2.00         9.71         0.67
Cys      TGC        1.00         4.85         0.33

Asp      GAT        6.00        29.13         0.35
Asp      GAC       11.00        53.40         0.65

Glu      GAG        8.00        38.83         0.67
Glu      GAA        4.00        19.42         0.33

Phe      TTT        1.00         4.85         0.25
Phe      TTC        3.00        14.56         0.75

Gly      GGG        5.00        24.27         0.31
Gly      GGA        1.00         4.85         0.06
Gly      GGT        6.00        29.13         0.38
Gly      GGC        4.00        19.42         0.25

His      CAT        3.00        14.56         0.38
His      CAC        5.00        24.27         0.63

Ile      ATA        0.00         0.00         0.00
Ile      ATT        1.00         4.85         0.17
Ile      ATC        5.00        24.27         0.83

Lys      AAG        2.00         9.71         1.00
Lys      AAA        0.00         0.00         0.00

Leu      TTG        3.00        14.56         0.15
Leu      TTA        0.00         0.00         0.00
Leu      CTG       10.00        48.54         0.50
Leu      CTA        1.00         4.85         0.05
Leu      CTT        2.00         9.71         0.10
Leu      CTC        4.00        19.42         0.20

Met      ATG        1.00         4.85         1.00

Asn   AAT    0.00   0.00    0.00
Asn   AAC    1.00   4.85    1.00

Pro   CCG    5.00   24.27   0.42
Pro   CCA    0.00    0.00   0.00
Pro   CCT    1.00    4.85   0.08
Pro   CCC    6.00   29.13   0.50

Gln   CAG    5.00   24.27   1.00
Gln   CAA    0.00    0.00   0.00

Arg   AGG    2.00    9.71   0.13
Arg   AGA    0.00    0.00   0.00
Arg   CGG    6.00   29.13   0.38
Arg   CGA    0.00    0.00   0.00
Arg   CGT    2.00    9.71   0.13
Arg   CGC    6.00   29.13   0.38

Ser   AGT   0.00     0.00   0.00
Ser   AGC   4.00    19.42   0.36
Ser   TCG   4.00    19.42   0.36
Ser   TCA   0.00     0.00   0.00
Ser   TCT   0.00     0.00   0.00
Ser   TCC   3.00    14.56   0.27

Thr   ACG    5.00   24.27   0.33
Thr   ACA    0.00    0.00   0.00
Thr   ACT    0.00    0.00   0.00
Thr   ACC   10.00   48.54   0.67

Val   GTG   14.00   67.96   0.47
Val   GTA    1.00    4.85   0.03
Val   GTT    4.00   19.42   0.13
Val   GTC   11.00   53.40   0.37

Trp   TGG    1.00   4.85    1.00

Tyr   TAT    1.00   4.85    0.33
Tyr   TAC    2.00   9.71    0.67

End   TGA    0.00   0.00    0.00
End   TAG    1.00   4.85    1.00
End   TAA    0.00   0.00    0.00
From the above lists, we conclude two things:
   1) The codon plots of different gene ORFs from the same organism are the same
      except at some minor points.
   2) The codon plots of the organisms confirm what we suspected while analyzing the
      peptide sequences: the choice of codons differs so as to suit the G-C content of
      the organism.
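
The three numeric columns in the listings above can be reproduced from an in-frame ORF. The sketch below is a minimal illustration, not the tool that produced the listings: it assumes the standard genetic code, that "/1000" means the codon count per thousand codons in the sequence, and that "Fraction" means the codon's share among the synonymous codons of its amino acid. The function name `codon_usage` and the short test sequence are made up for illustration.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(dna):
    """Return {codon: (amino, number, per_thousand, fraction)} for an in-frame ORF."""
    dna = dna.upper().replace("U", "T")
    # Split into complete codons only; a trailing partial codon is dropped.
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    # Codons containing letters other than A, C, G, T are ignored.
    counts = Counter(c for c in codons if c in CODON_TABLE)
    total = sum(counts.values())

    # Total count of each amino acid, needed for the "Fraction" column.
    per_amino = defaultdict(int)
    for codon, n in counts.items():
        per_amino[CODON_TABLE[codon]] += n

    usage = {}
    for codon, amino in CODON_TABLE.items():
        n = counts.get(codon, 0)
        family = per_amino.get(amino, 0)
        per_thousand = 1000.0 * n / total if total else 0.0
        fraction = n / family if family else 0.0
        usage[codon] = (amino, n, per_thousand, fraction)
    return usage


if __name__ == "__main__":
    # Hypothetical 5-codon sequence, used only to exercise the function.
    table = codon_usage("ATGGCGGCGGCCTAA")
    for codon in ("GCG", "GCC", "ATG", "TAA"):
        amino, n, per_k, frac = table[codon]
        print(f"{amino}  {codon}  {n:5.2f}  {per_k:7.2f}  {frac:5.2f}")
```

Under these assumptions the E. coli row "Ala GCG 6.00 27.15 0.32" is consistent: the 663-base sequence holds 221 codons, 6/221 × 1000 ≈ 27.15, and 6 of the 19 Ala codons gives 0.32.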
Corrections:
We undertook these corrections because we noticed that the gene products of
Methanococcus voltae and Frankia did not start with the amino acid Methionine.

Methanococcus voltae corrections:

The mistake seems to be in the database from which the sequence was downloaded. The DNA
sequence had 'ata' instead of 'atg'.

Frankia sp. CcI3 corrections:

The mistake again seems to have been in the sequence. The DNA sequence began 27 bp earlier,
and the claimed start site of the protein actually coded for Valine.
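
A check of this kind can be automated before any codon analysis. The sketch below is a hedged illustration only: the helper names `first_codon_report` and `in_frame_atg_offsets` are hypothetical, the test sequences are invented and are not the downloaded Methanococcus or Frankia sequences, and only 'ATG' is treated as a start codon.

```python
def first_codon_report(dna):
    """Return the first codon and whether it is the ATG start codon."""
    first = dna.upper()[:3]
    return first, first == "ATG"


def in_frame_atg_offsets(dna):
    """Return 0-based offsets of every ATG lying in the reading frame of position 0."""
    dna = dna.upper()
    return [i for i in range(0, len(dna) - 2, 3) if dna[i:i + 3] == "ATG"]


if __name__ == "__main__":
    # Invented sequence whose first codon reads 'ata' rather than 'atg'.
    print(first_codon_report("ataGCCAAATAA"))

    # Invented sequence: a Valine codon (GTG) and 24 more bases, then ATG at offset 27.
    toy = "GTG" + "GCC" * 8 + "ATGAAATAA"
    print(first_codon_report(toy))
    print(in_frame_atg_offsets(toy))
```

A first codon other than ATG flags the record for manual inspection; the offsets list then shows where an in-frame ATG lies relative to the annotated start.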
Conclusion:
After studying the three analyses we carried out on the protein, the DNA and the ORF
codons, we conclude the following:

   1) Bacteria choose among the codons for a given amino acid according to their G-C
      composition. G-C rich bacteria give preference to G-C rich codons; conversely,
      G-C poor bacteria give preference to A-T rich codons.

   2) Where the same amino acid is not present, a substitute amino acid with the same or
      nearly the same chemical properties is used.

   3) High G-C content bacteria often employ two different genes for the same purpose.
      The finding of two possible genes for Carbonic Anhydrase in their genomes is the
      evidence for this statement.

   4) Most bacteria use Zinc at the metal site, yet a small number of bacteria use
      Cadmium and other metals.

   5) Even though the proteins are of varied length, one may look for Serine and Glycine
      on the peptide chain and see that this region is conserved in all the proteins. This
      is because the protein domains must be similar for all the anhydrases.
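
The first conclusion can be illustrated numerically from the codon listings. The sketch below is only one possible measure, chosen here as an assumption rather than taken from the original analysis: the share of codons that end in G or C. The function name `gc_ending_fraction` is hypothetical; the Glycine counts are copied from the E. coli, Salinispora arenicola and Truepera radiovictrix listings.

```python
def gc_ending_fraction(codon_counts):
    """Return the fraction of counted codons whose third base is G or C."""
    total = sum(codon_counts.values())
    if total == 0:
        return 0.0
    gc_ending = sum(n for codon, n in codon_counts.items() if codon[-1] in "GC")
    return gc_ending / total


# Glycine codon counts taken from the "Number" column of the listings.
glycine = {
    "E. coli": {"GGG": 1, "GGA": 4, "GGT": 7, "GGC": 4},
    "Salinispora arenicola": {"GGG": 8, "GGA": 6, "GGT": 7, "GGC": 10},
    "Truepera radiovictrix": {"GGG": 11, "GGA": 2, "GGT": 5, "GGC": 15},
}

if __name__ == "__main__":
    for organism, counts in glycine.items():
        print(f"{organism}: {gc_ending_fraction(counts):.2f}")
```

For these three Glycine rows the fraction works out to 5/16, 18/31 and 26/33 respectively. A single amino acid in a single gene is a small sample, so this is an illustration of the measure rather than a test of the conclusion.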
Proline catalyzed aldol reaction
 
Neuroplasticity and related concepts in Cognition
Neuroplasticity and related concepts in CognitionNeuroplasticity and related concepts in Cognition
Neuroplasticity and related concepts in Cognition
 
Cognition 1
Cognition 1Cognition 1
Cognition 1
 
Mate choice- a survey work.
Mate choice- a survey work.Mate choice- a survey work.
Mate choice- a survey work.
 

Último

Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxNikitaBankoti2
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 

Último (20)

Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 

Use of bio-informatic tools in bacterial genetics

  • 1. USE OF BIO-INFORMATIC TOOLS TO STUDY IMPLICATIONS OF G-C CONTENT OF DNA ON THE PROTEIN. DEBTANU CHAKRABORTY
  • 2. Index 1) Note of Acknowledgement 2) Bio-informatics 3) G-C content 4) Classification tree of Bacteria 5) List of low G-C bacteria 6) List of high G-C bacteria 7) Introduction to Carbonic Anhydrase 8) Peptide Sequence and their analysis 9) Gene Sequences and their analysis 10) Codon usage plot 11) Conclusion 12) Future work-scope
  • 3. Note of Acknowledgement The project would have been incomplete without the help of a number of people. First, I would like to thank my mentor and guide, Prof. Chanchal K. Das Gupta, who gave me the idea and inspiration for the project and helped me at every step whenever I was in trouble. I would like to thank Prof. Punyasloke Bhadury, who introduced me to the NCBI website and showed me how to perform tasks such as alignment and BLAST online. I would be remiss if I did not mention my seniors Papri di, Amit da and Shimonti di, who also helped me with the project. In my work I have made extensive use of the NCBI and UniProt websites.
  • 4. Bioinformatics is the application of statistics and computer science to the field of molecular biology. The term bioinformatics was coined by Paulien Hogeweg in 1979 for the study of informatic processes in biotic systems. Its primary use since at least the late 1980s has been in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing. Bioinformatics now entails the creation and advancement of databases, algorithms, computational and statistical techniques and theory to solve formal and practical problems arising from the management and analysis of biological data. Over the past few decades, rapid developments in genomic and other molecular research technologies, together with developments in information technologies, have produced a tremendous amount of information related to molecular biology. Bioinformatics is the name given to the mathematical and computing approaches used to glean understanding of biological processes from this information. Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning different DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures. The primary goal of bioinformatics is to increase the understanding of biological processes. What sets it apart from other approaches, however, is its focus on developing and applying computationally intensive techniques (e.g., pattern recognition, data mining, machine learning algorithms, and visualization) to achieve this goal. Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution.
  • 5. GC-content (or guanine-cytosine content), in molecular biology, is the percentage of nitrogenous bases on a molecule which are either guanine or cytosine (out of four possible bases, the others being adenine and thymine). This may refer to a specific fragment of DNA or RNA, or to the whole genome. When it refers to a fragment of the genetic material, it may denote the GC-content of part of a gene (domain), a single gene, a group of genes (or gene cluster) or even a non-coding region. G (guanine) and C (cytosine) pair with each other through specific hydrogen bonding, whereas A (adenine) pairs specifically with T (thymine). The GC pair is bound by three hydrogen bonds, while the AT pair is bound by two. DNA with high GC-content is more stable than DNA with low GC-content but, contrary to popular belief, the hydrogen bonds do not stabilize the DNA significantly; stabilization is mainly due to stacking interactions. In spite of the higher stability conferred on the genetic material, it is envisaged that cells with high GC-content DNA undergo autolysis, thereby reducing the longevity of the cell per se. Because of the robustness of the genetic material in high-GC organisms, it was commonly believed that GC-content played a vital part in temperature adaptation, a hypothesis which has recently been refuted. In PCR experiments, the GC-content of a primer is used to predict its annealing temperature to the template DNA. A higher GC-content indicates a higher melting temperature.
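The definition above can be turned into a short calculation. The following is a minimal Python sketch, not the method used in the project: the function names `gc_content` and `wallace_tm` are illustrative, ambiguous bases such as `n` are simply ignored, and the melting-temperature estimate uses the common rule of thumb for short primers (2 °C per A/T and 4 °C per G/C), which is an assumption added here and not taken from the slides.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T/U)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at  # ambiguous symbols such as N are not counted
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature (deg C) for a short primer:
    2 degrees per A or T plus 4 degrees per G or C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # First 21 bases of the Vibrio cholerae gene listed in this report.
    fragment = "atgaaaaagacaacgtgggta"
    print(f"GC content: {gc_content(fragment):.1f}%")
    print(f"Rule-of-thumb Tm: {wallace_tm(fragment)} deg C")
```

The rule of thumb is only meant for short oligonucleotides; for whole genes, `gc_content` is the relevant quantity.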
  • 6. THE EVOLUTIONARY TREE OF BACTERIA, WHERE THE STUDY OF G-C CONTENT IS AN ANALYTICAL TOOL. The guanine plus cytosine (GC) content in bacteria ranges from ~20% to 75%, whereas eukaryotic genomes often have GC contents in a more restricted range of ~35-50% (about 40%-45% in vertebrates).
  • 7. Some Bacteria with low G-C content -
  • 8. Some Bacteria with high G-C content-
  • 9. For convenience, we chose Carbonic Anhydrase because it is present in all bacteria across the G-C content spectrum- The carbonic anhydrases (or carbonate dehydratases) form a family of enzymes that catalyze the rapid conversion of carbon dioxide and water to bicarbonate and protons, a reaction that occurs rather slowly in the absence of a catalyst.[1] The active site of most carbonic anhydrases contains a zinc ion; they are therefore classified as metalloenzymes. THE CARBONIC ANHYDRASE PROTEIN-
  • 10. In our analysis, we chose the following bacteria- 1) Methanococcus voltae A3 (UI-A8TF20) (G-Cc=27%) 2) Staphylococcus carnosus (UI-B9DMU8_STACT) (G-Cc=34%) 3) Vibrio cholerae (UI-Q9KMP6_VIBCH) (G-Cc=47%) 4) Escherichia coli (UI-P61517) (G-Cc=50%) 5) Truepera radiovictrix DSM1703 (UI-ADI14363) (G-Cc=68.2%) 6) Salinispora arenicola (UI-A8MOD8) (G-Cc=69.2%) 7) Frankia CcI (UI-Q2JF50) (G-Cc=71%) *UI stands for the UniProt accession number of the Carbonic Anhydrase protein of the respective bacterium; G-Cc stands for G-C content. We begin by analyzing the Carbonic Anhydrase protein from these bacteria- The peptide sequences are as follows- >Methanococcus voltae Carbonic Anhydrase Protein LN*LFNLASVNVNHKPFNFHIFRNCRVIFD*FDTFQHVFFFVIHFTHPSFKVWRKVWIYS SFNHFFSYLFNICSCHSTVGMTYDSYLFNI*TVYCNY*RP*YIVCNNITCVFDDFCVASF *THFFR*EIYESCIHTSYYC*FLFRFGFCSDSFTYTQ >Staphylococcus carnosus Carbonic Anhydrase Protein YPXXXMTLLESILAYNKDFVGNKEFENYTTSKKPDKKAVLFTCMDTRLQDLGTKALGFNN GDLKVVKNAGAIITHPYGSTIKSLLVGIYALGAEEIIIMAHKDCGMGCLDVSTVKDAMKE RGVTEETFKIIEHSGVDVDSFLQGFKDAEENVRRNIDMVYNHPLFDKSVPIHGLVIDPHT GELDLIQDGYELAAQNK* >Vibrio cholerae Carbonic Anhydrase Protein MKKTTWVLAMVASMSFGVQASEWGYEGEHAPEHWGKVAPLCAEGKNQSPIDVAQSVEADL QPFTLNYQGQVVGLLNNGHTLQAIVRGNNPLQIDGKTFQLKQFHFHTPSENLLKGKQFPL EAHFVHADEQGNLAVVAVMYQVGSENPLLKVLTADMPTKGNSTQLTQGIPLADWIPESKH YYRFNGSLTTPPCSEGVRWIVLKEPAHLSNQQEQQLSAVMGHNNRPVQPHNARLVLQAD* >1st Escherichia coli Carbonic Anhydrase Protein LFVVGVFQLEVGDPVTVTLLKGFAVSRCDIQITQQAVVNAVGPAVNGDFLPAFPR*LHNS GVAQVIHLFHDVQFTQGIQTALLRHFAEQ*AMFEPDIADMQQPVVDKPQFRVFNCGLYAA ATVV
  • 11. >2nd Escherichia coli Carbonic Anhydrase Final rip MKDIDTLISNNALWSKMLVEEDPGFFEKLAQAQKPRFLWIGCSDSRVPAERLTGLEPGEL FVHRNVANLVIHINNWLLHIRDIWFKHSSLLGEMPQERRLDTLCELNVMEQVYNLGHSTI MQSAWKRGQKVTIHGWAYGIHDGLLRDLDVTATNRETLEQRYRHGISNLKLKHANHK* >3rd Escherichia coli Carbonic Anhydrase Final rip2 VKEIIDGFLKFQREAFPKREALFKQLATQQSPRTLFISCSDSRLVPELVTQREPGDLFVI RNAGNIVPSYGPEPGGVSASVEYAVAALRVSDIVICGHSNCGAMTAIASCQCMDHMPAVS HWLRYADSARVVNEARPHSDLPSKAAAMVRENVIAQLANLQTHPSVRLALEEGRIALHGW VYDIESGSIAAFDGATRQFVPLAANPRVCAIPLRQPTAA* . >Truepera radiovictrix DSM1703 Carbonic Anhydrase Protein S*PFQKRAVSGRAG*KGCRQQLEPARLEVVHGADDGERALGDARL*GRVRGDEANGRLDV LPHGPLERTPRPPLSRVAATSGAPQSGLERPHEGRQRGVGAPLLEGCGGGRDRAAAGVPQ HHDERHAEHRDAVGEARQNRVVDDVAGDPVGKEVAQALVEDDLRRHARVGAAEHRREGVL LARQGRAPARVLVRVRHAPLEVALVPGQQALERPLGGQGRLGGGH >1 Salinispora arenicola Carbonic Anhydrase 1st Protein MNCPGTPDTQPGSHPVSSSGIGGSRSGPVGPEQALAELYDGNRRFAVGVPIRPHQDIDRR VALADGQQPFAVIVGCSDSRLAAEIIFDRGLGDLFVVRTAGHTVGPEVLGSVEYAVTVLG APLVVVLGHDSCGAVQAARTADATGAPASGHLRAVVDGVVPSVRRAGARGVTEIDQIVDI HIEQTVEAVLGRSEAVAAAVAGGRCAVVGMSYRLTAGEVHTVTAVGLAAPTTPPAAPETR PSAGPA* >2 Salinispora arenicola Carbonic Anhydrase 2nd Protein XXTXXESGRVAESESTAFRWAGGRCGRACGVFVDEGALVGDQRITDSVAHHAHRRIREAD GGQPAVGAWRPSTTQPGSSSASSRGPRTEALWGGPRMH*LAGAA*TPLRHRPG*SFRDTY GR*GDRPSGHWFCRVWTSDQWHSAHRGPWASALRRRQGGVHLPS*GQAAARQPTGDRYGP PAGV*TGSLSGERRPDRRHGPSPGRADRKRPALQPGTSPTRGEAGPCRGQCLLFPRYRRG GSPQWQTLL >1 Frankia CcI Carbonic Anhydrase 1st Protein CPSPTTT*PTTPPTRRPSPGRFRCRRPSTSPPSPAWTHGSTSTRSLAWATARLTSSATPA ASSPTTRSVPSRSASACSAPARSS*STTPTAAC*PSPTTILNARSRTRPGSNQNGPWSRL PTWPKTYASRLRGSRRARSSRIPTPSAASSSMLPPDCSPKSR
  • 12. >2 Frankia CcI3 Carbonic Anhydrase 2nd Protein VDTDDHTAVDPVADVHADDVHADTVRPADTVSPVSGAATATELLLSYAAGHPARRREAGL PALPGARPRLGVAVVACMDVRIQVEALLGLVEGDAHILRNAGGVITPDVVRSLAVSQHVL GTTEIILLHHTGCGLERITDDGFRDQLECKTGVRPEWAVYSFPDVEEDVRKSVRVLRSSP FLQSTTSVRGFVYQVETGALVEVLP* We have 3 protein sequences for E. coli and 2 sequences each for Salinispora and Frankia. We now compare the sequences within each species. For E. coli- The sequence marked "Escherichia" is the 1st sequence. The sequence marked "Ecoli" is the 2nd sequence. The sequence marked "Final" is the 3rd sequence.
  • 13. For Salinispora- For Frankia- Having viewed the alignment of the suspected Carbonic Anhydrases within the same species, we now align the proteins from all the sources, including all the proteins from the same species.
  • 14. The alignment sequence of the bacteria is as follows-
  • 15. Analysis- We can see two things from the above. 1) Bacteria with high G-C content have two genes for Carbonic Anhydrase and consequently two proteins suspected to be Carbonic Anhydrase. 2) Bacteria with high G-C content incorporate, in their proteins, synonymous amino acids that are encoded by G-C rich codons. We will elaborate on the 2nd point later using the codon plot, by looking at the corresponding codons in the DNA of the Carbonic Anhydrase genes of these bacteria. Now we move on to analyzing the DNA of the Carbonic Anhydrase genes- The DNA sequences are as follows- >Methanococcus voltae Carbonic Anhydrase of 471 bases ttaaattaactttttaatctcgccagtgttaatgtcaatcataagcccttcaacttccac atctttaggaattgcagggtgatttttgattaattcgacacctttcaacacgttttcttc ttcgttatccattttacccatccaagcttcaaagtctggcgtaaagtatggatttactcc tcttttaatcatttcttttcttatctcttcaatatctgctcctgccattccacagtcggt atgacctacgatagctatcttttcaacatctaaacagtatattgcaactactaacgacct taatacatcgtctgtaataatattacctgcgtttttgatgacttttgcgtcgcctctttc taaacccattttttcaggtaagaaatttacgagtcttgtatccatacaagttattactgc taatttctttttcggtttggcttctgctccgatagtttcacctatactcaa >Staphylococcus carnosus Carbonic Anhydrase of 594 bases taccccancancanaatgacgttattagaaagcattttagcttataataaagattttgtc ggcaacaaagaatttgaaaactatacaacaagtaaaaaaccagataaaaaagcagtgtta tttacatgtatggatacacgtttgcaagatttaggtacaaaagcactcggttttaataat ggtgacttgaaagttgttaaaaatgcaggtgcaattatcacgcacccatatggttcaact ataaaaagcttactagtaggtatttatgcattaggtgctgaagaaattattattatggca cataaagattgcggaatgggttgtcttgatgtcagcactgttaaagacgcaatgaaagaa cgtggcgtaacagaagaaacatttaaaatcatcgaacattctggtgtagatgtagacagc tttttacaaggtttcaaagatgctgaagaaaatgtccgcagaaatatcgatatggtatat aatcatcccttatttgataaatccgtacctattcacggcttagtcatcgatcctcatacg ggggaattagatttaattcaagacggctatgaattagctgctcaaaataaataa
  • 16. >Vibrio cholerae Carbonic Anhydrase of 720 bases atgaaaaagacaacgtgggtattagcgatggtagccagtatgagcttcggcgtacaggct tccgagtgggggtatgaaggagagcatgctccggagcattggggcaaagttgcccctctt tgcgcagagggtaaaaatcaaagcccgattgatgtcgcgcaaagcgtagaagcggatcta cagcctttcacgctcaattatcaagggcaagtggttgggctgctcaataacgggcacact ttacaagcgatagtccgtggtaataacccactgcagatcgatggcaaaacgtttcagctt aagcagtttcattttcataccccttctgaaaatttgctaaaaggaaaacaattcccactg gaagcgcattttgttcatgccgacgagcaaggcaatctggcggttgttgcggtgatgtac caagtggggtcggaaaatccgctgcttaaggttctcacggcggatatgccgaccaaaggg aattcgactcagctcacgcaagggatccctttggctgattggatcccagaatcgaagcac tactatcgtttcaatggttcattgactacgccgccttgcagtgaaggtgtacgttggatt gtgttaaaagagccagcacatttgtcgaatcaacaagagcagcagcttagtgccgtgatg ggacacaataatcgacccgtacaaccgcataatgctcgtcttgtcttgcaagccgactaa >Escherichia coli Carbonic Anhydrase of 372 bases ttatttgtggttggcgtgtttcagcttgaggttggagatcccgtgacggtaacgttgctc aagggtttcgcggttagtcgctgtgacatccagatcacgcagcaagccgtcgtgaatgcc gtaggcccagccgtgaatggtgactttctgcccgcgtttccacgctgattgcataatagt ggagtggcccaggttatacacctgttccatgacgttcagttcacacaaggtatccagacg gcgctcttgcggcatttcgccgagcaatgagctatgtttgaaccagatatcgcggatatg cagcagccagttgttgataagccccagttccgggttttcaactgcggcttgtacgccgcc gcaaccgtagtg >123 Escherichia coli carbonic Anhydrase Final aagccccagttccgggttttcaactgcggcttgtacgccgccgcaaccgtagtggccaca gataataatgtgttcaacttcgagtacatccactgcatactgaaccacggaaaggcagtt caggtcagtttatttgtggttggcgtgtttcagcttgaggttggaaatcccgtgacggta acgttgctcaagggtttcgcggttggtggcggtaacatccagatcacgcagcaagccgtc gtgaatgccgtaggcccagccgtgaatggtaactttctgcccgcgtttccacgctgattg cataatggtggagtggcccaggttatacacctgttccatgacgttcagttcacacaaggt atccagacggcgctcttgcggcatttcgccgagcaatgagctatgtttgaaccagatatc gcggatatgcagcagccagttgttgatgtgaatgaccaggttagcaacattacggtgaac aaagagttcgcccggctcaagaccggttaaacgttctgcaggaacgcgactgtcggaaca tccaatccatagaaagcgcggtttttgcgcttgtgccagtttctcaaaaaacccgggatc ctcttccaccagcatttttgaccatagtgcattgttgctgatgagtgtatctatgtcttt cat >456 Escherichia coli Carbonic Anhydrase Final 2 
gtgaaagagattattgatggattccttaaattccagcgcgaggcatttccgaagcgggaagcct tgtttaaacagctggcgacacagcaaagcccgcgcacactttttatctcctgctccgacagccg
  • 17. tctggtccctgagctggtgacgcaacgtgagcctggcgatctgttcgttattcgcaacgcgggc aatatcgtcccttcctacgggccggaacccggtggcgtttctgcttcggtggagtatgccgtcg ctgcgcttcgggtatctgacattgtgatttgtggtcattccaactgtggcgcgatgaccgccat tgccagctgtcagtgcatggaccatatgcctgccgtctcccactggctgcgttatgccgattca gcccgcgtcgttaatgaggcgcgcccgcattccgatttaccgtcaaaagctgcggcgatggtac gtgaaaacgtcattgctcagttggctaatttgcaaactcatccatcggtgcgcctggcgctcga agaggggcggatcgccctgcacggctgggtctacgacattgaaagcggcagcatcgcagctttt gacggcgcaacccgccagtttgtgccactggccgctaatcctcgcgtttgtgccataccgctac gccaaccgaccgcagcgtaa >Truepera radiovictrix DSM1703 Carbonic Anhydrase consisting of 675 bases tcataaccgttccaaaagcgggccgtgagcgggcgcgctgggtaaaaggggtgtcggcag cagctcgagcccgcccgtctcgaggtcgtacacggcgccgacgacggcgagcgtgcgctg ggcgatgcgcgcctttaggggcgggtgcgcggcgatgaggcgaacggacgcctcgacgtt ctccctcacggcccccttgagcgtacaccccgtccccccctcagccgtgtcgcagcgacg agcggcgcgccgcaaagcgggctcgagcgcccgcacgaggggcgtcagcgaggggtcggc gcccccctcctcgagggctgcggcggcggccgcgaccgcgccgcagccggtgtgccccag caccacgatgagcggcacgccgagcaccgagacgccgtaggtgaggctcgccaaaatcgc gtcgtcgacgatgttgccggcgacccggttggtaaagaggtcgcccaagccctggtcgaa gatgatctgcggcggcacgcgcgagtcggcgcagccgagcaccgccgcgaaggggtgctg ctcgcgcgtcaaggacgcgcgccagcgcgcgtcttggtgcgggtgcgccatgcgcccctc gaggtagcgctggtgcccggccaacaggcgctcgagcgccccttggggggtcaggggcgt ctggggggcggtcat >Salinispora arenicola Carbonic Anhydrase 1 of 741 bases. 
atgaactgcccaggaacgcccgacacacagccgggctcgcacccggtgtcctccagtgga atcggcggttcccggagcgggccggtcgggcccgagcaggcgcttgccgagttgtacgac ggcaaccggcgattcgccgttggtgttccgatccgcccacaccaggacatcgaccgtcgg gtcgccctggcggatggtcagcagcccttcgcggtgatcgtcggctgttccgactcccga cttgctgctgagatcatctttgaccgtggtctcggtgacctgttcgtggtacgcaccgct gggcacacggtcgggccagaggtgctgggcagcgtcgagtacgcggtcaccgtgctgggt gcgccgctggtggtggtgctcggccacgactcctgtggagcggtacaggcggcccggacc gccgacgccaccggcgcaccggcgtccgggcacctccgcgctgtggtggacggggtggtg ccgagcgtgcgtcgggccggggcccgtggggttaccgagatcgaccagatcgtcgacatt catatcgagcagaccgttgaggcggtgcttggccgttctgaggcggtcgcagccgcggtg gccggcggacggtgtgcggtggtgggaatgtcgtaccggctcaccgcaggtgaggtgcac acggttaccgcggttggcctcgcggcgccgaccacaccaccggccgcgcctgagacccgc cccagcgccggaccggcgtaa >abc Salinispora arenicola Carbonic Anhydrase 2 of 748 bases naancacancanatgaatcgggccgtgtggccgagtcggagagcactgctttccggtgg gctggtgggcgctgtggccgtgcttgcggggtgttcgtcgacgaaggcgcgctcgtcggc
  • 18. gaccagcgcatcaccgacagcgtcgcccaccacgcccaccgccgcattcgagaggctgat ggagggcaaccagcggtgggtgcgtggagaccttcaacaacccaaccgggatccagctcg gcgtcaagtcgtggcccacgaacagaagccctttggggcggtcctcgcatgcattgactc gcgggtgccgcctgaactcctcttcgacaccggcctgggtgatcttttcgtgacacgtac gggaggtgaggcgatcggcccagtggtcactggttctgtcgagtttggacctctgaccag tggcactccgctcatcgtggtccttgggcatcagcgttgcggcgccgtcaaggcggcgta cacctcccttcgtgagggcaagccgctgcccggcaacctaccggcgatcgttacggccct ccagccggcgtatgaacaggtagcctcagcggggagcgccgacccgatcgacgccatggc ccgagcccaggccgagctgatcgcaaacgacctgcgctccaacccggaactagccccact cgtggcgaagcgggaccttgccgtggtcagtgcctactattccctcgataccggcgcggt ggaagtcctcagtggcagaccctcctga >Frankia CcI Carbonic Anhydrase 1 of 488 bases tgtccgtcaccgacgactacctgaccaacaacgccgcctacgcgaagaccttcgccgggc cgcttccgctgccgccgtccaagcacatcgccgccgtcgcctgcatggacgcacggctca acgtctacgcgatccttggcctgggcgacggcgaggctcacgtcatccgcaacgccggcg gcgtcgtcaccgacgacgagatccgttccctcgcgatcagccagcgcctgctcggcaccc gcgagatcatcctgatccaccacaccgactgcggcatgctgaccttcaccgacgacgatt ttaaacgctcgatccaggacgagaccgggatcaaaccagaatgggccgtggagtcgttta ccgacctggccgaagacatacgccagtcgattgcgcggatcaaggcgagcccgttcatcc cgcataccgacgccatccgcggcttcatcttcgatgttgccaccggactgctcaccgaag tcgcgtga >xyz Frankia CcI3 Carbonic Anhydrase 2 of 618 bases gtggacaccgatgaccacaccgctgtcgaccccgttgccgatgtccatgcagacgatgtc catgcggacaccgtgcgccccgcggatacggtgagcccggtgagcggcgctgccacggcg accgaactcctgctgagctacgctgcaggtcaccccgcccggcggcgggaggccgggcta cctgccctgcccggcgcgcggccgcgcctgggcgtcgcggtggttgcgtgtatggacgtg cggatccaggtggaggccttgctcggtcttgtcgaaggtgacgcccacatcctgcgcaac gccggtggtgtcatcaccccggatgtggtccgctcgctcgccgtgagccagcacgtgctg ggaacgacggagatcattcttttgcatcacaccgggtgtggtctcgaaaggatcaccgac gacgggttccgggaccagttggagtgcaagacgggcgttcgtcccgaatgggccgtgtat tcctttcccgatgtcgaggaggacgtgcgcaagtccgtcagggtgctgcgttcgtcgccg ttcctgcagtccaccacctcggtacgcgggttcgtctaccaggtggagaccggggcactg gtcgaggttctgccgtag We will now proceed to compare the translation product of the ORF of the gene with the original protein product. 
Methanococcus produces the protein in reading frame 1 of the reverse strand of the DNA segment. It does not start with ATG; the first amino acid is L in place of M. Staphylococcus and Vibrio produce the protein in reading frame 1 of the forward strand. The same is observed in Frankia and Salinispora.
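The reading-frame comparison described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the tool used in the project: it uses the standard genetic code, the function names `reverse_complement` and `translate` are hypothetical, stop codons are written as `*` and any codon containing an ambiguous base (such as `n`) is written as `X`, which matches the notation of the peptide sequences listed earlier.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 1) -> str:
    """Translate one strand in reading frame 1, 2 or 3.

    Stop codons become '*'; codons with ambiguous bases become 'X'.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    dna = dna.upper()
    protein = []
    for i in range(frame - 1, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Start of the Vibrio cholerae gene from this report, forward frame 1.
    print(translate("atgaaaaagacaacgtgggta"))   # MKKTTWV
    # Start of the Staphylococcus carnosus gene, which contains 'n' bases.
    print(translate("taccccancancanaatg"))      # YPXXXM
    # Frames on the reverse strand are obtained by translating
    # reverse_complement(dna) in frame 1, 2 or 3.
    print(translate(reverse_complement("atgaaaaagacaacgtgggta"), frame=1))
```

Translating all three frames of both strands and comparing each result with the database protein is one way to find the frame that encodes it.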
  • 19. The gene product is labeled ‘orf’. 1) Methanococcus voltae A3- 2) Staphylococcus carnosus- 3) Vibrio cholerae-
  • 20. 4) The comparison of the E. coli gene product and protein is as follows- For the rest, we compare only one suspected protein and its gene product, for consistency. For Truepera-
  • 21. 5) For Salinispora- 6) For Frankia-
  • 22. Codon analysis is as follows- Results for 411 residue sequence "Methanococcus voltae Carbonic Anhydrase of 471 bases"
    AmAcid  Codon  Number   /1000  Fraction
    Ala     GCG      0.00    0.00      0.00
    Ala     GCA      0.00    0.00      0.00
    Ala     GCT      0.00    0.00      0.00
    Ala     GCC      1.00    7.30      1.00
    Cys     TGT      2.00   14.60      0.20
    Cys     TGC      8.00   58.39      0.80
    Asp     GAT      4.00   29.20      0.67
    Asp     GAC      2.00   14.60      0.33
    Glu     GAG      1.00    7.30      0.50
    Glu     GAA      1.00    7.30      0.50
    Phe     TTT     12.00   87.59      0.50
    Phe     TTC     12.00   87.59      0.50
    Gly     GGG      0.00    0.00      0.00
    Gly     GGA      0.00    0.00      0.00
    Gly     GGT      1.00    7.30      0.50
    Gly     GGC      1.00    7.30      0.50
    His     CAT      6.00   43.80      0.86
    His     CAC      1.00    7.30      0.14
    Ile     ATA      0.00    0.00      0.00
    Ile     ATT      4.00   29.20      0.40
    Ile     ATC      6.00   43.80      0.60
    Lys     AAG      0.00    0.00      0.00
    Lys     AAA      2.00   14.60      1.00
    Leu     TTG      0.00    0.00      0.00
    Leu     TTA      0.00    0.00      0.00
    Leu     CTG      0.00    0.00      0.00
    Leu     CTA      0.00    0.00      0.00
    Leu     CTT      2.00   14.60      0.67
    Leu     CTC      1.00    7.30      0.33
    Met     ATG      1.00    7.30      1.00
  • 23. (Methanococcus voltae codon table, continued)
    AmAcid  Codon  Number   /1000  Fraction
    Asn     AAT      5.00   36.50      0.71
    Asn     AAC      2.00   14.60      0.29
    Pro     CCG      0.00    0.00      0.00
    Pro     CCA      1.00    7.30      0.50
    Pro     CCT      1.00    7.30      0.50
    Pro     CCC      0.00    0.00      0.00
    Gln     CAG      0.00    0.00      0.00
    Gln     CAA      2.00   14.60      1.00
    Arg     AGG      3.00   21.90      0.50
    Arg     AGA      0.00    0.00      0.00
    Arg     CGG      1.00    7.30      0.17
    Arg     CGA      1.00    7.30      0.17
    Arg     CGT      1.00    7.30      0.17
    Arg     CGC      0.00    0.00      0.00
    Ser     AGT      2.00   14.60      0.17
    Ser     AGC      2.00   14.60      0.17
    Ser     TCG      0.00    0.00      0.00
    Ser     TCA      0.00    0.00      0.00
    Ser     TCT      4.00   29.20      0.33
    Ser     TCC      4.00   29.20      0.33
    Thr     ACG      0.00    0.00      0.00
    Thr     ACA      3.00   21.90      0.30
    Thr     ACT      1.00    7.30      0.10
    Thr     ACC      6.00   43.80      0.60
    Val     GTG      1.00    7.30      0.10
    Val     GTA      2.00   14.60      0.20
    Val     GTT      3.00   21.90      0.30
    Val     GTC      4.00   29.20      0.40
    Trp     TGG      2.00   14.60      1.00
    Tyr     TAT      5.00   36.50      0.45
    Tyr     TAC      6.00   43.80      0.55
    End     TGA      0.00    0.00      0.00
    End     TAG      0.00    0.00      0.00
    End     TAA      7.00   51.09      1.00
  • 24. Results for 594 residue sequence "Staphylococcus carnosus Carbonic Anhydrase of 594 bases"
    AmAcid  Codon  Number   /1000  Fraction
    Ala     GCG      0.00    0.00      0.00
    Ala     GCA      7.00   35.90      0.58
    Ala     GCT      5.00   25.64      0.42
    Ala     GCC      0.00    0.00      0.00
    Cys     TGT      2.00   10.26      0.67
    Cys     TGC      1.00    5.13      0.33
    Asp     GAT     12.00   61.54      0.75
    Asp     GAC      4.00   20.51      0.25
    Glu     GAG      0.00    0.00      0.00
    Glu     GAA     13.00   66.67      1.00
    Phe     TTT      7.00   35.90      0.88
    Phe     TTC      1.00    5.13      0.13
    Gly     GGG      1.00    5.13      0.06
    Gly     GGA      1.00    5.13      0.06
    Gly     GGT     10.00   51.28      0.63
    Gly     GGC      4.00   20.51      0.25
    His     CAT      4.00   20.51      0.67
    His     CAC      2.00   10.26      0.33
    Ile     ATA      1.00    5.13      0.07
    Ile     ATT      8.00   41.03      0.57
    Ile     ATC      5.00   25.64      0.36
    Lys     AAG      0.00    0.00      0.00
    Lys     AAA     17.00   87.18      1.00
    Leu     TTG      2.00   10.26      0.11
    Leu     TTA     13.00   66.67      0.72
    Leu     CTG      0.00    0.00      0.00
    Leu     CTA      1.00    5.13      0.06
    Leu     CTT      1.00    5.13      0.06
    Leu     CTC      1.00    5.13      0.06
    Met     ATG      6.00   30.77      1.00
    Asn     AAT      8.00   41.03      0.80
    Asn     AAC      2.00   10.26      0.20
  • 25. (Staphylococcus carnosus codon table, continued)
    AmAcid  Codon  Number   /1000  Fraction
    Pro     CCG      0.00    0.00      0.00
    Pro     CCA      2.00   10.26      0.33
    Pro     CCT      2.00   10.26      0.33
    Pro     CCC      2.00   10.26      0.33
    Gln     CAG      0.00    0.00      0.00
    Gln     CAA      4.00   20.51      1.00
    Arg     AGG      0.00    0.00      0.00
    Arg     AGA      1.00    5.13      0.25
    Arg     CGG      0.00    0.00      0.00
    Arg     CGA      0.00    0.00      0.00
    Arg     CGT      2.00   10.26      0.50
    Arg     CGC      1.00    5.13      0.25
    Ser     AGT      1.00    5.13      0.13
    Ser     AGC      4.00   20.51      0.50
    Ser     TCG      0.00    0.00      0.00
    Ser     TCA      1.00    5.13      0.13
    Ser     TCT      1.00    5.13      0.13
    Ser     TCC      1.00    5.13      0.13
    Thr     ACG      3.00   15.38      0.25
    Thr     ACA      7.00   35.90      0.58
    Thr     ACT      2.00   10.26      0.17
    Thr     ACC      0.00    0.00      0.00
    Val     GTG      1.00    5.13      0.07
    Val     GTA      6.00   30.77      0.43
    Val     GTT      3.00   15.38      0.21
    Val     GTC      4.00   20.51      0.29
    Trp     TGG      0.00    0.00      0.00
    Tyr     TAT      6.00   30.77      0.86
    Tyr     TAC      1.00    5.13      0.14
    End     TGA      0.00    0.00      0.00
    End     TAG      0.00    0.00      0.00
    End     TAA      1.00    5.13      1.00
  • 26. Results for 660 residue sequence "Vibrio cholerae Carbonic Anhydrase of 720 bases AmAcid Codon Number /1000 Fraction .. Ala GCG 7.00 31.82 0.44 Ala GCA 2.00 9.09 0.13 Ala GCT 3.00 13.64 0.19 Ala GCC 4.00 18.18 0.25 Cys TGT 0.00 0.00 0.00 Cys TGC 2.00 9.09 1.00 Asp GAT 5.00 22.73 0.71 Asp GAC 2.00 9.09 0.29 Glu GAG 7.00 31.82 0.50 Glu GAA 7.00 31.82 0.50 Phe TTT 4.00 18.18 0.57 Phe TTC 3.00 13.64 0.43 Gly GGG 7.00 31.82 0.41 Gly GGA 3.00 13.64 0.18 Gly GGT 4.00 18.18 0.24 Gly GGC 3.00 13.64 0.18 His CAT 8.00 36.36 0.73 His CAC 3.00 13.64 0.27 Ile ATA 1.00 4.55 0.17 Ile ATT 2.00 9.09 0.33 Ile ATC 3.00 13.64 0.50 Lys AAG 3.00 13.64 0.30 Lys AAA 7.00 31.82 0.70 Leu TTG 5.00 22.73 0.22 Leu TTA 2.00 9.09 0.09 Leu CTG 5.00 22.73 0.22 Leu CTA 2.00 9.09 0.09 Leu CTT 5.00 22.73 0.22 Leu CTC 4.00 18.18 0.17 Met ATG 3.00 13.64 1.00 Asn AAT 13.00 59.09 0.87 Asn AAC 2.00 9.09 0.13
  • 27. Pro CCG 6.00 27.27 0.38 Pro CCA 4.00 18.18 0.25 Pro CCT 5.00 22.73 0.31 Pro CCC 1.00 4.55 0.06 Gln CAG 7.00 31.82 0.35 Gln CAA 13.00 59.09 0.65 Arg AGG 0.00 0.00 0.00 Arg AGA 0.00 0.00 0.00 Arg CGG 0.00 0.00 0.00 Arg CGA 1.00 4.55 0.20 Arg CGT 4.00 18.18 0.80 Arg CGC 0.00 0.00 0.00 Ser AGT 2.00 9.09 0.18 Ser AGC 2.00 9.09 0.18 Ser TCG 4.00 18.18 0.36 Ser TCA 1.00 4.55 0.09 Ser TCT 1.00 4.55 0.09 Ser TCC 1.00 4.55 0.09 Thr ACG 5.00 22.73 0.50 Thr ACA 0.00 0.00 0.00 Thr ACT 3.00 13.64 0.30 Thr ACC 2.00 9.09 0.20 Val GTG 5.00 22.73 0.29 Val GTA 3.00 13.64 0.18 Val GTT 6.00 27.27 0.35 Val GTC 3.00 13.64 0.18 Trp TGG 4.00 18.18 1.00 Tyr TAT 3.00 13.64 0.60 Tyr TAC 2.00 9.09 0.40 End TGA 0.00 0.00 0.00 End TAG 0.00 0.00 0.00 End TAA 1.00 4.55 1.00
  • 28. Results for 372 residue sequence "Escherichia coli Carbonic Anhydrase of 372 bases" AmAcid Codon Number /1000 Fraction .. Ala GCG 4.00 32.26 0.31 Ala GCA 1.00 8.06 0.08 Ala GCT 1.00 8.06 0.08 Ala GCC 7.00 56.45 0.54 Cys TGT 1.00 8.06 0.50 Cys TGC 1.00 8.06 0.50 Asp GAT 4.00 32.26 0.57 Asp GAC 3.00 24.19 0.43 Glu GAG 2.00 16.13 0.67 Glu GAA 1.00 8.06 0.33 Phe TTT 5.00 40.32 0.45 Phe TTC 6.00 48.39 0.55 Gly GGG 0.00 0.00 0.00 Gly GGA 2.00 16.13 0.25 Gly GGT 3.00 24.19 0.38 Gly GGC 3.00 24.19 0.38 His CAT 3.00 24.19 0.75 His CAC 1.00 8.06 0.25 Ile ATA 1.00 8.06 0.20 Ile ATT 0.00 0.00 0.00 Ile ATC 4.00 32.26 0.80 Lys AAG 2.00 16.13 1.00 Lys AAA 0.00 0.00 0.00 Leu TTG 4.00 32.26 0.40 Leu TTA 1.00 8.06 0.10 Leu CTG 2.00 16.13 0.20 Leu CTA 0.00 0.00 0.00 Leu CTT 1.00 8.06 0.10 Leu CTC 2.00 16.13 0.20 Met ATG 2.00 16.13 1.00
  • 29. Asn AAT 3.00 24.19 0.75 Asn AAC 1.00 8.06 0.25 Pro CCG 0.00 0.00 0.00 Pro CCA 4.00 32.26 0.57 Pro CCT 0.00 0.00 0.00 Pro CCC 3.00 24.19 0.43 Gln CAG 9.00 72.58 0.75 Gln CAA 3.00 24.19 0.25 Arg AGG 0.00 0.00 0.00 Arg AGA 0.00 0.00 0.00 Arg CGG 2.00 16.13 0.50 Arg CGA 0.00 0.00 0.00 Arg CGT 0.00 0.00 0.00 Arg CGC 2.00 16.13 0.50 Ser AGT 2.00 16.13 1.00 Ser AGC 0.00 0.00 0.00 Ser TCG 0.00 0.00 0.00 Ser TCA 0.00 0.00 0.00 Ser TCT 0.00 0.00 0.00 Ser TCC 0.00 0.00 0.00 Thr ACG 4.00 32.26 0.67 Thr ACA 1.00 8.06 0.17 Thr ACT 0.00 0.00 0.00 Thr ACC 1.00 8.06 0.17 Val GTG 7.00 56.45 0.37 Val GTA 3.00 24.19 0.16 Val GTT 8.00 64.52 0.42 Val GTC 1.00 8.06 0.05 Trp TGG 0.00 0.00 0.00 Tyr TAT 0.00 0.00 0.00 Tyr TAC 1.00 8.06 1.00 End TGA 2.00 16.13 1.00 End TAG 0.00 0.00 0.00 End TAA 0.00 0.00 0.00
  • 30. Results for 660 residue sequence "456 Ecoli Carbonic anhydrase Final 2" AmAcid Codon Number /1000 Fraction .. Ala GCG 9.00 40.91 0.30 Ala GCA 4.00 18.18 0.13 Ala GCT 7.00 31.82 0.23 Ala GCC 10.00 45.45 0.33 Cys TGT 4.00 18.18 0.67 Cys TGC 2.00 9.09 0.33 Asp GAT 4.00 18.18 0.44 Asp GAC 5.00 22.73 0.56 Glu GAG 7.00 31.82 0.58 Glu GAA 5.00 22.73 0.42 Phe TTT 5.00 22.73 0.63 Phe TTC 3.00 13.64 0.38 Gly GGG 2.00 9.09 0.17 Gly GGA 1.00 4.55 0.08 Gly GGT 2.00 9.09 0.17 Gly GGC 7.00 31.82 0.58 His CAT 4.00 18.18 0.67 His CAC 2.00 9.09 0.33 Ile ATA 1.00 4.55 0.08 Ile ATT 8.00 36.36 0.62 Ile ATC 4.00 18.18 0.31 Lys AAG 1.00 4.55 0.20 Lys AAA 4.00 18.18 0.80 Leu TTG 3.00 13.64 0.18 Leu TTA 1.00 4.55 0.06 Leu CTG 8.00 36.36 0.47 Leu CTA 1.00 4.55 0.06 Leu CTT 3.00 13.64 0.18 Leu CTC 1.00 4.55 0.06 Met ATG 4.00 18.18 1.00
  • 31. Asn AAT 4.00 18.18 0.57 Asn AAC 3.00 13.64 0.43 Pro CCG 7.00 31.82 0.47 Pro CCA 2.00 9.09 0.13 Pro CCT 5.00 22.73 0.33 Pro CCC 1.00 4.55 0.07 Gln CAG 6.00 27.27 0.60 Gln CAA 4.00 18.18 0.40 Arg AGG 0.00 0.00 0.00 Arg AGA 0.00 0.00 0.00 Arg CGG 3.00 13.64 0.19 Arg CGA 0.00 0.00 0.00 Arg CGT 4.00 18.18 0.25 Arg CGC 9.00 40.91 0.56 Ser AGT 0.00 0.00 0.00 Ser AGC 5.00 22.73 0.29 Ser TCG 2.00 9.09 0.12 Ser TCA 2.00 9.09 0.12 Ser TCT 2.00 9.09 0.12 Ser TCC 6.00 27.27 0.35 Thr ACG 1.00 4.55 0.14 Thr ACA 2.00 9.09 0.29 Thr ACT 1.00 4.55 0.14 Thr ACC 3.00 13.64 0.43 Val GTG 6.00 27.27 0.32 Val GTA 2.00 9.09 0.11 Val GTT 4.00 18.18 0.21 Val GTC 7.00 31.82 0.37 Trp TGG 2.00 9.09 1.00 Tyr TAT 2.00 9.09 0.50 Tyr TAC 2.00 9.09 0.50 End TGA 0.00 0.00 0.00 End TAG 0.00 0.00 0.00 End TAA 1.00 4.55 1.00
  • 32. Results for 663 residue sequence "123 Ecoli carbonic Anhydrase Final" AmAcid Codon Number /1000 Fraction .. Ala GCG 6.00 27.15 0.32 Ala GCA 2.00 9.05 0.11 Ala GCT 2.00 9.05 0.11 Ala GCC 9.00 40.72 0.47 Cys TGT 1.00 4.52 0.17 Cys TGC 5.00 22.62 0.83 Asp GAT 5.00 22.62 0.71 Asp GAC 2.00 9.05 0.29 Glu GAG 5.00 22.62 0.83 Glu GAA 1.00 4.52 0.17 Phe TTT 9.00 40.72 0.45 Phe TTC 11.00 49.77 0.55 Gly GGG 1.00 4.52 0.06 Gly GGA 4.00 18.10 0.25 Gly GGT 7.00 31.67 0.44 Gly GGC 4.00 18.10 0.25 His CAT 5.00 22.62 0.56 His CAC 4.00 18.10 0.44 Ile ATA 2.00 9.05 0.18 Ile ATT 2.00 9.05 0.18 Ile ATC 7.00 31.67 0.64 Lys AAG 4.00 18.10 0.50 Lys AAA 4.00 18.10 0.50 Leu TTG 6.00 27.15 0.38 Leu TTA 1.00 4.52 0.06 Leu CTG 3.00 13.57 0.19 Leu CTA 0.00 0.00 0.00 Leu CTT 1.00 4.52 0.06 Leu CTC 5.00 22.62 0.31 Met ATG 2.00 9.05 1.00
  • 33. Asn AAT 8.00 36.20 0.50 Asn AAC 8.00 36.20 0.50 Pro CCG 0.00 0.00 0.00 Pro CCA 6.00 27.15 0.60 Pro CCT 0.00 0.00 0.00 Pro CCC 4.00 18.10 0.40 Gln CAG 13.00 58.82 0.81 Gln CAA 3.00 13.57 0.19 Arg AGG 1.00 4.52 0.14 Arg AGA 0.00 0.00 0.00 Arg CGG 4.00 18.10 0.57 Arg CGA 0.00 0.00 0.00 Arg CGT 0.00 0.00 0.00 Arg CGC 2.00 9.05 0.29 Ser AGT 1.00 4.52 0.33 Ser AGC 1.00 4.52 0.33 Ser TCG 0.00 0.00 0.00 Ser TCA 0.00 0.00 0.00 Ser TCT 0.00 0.00 0.00 Ser TCC 1.00 4.52 0.33 Thr ACG 6.00 27.15 0.50 Thr ACA 3.00 13.57 0.25 Thr ACT 1.00 4.52 0.08 Thr ACC 2.00 9.05 0.17 Val GTG 10.00 45.25 0.36 Val GTA 3.00 13.57 0.11 Val GTT 11.00 49.77 0.39 Val GTC 4.00 18.10 0.14 Trp TGG 0.00 0.00 0.00 Tyr TAT 1.00 4.52 0.33 Tyr TAC 2.00 9.05 0.67 End TGA 3.00 13.57 0.50 End TAG 2.00 9.05 0.33 End TAA 1.00 4.52 0.17
  • 34. Results for 675 residue sequence "Truepera radiovictrix DSM1703 Carbo Anhyd consisting of 675 bases" AmAcid Codon Number /1000 Fraction .. Ala GCG 12.00 53.33 0.41 Ala GCA 3.00 13.33 0.10 Ala GCT 2.00 8.89 0.07 Ala GCC 12.00 53.33 0.41 Cys TGT 1.00 4.44 0.50 Cys TGC 1.00 4.44 0.50 Asp GAT 6.00 26.67 0.46 Asp GAC 7.00 31.11 0.54 Glu GAG 15.00 66.67 0.88 Glu GAA 2.00 8.89 0.12 Phe TTT 0.00 0.00 0.00 Phe TTC 1.00 4.44 1.00 Gly GGG 11.00 48.89 0.33 Gly GGA 2.00 8.89 0.06 Gly GGT 5.00 22.22 0.15 Gly GGC 15.00 66.67 0.45 His CAT 2.00 8.89 0.18 His CAC 9.00 40.00 0.82 Ile ATA 0.00 0.00 0.00 Ile ATT 0.00 0.00 0.00 Ile ATC 0.00 0.00 0.00 Lys AAG 2.00 8.89 0.67 Lys AAA 1.00 4.44 0.33 Leu TTG 2.00 8.89 0.10 Leu TTA 0.00 0.00 0.00 Leu CTG 6.00 26.67 0.29 Leu CTA 0.00 0.00 0.00 Leu CTT 2.00 8.89 0.10 Leu CTC 11.00 48.89 0.52 Met ATG 0.00 0.00 0.00 Asn AAT 1.00 4.44 0.50
  • 35. Asn AAC 1.00 4.44 0.50 Pro CCG 4.00 17.78 0.25 Pro CCA 1.00 4.44 0.06 Pro CCT 1.00 4.44 0.06 Pro CCC 10.00 44.44 0.63 Gln CAG 6.00 26.67 0.50 Gln CAA 6.00 26.67 0.50 Arg AGG 0.00 0.00 0.00 Arg AGA 0.00 0.00 0.00 Arg CGG 7.00 31.11 0.21 Arg CGA 3.00 13.33 0.09 Arg CGT 8.00 35.56 0.24 Arg CGC 15.00 66.67 0.45 Ser AGT 0.00 0.00 0.00 Ser AGC 4.00 17.78 0.80 Ser TCG 0.00 0.00 0.00 Ser TCA 1.00 4.44 0.20 Ser TCT 0.00 0.00 0.00 Ser TCC 0.00 0.00 0.00 Thr ACG 1.00 4.44 0.50 Thr ACA 1.00 4.44 0.50 Thr ACT 0.00 0.00 0.00 Thr ACC 0.00 0.00 0.00 Val GTG 7.00 31.11 0.32 Val GTA 3.00 13.33 0.14 Val GTT 3.00 13.33 0.14 Val GTC 9.00 40.00 0.41 Trp TGG 0.00 0.00 0.00 Tyr TAT 0.00 0.00 0.00 Tyr TAC 0.00 0.00 0.00 End TGA 0.00 0.00 0.00 End TAG 1.00 4.44 0.33 End TAA 2.00 8.89 0.67
  • 36. Results for 765 residue sequence " Salinispora arenicola " AmAcid Codon Number /1000 Fraction .. Ala GCG 17.00 68.55 0.47 Ala GCA 3.00 12.10 0.08 Ala GCT 4.00 16.13 0.11 Ala GCC 12.00 48.39 0.33 Cys TGT 3.00 12.10 0.75 Cys TGC 1.00 4.03 0.25 Asp GAT 1.00 4.03 0.08 Asp GAC 12.00 48.39 0.92 Glu GAG 11.00 44.35 1.00 Glu GAA 0.00 0.00 0.00 Phe TTT 1.00 4.03 0.25 Phe TTC 3.00 12.10 0.75 Gly GGG 8.00 32.26 0.26 Gly GGA 6.00 24.19 0.19 Gly GGT 7.00 28.23 0.23 Gly GGC 10.00 40.32 0.32 His CAT 1.00 4.03 0.13 His CAC 7.00 28.23 0.88 Ile ATA 0.00 0.00 0.00 Ile ATT 1.00 4.03 0.10 Ile ATC 9.00 36.29 0.90 Lys AAG 0.00 0.00 0.00 Lys AAA 0.00 0.00 0.00 Leu TTG 1.00 4.03 0.07 Leu TTA 0.00 0.00 0.00 Leu CTG 5.00 20.16 0.36 Leu CTA 0.00 0.00 0.00 Leu CTT 3.00 12.10 0.21 Leu CTC 5.00 20.16 0.36 Met ATG 2.00 8.06 1.00 Asn AAT 0.00 0.00 0.00 Asn AAC 2.00 8.06 1.00
  • 37. Pro CCG 10.00 40.32 0.53 Pro CCA 4.00 16.13 0.21 Pro CCT 1.00 4.03 0.05 Pro CCC 4.00 16.13 0.21 Gln CAG 8.00 32.26 1.00 Gln CAA 0.00 0.00 0.00 Arg AGG 0.00 0.00 0.00 Arg AGA 0.00 0.00 0.00 Arg CGG 7.00 28.23 0.39 Arg CGA 2.00 8.06 0.11 Arg CGT 5.00 20.16 0.28 Arg CGC 4.00 16.13 0.22 Ser AGT 1.00 4.03 0.07 Ser AGC 4.00 16.13 0.27 Ser TCG 2.00 8.06 0.13 Ser TCA 0.00 0.00 0.00 Ser TCT 1.00 4.03 0.07 Ser TCC 7.00 28.23 0.47 Thr ACG 3.00 12.10 0.20 Thr ACA 2.00 8.06 0.13 Thr ACT 0.00 0.00 0.00 Thr ACC 10.00 40.32 0.67 Val GTG 18.00 72.58 0.53 Val GTA 2.00 8.06 0.06 Val GTT 6.00 24.19 0.18 Val GTC 8.00 32.26 0.24 Trp TGG 0.00 0.00 0.00 Tyr TAT 0.00 0.00 0.00 Tyr TAC 3.00 12.10 1.00 End TGA 0.00 0.00 0.00 End TAG 0.00 0.00 0.00 End TAA 1.00 4.03 1.00
  • 38. Results for 774 residue sequence " abc Salinispora " starting AmAcid Codon Number /1000 Fraction .. Ala GCG 13.00 52.85 0.37 Ala GCA 5.00 20.33 0.14 Ala GCT 2.00 8.13 0.06 Ala GCC 15.00 60.98 0.43 Cys TGT 1.00 4.07 0.33 Cys TGC 2.00 8.13 0.67 Asp GAT 3.00 12.20 0.30 Asp GAC 7.00 28.46 0.70 Glu GAG 6.00 24.39 0.55 Glu GAA 5.00 20.33 0.45 Phe TTT 2.00 8.13 0.40 Phe TTC 3.00 12.20 0.60 Gly GGG 5.00 20.33 0.23 Gly GGA 3.00 12.20 0.14 Gly GGT 4.00 16.26 0.18 Gly GGC 10.00 40.65 0.45 His CAT 1.00 4.07 0.33 His CAC 2.00 8.13 0.67 Ile ATA 0.00 0.00 0.00 Ile ATT 1.00 4.07 0.17 Ile ATC 5.00 20.33 0.83 Lys AAG 5.00 20.33 1.00 Lys AAA 0.00 0.00 0.00 Leu TTG 0.00 0.00 0.00 Leu TTA 0.00 0.00 0.00 Leu CTG 8.00 32.52 0.32 Leu CTA 2.00 8.13 0.08 Leu CTT 7.00 28.46 0.28 Leu CTC 8.00 32.52 0.32 Met ATG 3.00 12.20 1.00
  • 39. Asn AAT 1.00 4.07 0.17 Asn AAC 5.00 20.33 0.83 Pro CCG 9.00 36.59 0.45 Pro CCA 3.00 12.20 0.15 Pro CCT 2.00 8.13 0.10 Pro CCC 6.00 24.39 0.30 Gln CAG 6.00 24.39 0.67 Gln CAA 3.00 12.20 0.33 Arg AGG 1.00 4.07 0.06 Arg AGA 2.00 8.13 0.11 Arg CGG 7.00 28.46 0.39 Arg CGA 1.00 4.07 0.06 Arg CGT 5.00 20.33 0.28 Arg CGC 2.00 8.13 0.11 Ser AGT 4.00 16.26 0.20 Ser AGC 2.00 8.13 0.10 Ser TCG 6.00 24.39 0.30 Ser TCA 2.00 8.13 0.10 Ser TCT 1.00 4.07 0.05 Ser TCC 5.00 20.33 0.25 Thr ACG 4.00 16.26 0.27 Thr ACA 2.00 8.13 0.13 Thr ACT 2.00 8.13 0.13 Thr ACC 7.00 28.46 0.47 Val GTG 13.00 52.85 0.57 Val GTA 1.00 4.07 0.04 Val GTT 1.00 4.07 0.04 Val GTC 8.00 32.52 0.35 Trp TGG 2.00 8.13 1.00 Tyr TAT 2.00 8.13 0.50 Tyr TAC 2.00 8.13 0.50 End TGA 1.00 4.07 1.00 End TAG 0.00 0.00 0.00 End TAA 0.00 0.00 0.00
  • 40. Results for 488 residue sequence "Frankia CcI Carbonic Anhydrase 1 of 488 bases" AmAcid Codon Number /1000 Fraction .. Ala GCG 7.00 43.21 0.39 Ala GCA 4.00 24.69 0.22 Ala GCT 2.00 12.35 0.11 Ala GCC 5.00 30.86 0.28 Cys TGT 1.00 6.17 0.20 Cys TGC 4.00 24.69 0.80 Asp GAT 0.00 0.00 0.00 Asp GAC 1.00 6.17 1.00 Glu GAG 0.00 0.00 0.00 Glu GAA 0.00 0.00 0.00 Phe TTT 0.00 0.00 0.00 Phe TTC 1.00 6.17 1.00 Gly GGG 1.00 6.17 0.20 Gly GGA 2.00 12.35 0.40 Gly GGT 0.00 0.00 0.00 Gly GGC 2.00 12.35 0.40 His CAT 0.00 0.00 0.00 His CAC 1.00 6.17 1.00 Ile ATA 1.00 6.17 0.50 Ile ATT 1.00 6.17 0.50 Ile ATC 0.00 0.00 0.00 Lys AAG 2.00 12.35 1.00 Lys AAA 0.00 0.00 0.00 Leu TTG 3.00 18.52 0.50 Leu TTA 2.00 12.35 0.33 Leu CTG 0.00 0.00 0.00 Leu CTA 0.00 0.00 0.00 Leu CTT 0.00 0.00 0.00 Leu CTC 1.00 6.17 0.17 Met ATG 1.00 6.17 1.00
  • 41. Asn AAT 1.00 6.17 0.33 Asn AAC 2.00 12.35 0.67 Pro CCG 17.00 104.94 0.63 Pro CCA 4.00 24.69 0.15 Pro CCT 4.00 24.69 0.15 Pro CCC 2.00 12.35 0.07 Gln CAG 1.00 6.17 1.00 Gln CAA 0.00 0.00 0.00 Arg AGG 3.00 18.52 0.14 Arg AGA 4.00 24.69 0.18 Arg CGG 0.00 0.00 0.00 Arg CGA 6.00 37.04 0.27 Arg CGT 4.00 24.69 0.18 Arg CGC 5.00 30.86 0.23 Ser AGT 2.00 12.35 0.06 Ser AGC 2.00 12.35 0.06 Ser TCG 8.00 49.38 0.24 Ser TCA 12.00 74.07 0.35 Ser TCT 2.00 12.35 0.06 Ser TCC 8.00 49.38 0.24 Thr ACG 15.00 92.59 0.63 Thr ACA 4.00 24.69 0.17 Thr ACT 2.00 12.35 0.08 Thr ACC 3.00 18.52 0.13 Val GTG 0.00 0.00 0.00 Val GTA 0.00 0.00 0.00 Val GTT 1.00 6.17 1.00 Val GTC 0.00 0.00 0.00 Trp TGG 4.00 24.69 1.00 Tyr TAT 0.00 0.00 0.00 Tyr TAC 1.00 6.17 1.00 End TGA 3.00 18.52 1.00 End TAG 0.00 0.00 0.00 End TAA 0.00 0.00 0.00
  • 42. Results for 618 residue sequence "xyz Frankia CcI3 Carbonic Anhydrase 2 of 618 bases" AmAcid Codon Number /1000 Fraction .. Ala GCG 6.00 29.13 0.27 Ala GCA 3.00 14.56 0.14 Ala GCT 3.00 14.56 0.14 Ala GCC 10.00 48.54 0.45 Cys TGT 2.00 9.71 0.67 Cys TGC 1.00 4.85 0.33 Asp GAT 6.00 29.13 0.35 Asp GAC 11.00 53.40 0.65 Glu GAG 8.00 38.83 0.67 Glu GAA 4.00 19.42 0.33 Phe TTT 1.00 4.85 0.25 Phe TTC 3.00 14.56 0.75 Gly GGG 5.00 24.27 0.31 Gly GGA 1.00 4.85 0.06 Gly GGT 6.00 29.13 0.38 Gly GGC 4.00 19.42 0.25 His CAT 3.00 14.56 0.38 His CAC 5.00 24.27 0.63 Ile ATA 0.00 0.00 0.00 Ile ATT 1.00 4.85 0.17 Ile ATC 5.00 24.27 0.83 Lys AAG 2.00 9.71 1.00 Lys AAA 0.00 0.00 0.00 Leu TTG 3.00 14.56 0.15 Leu TTA 0.00 0.00 0.00 Leu CTG 10.00 48.54 0.50 Leu CTA 1.00 4.85 0.05 Leu CTT 2.00 9.71 0.10 Leu CTC 4.00 19.42 0.20 Met ATG 1.00 4.85 1.00
  • 43. Asn AAT 0.00 0.00 0.00 Asn AAC 1.00 4.85 1.00 Pro CCG 5.00 24.27 0.42 Pro CCA 0.00 0.00 0.00 Pro CCT 1.00 4.85 0.08 Pro CCC 6.00 29.13 0.50 Gln CAG 5.00 24.27 1.00 Gln CAA 0.00 0.00 0.00 Arg AGG 2.00 9.71 0.13 Arg AGA 0.00 0.00 0.00 Arg CGG 6.00 29.13 0.38 Arg CGA 0.00 0.00 0.00 Arg CGT 2.00 9.71 0.13 Arg CGC 6.00 29.13 0.38 Ser AGT 0.00 0.00 0.00 Ser AGC 4.00 19.42 0.36 Ser TCG 4.00 19.42 0.36 Ser TCA 0.00 0.00 0.00 Ser TCT 0.00 0.00 0.00 Ser TCC 3.00 14.56 0.27 Thr ACG 5.00 24.27 0.33 Thr ACA 0.00 0.00 0.00 Thr ACT 0.00 0.00 0.00 Thr ACC 10.00 48.54 0.67 Val GTG 14.00 67.96 0.47 Val GTA 1.00 4.85 0.03 Val GTT 4.00 19.42 0.13 Val GTC 11.00 53.40 0.37 Trp TGG 1.00 4.85 1.00 Tyr TAT 1.00 4.85 0.33 Tyr TAC 2.00 9.71 0.67 End TGA 0.00 0.00 0.00 End TAG 1.00 4.85 1.00 End TAA 0.00 0.00 0.00
  • 44. From the above tables we conclude two things: 1) The codon-usage tables of different gene ORFs from the same organism are broadly the same, differing only at a few minor points. 2) The codon-usage tables confirm the suspicion raised while analyzing the peptide sequences: the choice of codons differs between organisms to suit the G-C content of each organism.
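The three columns in the tables above (Number, /1000, Fraction) can be reproduced from a raw ORF. The following is a minimal Python sketch, not the tool used for the slides, which the source does not name. The function name `codon_usage` and the constants are hypothetical, and the standard genetic code is assumed. As a check on the arithmetic, one codon out of 137 (a 411-base sequence) gives 1000/137 = 7.30 per thousand, which matches the first table.

```python
from collections import Counter

# Standard genetic code in the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(dna):
    """Return {codon: (amino_acid, number, per_thousand, fraction)}.

    number       = how often the codon occurs in frame 1
    per_thousand = number scaled to 1000 codons
    fraction     = share of this codon among all codons for the same amino acid
    """
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3          # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)  # skip ambiguous codons
    total = sum(counts.values())

    per_amino_acid = Counter()
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n

    table = {}
    for codon, aa in CODON_TABLE.items():
        n = counts.get(codon, 0)
        per_thousand = 1000.0 * n / total if total else 0.0
        fraction = n / per_amino_acid[aa] if per_amino_acid[aa] else 0.0
        table[codon] = (aa, n, per_thousand, fraction)
    return table
```

Calling `codon_usage(orf)` on two ORFs from the same organism and comparing the Fraction values codon by codon is one way to make the "same except at minor points" comparison quantitative.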
  • 45. Corrections: We undertook these because we noticed that the gene products of Methanococcus voltae and Frankia did not start with the amino acid methionine. Methanococcus voltae corrections: The mistake seems to be in the database from which the sequence was downloaded. The DNA sequence had ‘ata’ instead of ‘atg’. Frankia sp. CcI3 corrections: The mistake again seems to be in the sequence. The DNA sequence began 27 bp earlier, and the claimed start site of the protein actually coded for valine.
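The check described above, whether an ORF begins with a methionine codon, can be sketched in a few lines of Python. The function name `classify_start` is hypothetical. Treating GTG and TTG as "alternative" reflects the fact that bacteria are known to use them as start codons; labelling everything else "unexpected" is a simplifying assumption of this sketch, not a rule from the slides.

```python
# Commonly cited alternative bacterial start codons (assumption of this sketch:
# any other first codon is flagged for manual inspection).
ALTERNATIVE_STARTS = {"GTG", "TTG"}


def classify_start(dna):
    """Classify the first codon of an ORF as 'ATG', 'alternative' or 'unexpected'."""
    first = dna[:3].upper()
    if first == "ATG":
        return "ATG"
    if first in ALTERNATIVE_STARTS:
        return "alternative"
    return "unexpected"
```

Under these assumptions an ORF beginning with ‘ata’ is reported as "unexpected", while one beginning with ‘gtg’ (valine in the standard code) is reported as "alternative" and would still merit a look at the upstream sequence.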
  • 46. Conclusion: After studying the three analyses we did of the protein, the DNA and the ORF codons, we conclude the following: 1) Bacteria choose among codons for the same amino acid according to their G-C composition. G-C-rich codons are preferred in G-C-rich bacteria; conversely, A-T-rich codons are preferred in G-C-poor bacteria. 2) Where the same amino acid is not present, a substitute amino acid with the same or nearly the same chemical properties is used. 3) High G-C content bacteria often employ two different genes for the same purpose. The finding of two possible genes for Carbonic Anhydrase in their genomes supports this statement. 4) Most bacteria use Zinc at the metal site, yet a small number of bacteria use Cadmium and other metals. 5) Even though the proteins are of varied length, one may look for Serine and Glycine on the peptide chain and see that this region is conserved in all the proteins. This is because the protein domains must be similar for all the anhydrases.
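Conclusion 1 can be checked numerically by comparing the overall G-C fraction of an ORF with the G-C fraction at the third codon position, where most synonymous choices are made. The sketch below is illustrative only; the function names `gc_content` and `gc3` are hypothetical and no result for the organisms above is implied.

```python
def gc_content(dna):
    """Fraction of bases in the sequence that are G or C."""
    dna = dna.upper()
    if not dna:
        return 0.0
    return sum(base in "GC" for base in dna) / len(dna)


def gc3(dna):
    """Fraction of complete frame-1 codons whose third base is G or C."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3          # ignore a trailing partial codon
    third_bases = dna[2:usable:3]
    if not third_bases:
        return 0.0
    return sum(base in "GC" for base in third_bases) / len(third_bases)
```

If codon choice follows genomic G-C content as stated, `gc3` would be expected to be high for ORFs from high G-C organisms and low for those from low G-C organisms, typically diverging more strongly than `gc_content` does.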