Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction
1. Automated Alphabet Reduction
Method with Evolutionary
Algorithms
for Protein Structure Prediction
Jaume Bacardit, Michael Stout, Jonathan D. Hirst,
Kumara Sastry, Xavier Llorà and Natalio Krasnogor
University of Nottingham and University of Illinois at
Urbana-Champaign
3. Protein Structure Prediction (PSP)
The goal is to predict the (complex) 3D structure (and some sub-
features) of a protein from its amino acid sequence (a 1D object)
Primary Sequence 3D Structure
4. Alphabet reduction process and
validation
Domain
Size = N (<20)
(CN, RSA, …)
Test set
Dataset
Dataset Ensemble
ECGA BioHEL
Card=N
Card=20 of rule sets
(<20)
Accuracy
Mutual
Information
5. This entry is human competitive
because:
G: The result solves a problem of indisputable
difficulty in its field (Difficult)
D: The result is publishable in its own right as a
new scientific result - independent of the fact that
the result was mechanically created (Publishable)
E: The result is equal to or better than the most
recent human-created solution to a long-standing
problem for which there has been a succession of
increasingly better human-created solutions
(≥Human)
B: The result is equal to or better than a result that
was accepted as a new scientific result at the time
when it was published in a peer-reviewed scientific
journal (Innovative)
6. G:Difficulty
PSP is, after many decades of research, still one of the main
unsolved problems in Science
In the 2006 CASP experiment, one of the best methods
(Rosetta@home) used > 3 cpu yrs to predict a single protein
Amino acid sequence is a string drawn from a 20-letter
alphabet
Some AAs are similar & could be grouped, reducing the
dimensionality of the domain
We can find a new alphabet with much lower cardinality
than the AA representation without loosing critical
information in the process
We can tailor alphabet reduction automatically to a
variety of PSP-related domains
7. Why is this entry human-
competitive?
The initial version of our alphabet reduction
process has been accepted in GECCO
D:Publish.
2007, in the biological applications track
One of the most famous alphabet
reductions is the HP model that reduces AA
types to only two: Hydrophobic & Polar (e.g.
[Broome & Hecht, 2000])
E:≥Human
Other experts use a broader set of physico-
chemical properties to propose reduced
alphabets (examples in later slides)
We have improved upon both of the above
8. B:Innovative
Comparison of our results against other reduced alphabets existing in
the literature and human-designed ones, applied to two PSP-related
datasets, Coordination Number (CN) and Solvent Accessibility (SA)
Our method produces the best reduced alphabets
Alphabet Letters CN acc. SA acc. Diff. Ref.
AA 20 74.0±0.6 70.7±0.4 --- ---
Our method 5 73.3±0.5 70.3±0.4 0.7/0.4 This work
WW5 6 73.1±0.7 69.6±0.4 0.9/1.1 [Wang & Wang, 99]
Alphabets
SR5 6 73.1±0.7 69.6±0.4 0.9/1.1 [Solis & Rackovsky, 00]
from the
MU4 5 72.6±0.7 69.4±0.4 1.4/1.3 [Murphy et al., 00] literature
MM5 6 73.1±0.6 69.3±0.3 0.9/1.4 [Melo & Marti-Renom, 06]
HD1 7 72.9±0.6 69.3±0.4 1.1/1.4 This work
Expert
HD2 9 73.0±0.6 69.3±0.4 1.0/1.4 This work designed
alphabets
HD3 11 73.2±0.6 69.9±0.4 0.8/0.8 This work
9. Why is this entry better than the
other entries?
PSP is a very difficult and very relevant domain
It has been named as Grand Challenge by the USA
government [1]
Impact of having better protein structure models are
countless
Genetic therapy
Synthesis of drugs for incurable diseases
Improved crops
Environmental remediation
Our contribution is a small but concrete step towards
achieving this goal
[1] Mathematical Committee on Physical, Engineering Engineering Sciences, Federal
Coordinating Council for Science, and Technology. Grand challenges 1993: High
performance computing and communications, 1992.
10. Better than other entries: New
understanding of the folding process
Simpler rules obtained by BioHEL
AA alphabet: If AA−4 ∉ {F, G, I, L, V,X, Y }, AA−3 ∉ {F,
G, Q,W}, AA−2 ∉ {C,N, P}, AA−1 ∉ {A, I, Q, V, Y }, AA
∈ {K}, AA1 ∉ {F, I, L,M,N, P, T, V }, AA2 ∉ {N, P, Q,
S}, AA3 ∉ {C, I, L,R,W}, AA4 ∉ {A,C, I, L,R, S} then AA
is exposed
Reduced alphabet: If AA−4 ∈ {1, 3}, AA−3 ∈ {1, 3}, AA
∈ {3}, AA1 ∈ {1, 3}, AA2 ∉ {1}, AA3 ∉ {0} then AA is
exposed
0 = ACFHILMVWY, 1 = DEKNPQRST (EK for AA), 3
=X
Unexpected explanations: Alphabet reduction
clustered AA types that experts did not expect.
Analyzing the data verified that groups were
sound
11. Better than other entries: run-
time reduction & conclusions
Alphabet reduction is also beneficial in the short
term
We have extrapolated the reduced alphabet to Position-
Specific Scoring Matrices (PSSM)
PSSM is the state-of-the-art representation for PSP with
orders of magnitude more information than the AA alphabet
Learning time of BioHEL using PSSM has been reduced
from 2 weeks to 3 days with only 0.5% accuracy drop
We consider that our entry is the best because it
addresses successfully and in many ways a very
relevant, important, high profile and timely
problem