Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction
Jaume Bacardit, Michael Stout, Jonathan D. Hirst, Kumara Sastry, Xavier Llorà and Natalio Krasnogor
University of Nottingham and University of Illinois at Urbana-Champaign
What is a protein?
Protein Structure Prediction (PSP)
The goal is to predict the (complex) 3D structure (and some sub-features) of a protein from its amino acid sequence (a 1D object)
[Figure: Primary Sequence → 3D Structure]
Alphabet reduction process and validation
[Figure: pipeline diagram. Dataset (Card=20) → ECGA (inputs: Domain (CN, RSA, …) and Size = N (<20); labelled with Mutual Information) → Dataset (Card=N (<20)) → BioHEL → Ensemble of rule sets → Test set → Accuracy]
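The diagram pairs ECGA with mutual information but does not say how the measure is computed. As a minimal sketch, assuming the mutual information is taken between a single reduced-alphabet symbol and the class label (the function names, the toy residues, and the labels are all hypothetical), one way to compare two candidate groupings is:

```python
from collections import Counter
from math import log2


def mutual_information(symbols, labels):
    """Empirical mutual information (in bits) between two equal-length sequences."""
    if len(symbols) != len(labels):
        raise ValueError("symbols and labels must have the same length")
    n = len(symbols)
    joint = Counter(zip(symbols, labels))
    symbol_counts = Counter(symbols)
    label_counts = Counter(labels)
    mi = 0.0
    for (symbol, label), count in joint.items():
        p_joint = count / n
        p_independent = (symbol_counts[symbol] / n) * (label_counts[label] / n)
        mi += p_joint * log2(p_joint / p_independent)
    return mi


def reduce_sequence(residues, grouping):
    """Replace each residue by the group it belongs to (hypothetical helper)."""
    return [grouping[residue] for residue in residues]


# Toy data, invented for illustration only.
residues = list("AKLEAKLE")
labels = ["buried", "exposed", "buried", "exposed"] * 2

# A grouping that merges residues sharing a label keeps all the information...
good_grouping = {"A": 0, "L": 0, "K": 1, "E": 1}
# ...while one that mixes them loses it.
bad_grouping = {"A": 0, "K": 0, "L": 1, "E": 1}

mi_full = mutual_information(residues, labels)
mi_good = mutual_information(reduce_sequence(residues, good_grouping), labels)
mi_bad = mutual_information(reduce_sequence(residues, bad_grouping), labels)
```

On this toy data the first grouping retains the full 1 bit of information about the label, whereas the second retains none. A search procedure such as ECGA could use a score of this kind to rank groupings.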
This entry is human competitive because:
 G: The result solves a problem of indisputable
 difficulty in its field (Difficult)
 D: The result is publishable in its own right as a
 new scientific result - independent of the fact that
 the result was mechanically created (Publishable)
 E: The result is equal to or better than the most
 recent human-created solution to a long-standing
 problem for which there has been a succession of
 increasingly better human-created solutions
 (≥Human)
 B: The result is equal to or better than a result that
 was accepted as a new scientific result at the time
 when it was published in a peer-reviewed scientific
 journal (Innovative)
G: Difficulty
PSP is, after many decades of research, still one of the main unsolved problems in science
In the 2006 CASP experiment, one of the best methods (Rosetta@home) used more than 3 CPU years to predict a single protein
An amino acid (AA) sequence is a string drawn from a 20-letter alphabet
Some AAs are similar and could be grouped, reducing the dimensionality of the domain
We can find a new alphabet with much lower cardinality than the AA representation without losing critical information in the process
We can tailor alphabet reduction automatically to a variety of PSP-related domains
Why is this entry human-competitive?
D: Publishable
    The initial version of our alphabet reduction process has been accepted in GECCO 2007, in the biological applications track
E: ≥Human
    One of the most famous alphabet reductions is the HP model that reduces AA types to only two: Hydrophobic & Polar (e.g. [Broome & Hecht, 2000])
    Other experts use a broader set of physico-chemical properties to propose reduced alphabets (examples in later slides)
    We have improved upon both of the above
B: Innovative
    Comparison of our results against other reduced alphabets existing in the literature and human-designed ones, applied to two PSP-related datasets, Coordination Number (CN) and Solvent Accessibility (SA)
    Our method produces the best reduced alphabets

Alphabet     Letters   CN acc.    SA acc.    Diff. (CN/SA)   Ref.                       Group
AA           20        74.0±0.6   70.7±0.4   ---             ---                        ---
Our method   5         73.3±0.5   70.3±0.4   0.7/0.4         This work                  ---
WW5          6         73.1±0.7   69.6±0.4   0.9/1.1         [Wang & Wang, 99]          Alphabets from the literature
SR5          6         73.1±0.7   69.6±0.4   0.9/1.1         [Solis & Rackovsky, 00]    Alphabets from the literature
MU4          5         72.6±0.7   69.4±0.4   1.4/1.3         [Murphy et al., 00]        Alphabets from the literature
MM5          6         73.1±0.6   69.3±0.3   0.9/1.4         [Melo & Marti-Renom, 06]   Alphabets from the literature
HD1          7         72.9±0.6   69.3±0.4   1.1/1.4         This work                  Expert designed alphabets
HD2          9         73.0±0.6   69.3±0.4   1.0/1.4         This work                  Expert designed alphabets
HD3          11        73.2±0.6   69.9±0.4   0.8/0.8         This work                  Expert designed alphabets
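The Diff. column appears to be the accuracy lost relative to the full 20-letter AA alphabet, given as CN/SA. A short sketch that recomputes it from the mean accuracies in the table (the variable and function names are illustrative, and the ± spreads are ignored):

```python
# Mean accuracies copied from the table above; the ± spreads are not used here.
BASELINE = {"cn": 74.0, "sa": 70.7}  # full 20-letter AA alphabet

ALPHABETS = {
    "Our method": (73.3, 70.3),
    "WW5": (73.1, 69.6),
    "SR5": (73.1, 69.6),
    "MU4": (72.6, 69.4),
    "MM5": (73.1, 69.3),
    "HD1": (72.9, 69.3),
    "HD2": (73.0, 69.3),
    "HD3": (73.2, 69.9),
}


def accuracy_drop(cn_acc, sa_acc):
    """Return the (CN, SA) accuracy lost versus the AA baseline, to one decimal."""
    return (
        round(BASELINE["cn"] - cn_acc, 1),
        round(BASELINE["sa"] - sa_acc, 1),
    )


drops = {name: accuracy_drop(cn, sa) for name, (cn, sa) in ALPHABETS.items()}

# The alphabet with the smallest combined drop across both datasets.
smallest_drop = min(drops, key=lambda name: sum(drops[name]))

for name, (cn_drop, sa_drop) in drops.items():
    print(f"{name:<11} {cn_drop:.1f}/{sa_drop:.1f}")
```

The recomputed values match the Diff. column for every row, and the smallest combined drop belongs to the automatically found alphabet.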
Why is this entry better than the other entries?
 PSP is a very difficult and very relevant domain
    It has been named a Grand Challenge by the US government [1]
    The potential applications of better protein structure models are countless
         Genetic therapy
         Synthesis of drugs for incurable diseases
         Improved crops
         Environmental remediation
    Our contribution is a small but concrete step towards achieving this goal

  [1] Committee on Physical, Mathematical, and Engineering Sciences, Federal Coordinating Council for Science, Engineering, and Technology. Grand Challenges 1993: High Performance Computing and Communications, 1992.
Better than other entries: New understanding of the folding process
Simpler rules obtained by BioHEL
  AA alphabet: If AA−4 ∉ {F, G, I, L, V, X, Y}, AA−3 ∉ {F, G, Q, W}, AA−2 ∉ {C, N, P}, AA−1 ∉ {A, I, Q, V, Y}, AA ∈ {K}, AA+1 ∉ {F, I, L, M, N, P, T, V}, AA+2 ∉ {N, P, Q, S}, AA+3 ∉ {C, I, L, R, W}, AA+4 ∉ {A, C, I, L, R, S} then AA is exposed
  Reduced alphabet: If AA−4 ∈ {1, 3}, AA−3 ∈ {1, 3}, AA ∈ {3}, AA+1 ∈ {1, 3}, AA+2 ∉ {1}, AA+3 ∉ {0} then AA is exposed
  0 = ACFHILMVWY, 1 = DEKNPQRST (EK for AA), 3 = X
Unexpected explanations: Alphabet reduction clustered AA types that experts did not expect. Analyzing the data verified that the groups were sound
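To make the reduced-alphabet rule concrete, here is a minimal sketch of how such a rule could be checked against a window of nine residues (offsets −4 to +4). The groups and the rule are copied from the slide; the data layout and function names are hypothetical and are not BioHEL's actual representation. Two caveats are assumptions of this sketch: letters the slide does not assign to a group are mapped to `None`, and the slide's note "(EK for AA)", which suggests a different grouping for the central position, is not modelled.

```python
# Groups exactly as listed on the slide; other letters are left unassigned.
GROUPS = {"0": "ACFHILMVWY", "1": "DEKNPQRST", "3": "X"}
LETTER_TO_GROUP = {aa: group for group, letters in GROUPS.items() for aa in letters}


def reduce_window(window):
    """Translate residues into group symbols; None marks unassigned letters."""
    return [LETTER_TO_GROUP.get(aa) for aa in window]


# Rule from the slide: offset from the central residue -> (test, symbol set).
REDUCED_RULE = {
    -4: ("in", {"1", "3"}),
    -3: ("in", {"1", "3"}),
    0: ("in", {"3"}),
    1: ("in", {"1", "3"}),
    2: ("not_in", {"1"}),
    3: ("not_in", {"0"}),
}


def rule_matches(reduced_window, rule, center):
    """Return True when every condition of the rule holds for the window."""
    for offset, (test, symbols) in rule.items():
        inside = reduced_window[center + offset] in symbols
        if (test == "in") != inside:
            return False
    return True


# A window already expressed in reduced symbols, offsets -4..+4, center at index 4.
matching = ["1", "3", "0", "0", "3", "1", "0", "1", "0"]
# Same window, but offset +3 is now in group 0, which the rule forbids.
failing = ["1", "3", "0", "0", "3", "1", "0", "0", "0"]

matches = rule_matches(matching, REDUCED_RULE, center=4)
fails = rule_matches(failing, REDUCED_RULE, center=4)
```

The reduced rule tests six positions against sets of at most two symbols, whereas the AA-alphabet rule tests nine positions against sets of up to eight letters, which is the sense in which the rules become simpler.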
Better than other entries: run-time reduction & conclusions
 Alphabet reduction is also beneficial in the short term
    We have extrapolated the reduced alphabet to Position-Specific Scoring Matrices (PSSM)
    PSSM is the state-of-the-art representation for PSP, with orders of magnitude more information than the AA alphabet
    Learning time of BioHEL using PSSM has been reduced from 2 weeks to 3 days with only a 0.5% accuracy drop
 We consider our entry the best because it successfully addresses, in several ways, a very relevant, important, high-profile and timely problem
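As a quick worked figure for the run-time claim, assuming "2 weeks" means 14 days (the helper name is illustrative):

```python
def learning_time_speedup(before_days, after_days):
    """Return (speedup factor, fraction of learning time saved)."""
    speedup = before_days / after_days
    saved_fraction = 1 - after_days / before_days
    return speedup, saved_fraction


# 2 weeks taken as 14 days, reduced to 3 days.
speedup, saved_fraction = learning_time_speedup(before_days=14, after_days=3)
print(f"speedup: {speedup:.1f}x, time saved: {saved_fraction:.0%}")
```

Under that assumption the reduction corresponds to a speedup of roughly 4.7 times, or about 79% less learning time.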
