Validation of Time Series Technique
   for Prediction of Conformational
   States of Amino Acids




Dr. Sangeeta Sawant , Bioinformatics Centre, UoP, Pune (Guide)

Dr. Mohan Kale, Dept. of Statistics, UoP, Pune (co-guide)
Concepts Used
              Ramachandran Plot

                  Time series

           AR,ARMA,ARIMA models

                  AIC criterion

              Euclidean distance

        Potential values for AA residues

      Feynman Problem Solving Algorithm
Ramachandran Plot
Time Series
A sequence of data points (a set of observations), typically
measured at successive time instants spaced at uniform
intervals.

                                                 Patterns, variations


                                                 forecasting
Time Series Models (probability model)


Autoregressive (AR) models


Autoregressive-moving average (ARMA)


Autoregressive integrated moving average (ARIMA)
models

- depend linearly on previous data points
Materials & Methods
                              R

                       RStudio, Tinn-R

            R packages: bio3d, itsmr, forecast, tseries, timsac, wordcloud

                   ITSM 2000 (standalone)

 R Nabble
 BioStars
 stats.stackexchange
Methods

A) Calculation of potential values for AA residues

B) Forecasting of AA states

C) Clustering
Calculation of Potential values for AA residues

                                                                     Dataset-I

3829 proteins selected from the PDB (Protein Data Bank) using the PDBSelect list (25% sequence similarity)

Experimental method: X-ray; R-factor 0-0.25 (best-resolved structures)

Structures screened for chain breaks and for CA-only entries

Phi-psi values computed with torsion.pdb() of “bio3d” and verified via PDBGoodies (IISc, Bangalore) and the Protein Angle Descriptor utility (IIT, Delhi)

Each amino-acid residue assigned conformational state 1, 2, or 3 according to the region (I, II, or III) of the Ramachandran plot in which its phi-psi values fall
Figure No. 2: Ramachandran plot (axes: phi, psi) showing three conformational regions I, II and III

   I - closely/tightly packed conformations: phi -140 to 0, psi -100 to 0
   II - extended conformations: phi -180 to 0, psi 80 to 180
   III - all remaining conformations
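The region definitions above can be sketched as a small lookup. This is an illustrative sketch only: the function name `conformational_state` is hypothetical, and treating the region boundaries as inclusive is an assumption, since the slide does not say how boundary values are handled.

```python
def conformational_state(phi, psi):
    """Return 1, 2 or 3 for regions I, II, III of the Ramachandran plot.

    Angles are in degrees. Inclusive boundaries are an assumption.
    """
    # Region I: closely/tightly packed conformations
    if -140.0 <= phi <= 0.0 and -100.0 <= psi <= 0.0:
        return 1
    # Region II: extended conformations
    if -180.0 <= phi <= 0.0 and 80.0 <= psi <= 180.0:
        return 2
    # Region III: everything else
    return 3


# Toy (phi, psi) pairs, not taken from the dataset
angles = [(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)]
states = [conformational_state(phi, psi) for phi, psi in angles]
```

Residues without a defined phi or psi (for example chain termini) would need separate handling, which this sketch leaves out.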
Frequencies of single residues in the three states were calculated
and normalized following Kolaskar, A.S. & Sawant, S.V. (1996):

$$P_{ik} = \frac{n_{ik}\, N}{\sum_{k} n_{ik} \; \sum_{i} n_{ik}}$$

n_ik - number of times the AA of type i occurs in state k (k = 1-3)
N - total number of residues
P_ik - potential value of the AA of type i in state k
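A minimal sketch of the normalization, assuming it is the usual propensity form: the count n_ik times N, divided by the product of the total count of residue type i and the total count of state k. The function name `state_potentials` and the toy observations are hypothetical.

```python
from collections import Counter


def state_potentials(observations):
    """Compute P_ik from a list of (amino_acid, state) observations.

    Assumed form: P_ik = n_ik * N / (n_i * n_k), where n_i and n_k are
    the totals for residue type i and for state k.
    """
    pair_counts = Counter(observations)
    n_total = sum(pair_counts.values())
    aa_totals = Counter()
    state_totals = Counter()
    for (aa, state), n in pair_counts.items():
        aa_totals[aa] += n
        state_totals[state] += n
    return {
        (aa, state): n * n_total / (aa_totals[aa] * state_totals[state])
        for (aa, state), n in pair_counts.items()
    }


# Toy data: six residues, two amino-acid types, three states
toy = [("A", 1), ("A", 1), ("A", 2), ("G", 2), ("G", 3), ("G", 3)]
potentials = state_potentials(toy)
```

Pairs that never occur are simply absent from the result. A value above 1 means the residue type is over-represented in that state relative to its overall frequency.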

 Potential values in pdf
Potential values
Time Series
ACF Plot
ACF: stationary vs. non-stationary series
[Figure: ACF plots, panels labelled "Stationary" and "Non-stationary"]
Time series and ACF plots
[Figure: panels labelled "Stationary", "Non-stationary", "Stationary"]
Stationary TS
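The ACF plots on the preceding slides are based on the sample autocorrelation function. A minimal sketch follows, assuming the usual biased estimator (division by n at every lag). The helper name `sample_acf` is hypothetical, and the slides do not state the exact rule used to separate stationary from non-stationary series. A slowly decaying ACF is the common visual hint of non-stationarity.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r(0..max_lag), dividing by n at every lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + h] for t in range(n - h)) / n / c0
        for h in range(max_lag + 1)
    ]


# An alternating series: the ACF flips sign at every lag
alternating = [1.0, -1.0] * 4
acf_alt = sample_acf(alternating, 2)

# A pure linear trend: the ACF stays high, as for a non-stationary series
trend = [float(t) for t in range(20)]
acf_trend = sample_acf(trend, 3)

# Approximate 95% bound for the ACF of white noise, often drawn on ACF plots
bound = 1.96 / math.sqrt(len(trend))
```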
TS model building

            AR(p)

            ARMA(p, q)

            ARIMA(p, d, q)
Best model Selection

            AR(p), ARMA(p, q), ARIMA(p, d, q)

            Candidate models compared by AIC (minimum AIC preferred)
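Order selection by minimum AIC can be illustrated for the AR case. The sketch below is not the procedure of any of the R packages named in the deck. It assumes Yule-Walker estimation via the Levinson-Durbin recursion and a Gaussian AIC of the form n ln(noise variance) + 2 (p + 1); all function names are hypothetical.

```python
import math
import random


def autocovariances(x, max_lag):
    """Sample autocovariances gamma(0..max_lag), dividing by n."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    return [sum(dev[t] * dev[t + h] for t in range(n - h)) / n
            for h in range(max_lag + 1)]


def yule_walker_fits(x, max_order):
    """Levinson-Durbin recursion.

    Returns a list of (coefficients, noise_variance), one per order 0..max_order.
    """
    gamma = autocovariances(x, max_order)
    phi, v = [], gamma[0]
    fits = [(phi, v)]
    for m in range(1, max_order + 1):
        # Reflection coefficient (partial autocorrelation at lag m)
        k = (gamma[m] - sum(phi[j] * gamma[m - 1 - j] for j in range(m - 1))) / v
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
        v *= 1.0 - k * k
        fits.append((phi, v))
    return fits


def aic(n, noise_variance, n_params):
    """Gaussian AIC up to an additive constant (assumed form)."""
    return n * math.log(noise_variance) + 2 * n_params


def best_ar_order(x, max_order):
    """Pick the AR order with the smallest AIC."""
    fits = yule_walker_fits(x, max_order)
    scores = [aic(len(x), v, p + 1) for p, (_, v) in enumerate(fits)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, fits[best][0], scores


# Simulated AR(1) series with coefficient 0.7 (toy data, not protein data)
rng = random.Random(0)
series = [0.0]
for _ in range(299):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

best_order, best_coeffs, scores = best_ar_order(series, max_order=5)
```

A degenerate series whose noise variance collapses to zero would break the logarithm; real potential-value series would need that case guarded.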
Forecasting of AA states for best models

e.g. for an AR(1) process,

$$X_t = \phi X_{t-1} + Z_t, \qquad t = 0, \pm 1, \ldots$$

where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ and $|\phi| < 1$


  The observed potential of each AA and its sequence index serve as the
  data point and the time t, respectively. Prediction starts from the 2nd
  position and runs up to the last index, using forecast() of “itsmr”
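As an illustration of predicting every position from the 2nd onward, here is a minimal one-step-ahead sketch for an AR(1) model. It is not the forecast() routine of “itsmr”. It assumes a mean-corrected series and a lag-1 autocorrelation estimate of the coefficient; the names `fit_ar1` and `one_step_predictions` are hypothetical.

```python
def fit_ar1(x):
    """Estimate the mean and the AR(1) coefficient (lag-1 autocorrelation)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    phi = sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)
    return mean, phi


def one_step_predictions(x, mean, phi):
    """Predict x[t] from x[t-1] for t = 1 .. len(x)-1 (the 2nd position onward)."""
    return [mean + phi * (x[t - 1] - mean) for t in range(1, len(x))]


# Toy potential values indexed by residue position
potentials = [1.0, 2.0, 1.0, 2.0]
mean, phi = fit_ar1(potentials)
predicted = one_step_predictions(potentials, mean, phi)
```

The first position has no predecessor, so the list of predictions is one element shorter than the series.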
Similarly for ARMA(1,1), and for ARIMA with p = q = 1 applied to the differenced series:


$$X_t = \phi X_{t-1} + Z_t + \theta Z_{t-1}, \qquad \theta + \phi \neq 0$$


Forecasting quality is measured by the coefficient of determination (R²):

$$R^2 = 1 - \frac{\sum_i (Y_i - F_i)^2}{\sum_i (Y_i - \bar{Y})^2}$$

    $Y_i$ = true (observed) value
    $F_i$ = forecasted (predicted) value
    $\bar{Y}$ = mean of the observed values
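The coefficient of determination above translates directly into a few lines. The function name `r_squared` and the toy numbers are illustrative only.

```python
def r_squared(observed, forecast):
    """R^2 = 1 - sum((Y_i - F_i)^2) / sum((Y_i - mean(Y))^2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, forecast))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
mean_only = r_squared([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
partial = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

A forecast that always returns the observed mean scores 0, and a forecast worse than that scores below 0, so the value is not confined to the range 0 to 1.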
Clustering
                      Dataset-II

SCOP domain-specific PDB-style files (ATOM & HETATM records)
downloaded from the ASTRAL Compendium for Sequence and Structure
Analysis, release 1.75 (June 2009)


Files scanned for chain breaks and for CA-only entries; files with
breaks were kept aside
Length of AA residues (100-110), e.g. file
10gsa1_a_133_pot.txt
Potential values (time series): domains divided into
stationary (506) and non-stationary (1692) processes

Non-stationary data kept aside for further
transformations

AR, ARMA & ARIMA models fitted


Best model chosen by the minimum AIC criterion


Best models: AR (22), ARMA (484), ARIMA (no model)


Distance matrix (Euclidean distance) built from the AR(p) and ARMA(p, q) models


Dendrogram by neighbour-joining (PHYLIP package)
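The distance-matrix step can be sketched as follows. The deck does not say which model quantities enter the distance or how models of different order are compared, so this sketch assumes the fitted coefficient vectors are used and that shorter vectors are padded with zeros. The domain names and coefficients are made up.

```python
import math


def padded(coeffs, length):
    """Right-pad a coefficient vector with zeros (an assumption of this sketch)."""
    return list(coeffs) + [0.0] * (length - len(coeffs))


def distance_matrix(models):
    """Pairwise Euclidean distances between model coefficient vectors.

    models: dict mapping a domain name to its list of coefficients.
    Returns the sorted names and a square matrix in the same order.
    """
    names = sorted(models)
    width = max(len(models[name]) for name in names)
    vectors = {name: padded(models[name], width) for name in names}
    matrix = [[math.dist(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical AR coefficients for two domains
toy_models = {"dom_a": [3.0], "dom_b": [0.0, 4.0]}
names, matrix = distance_matrix(toy_models)
```

A square matrix like this is the kind of input a neighbour-joining program takes to build the dendrogram.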
Dendrogram_TS: AR models (22)
Dendrogram_TS: ARMA models (484)

• Phylowidget link
Results & Discussion

For each AA of every protein, the 3D Cartesian
coordinates were transformed into 2D information,
i.e. the conformational state of the AA. Potential
values were then computed and used, with the
residue index playing the role of time, to build a
statistical time-series model for forecasting.
AR values


            Autoregressive order (p): range 1-18

            Short- and long-range dependence reflects variations
            in protein structural arrangements

            These variations indicate the diversity exhibited
            by structural components
Table No. II: Forecasting results for AR models (44) out of the best
 90 models (Note: for 46 models, class information was not found in the
 SCOP database). All values are % accuracy.

Class (no. of models)        AA seq (%)         States (%)
                             Max      Min       Max      Min
All α (a), 12                26.82    2.41      55.68    21.77
All β (b), 5                 16.30    8.88      51.11    44.76
α/β (c), 9                   27.77    1.47      54.76    30.64
α+β (d), 13                  28.57    7.04      51.70    19.04
Small proteins (g), 1        19.51    -         48.78    -
Coiled-coil (h), 3           22.5     5.88      26       15
Designed proteins (k), 1     29.03    -         26.88    -

For classes with a single model, only one value is reported.

Accuracy for conformational states is higher than accuracy for AA residues,
owing to the low resolution of the forecasted potential values
Table No. III: Forecasting results for ARMA models (557) out of the best 1239
   models (Note: for 682 models, class information was not found in the SCOP
   database). All values are % accuracy.

Class (no. of models)              AA seq (%)         States (%)
                                   Max      Min       Max      Min
All α (a), 123                     32.55    2.63      65.77    8.06
All β (b), 146                     32.81    3.96      65.01    17.94
α/β (c), 120                       43.47    5         62.89    8.97
α+β (d), 127                       37.96    2.70      68.15    11.11
Multi-domain proteins (e), 13      24.39    6.034     50       17.80
Membrane & cell surface (f), 3     12.65    7.01      34.33    11.42
Small proteins (g), 17             30.64    6.60      64.51    14.28

Because the dataset is non-representative and class information is incomplete,
it cannot be said for any particular class that (i) prediction accuracy is
higher or lower, or (ii) its series mostly follow an ARMA process
Discussion
TS graphs open a new door in the scientific visualization of proteins without 3D
structure information: a specific AA can be shown on a line plot with a value
proportional to its frequency of occurrence in the allowed regions of the
Ramachandran plot.

The potential value of each AA provides a new candidate feature for feature
selection in machine learning techniques.

The order p of an AR model tells how the current value is linearly related to
the past p values.

Intra-dependency of AAs is shown using TS models, e.g. AR(4), ARMA(1,3)
CONCLUSIONS
A new way of looking at protein structure prediction was found.

Application of the TS technique to predicting conformational states from
conformational-state potentials, instead of secondary structure, has been attempted.

Accuracy of prediction of conformational states of AAs using time series is
higher than that of prediction of AA residues.

To increase prediction accuracy, multivariate time series may be
useful instead of univariate time series.

Intra-fluctuations inside proteins due to AA arrangement can be traced
through the stationary and non-stationary groups.
FUTURE WORK
AR and MA orders of TS models could serve as genetic-information distances to
predict evolutionary relationships between different proteins.


The TS concept can be used to predict conformational states of missing residues
in PDB data files.


Hierarchical clustering/classification of protein TS gives rise to a new concept
of time-dependent clustering (pseudo-clustering) and pseudo-phylogeny.


Development of synthetic proteins to combat seasonal diseases and to tackle
chemical warfare attacks.


TS fluctuations for a specific class of proteins can be used as a “pattern” for data
analysis and pattern-dependent classification of proteins.
References

Blundell, T.L., Sibanda, B.L., Sternberg, M.J., Thornton, J.M. (1987).
Knowledge-based prediction of protein structures and the design of novel
molecules. Nature 326(6111), 347-352. Review.

Kolaskar, A.S., Sawant, S.V. (1996). Prediction of conformational
states of amino acids using a Ramachandran plot. Int. J. Peptide
Protein Res., 110-116.

Alessandro, G., Romualdo, B. (2000). Nonlinear Methods in the
Analysis of Protein Sequences: A Case Study in Rubredoxins.
Biophysical Journal, 136-148.
Questions
Thank You !

BIM_2010_20_Bioinformatics_Project

  • 1. Validation of Time Series Technique for Prediction of Conformational States of Amino Acids Dr. Sangeeta Sawant , Bioinformatics Centre, UoP, Pune (Guide) Dr. Mohan Kale, Dept. of Statistics, UoP, Pune (co-guide)
  • 2. Concepts Used Ramachandran Plot Time series AR,ARMA,ARIMA models AIC criteria Euclidean distance Potential values for AA residues Feynman Problem Solving Algorithm
  • 4. Time Series a sequence of data points or set of observations, measured typically at successive time instants spaced at uniform time intervals. Patterns, variations forecasting
  • 5. Time Series Models (probability model) Autoregressive (AR) models Autoregressive-moving average (ARMA) Autoregressive integrated moving average (ARIMA) models - depend linearly on previous data points
  • 6. Materials & Methods R R-Studio, Tinn-R bio3d,itsmr,forecast,tseries,timsac,wordcloud ITSM_2000- Standalone R Nabble BioStars stats.stackexchange
  • 7. Methods A) Calculation of Potential values for AA residues B)Forecasting of AA states C) Clustering
  • 8. Calculation of Potential values for AA residues Dataset-I 3829 proteins selected from PDB (Protein Data Bank) –PDBSelect dataset list(25 % seq. similarity) Expt. method-X-ray, R-factor: - 0-0.25 (for best resolved structures) Chain breaks, only CA atoms Phi-Psi values –torsion.pdb() of “bio3d” & verified via PDBGoodies (IISC, Bangalore) & Protein Angle Descriptor utility (IIT, Delhi ) Assignment of Conformational state 1, 2, or 3 - to regions I, II, or III of the Rama. Plot, to each amino-acid residue (Phi_psi values)
  • 9. ᶲ Figure No- 2 Ramachandran plot showing three conformational regions I ,II and III I- closely/tightly packed conformations, Phi-140 to 0,Psi -100 to 0 II-extended conformations, Phi -180 to 0, Psi 80 to 180 III- all remaining confirmations
  • 10. Frequencies of single residues in three states calculated & normalized using (Kolaskar, A.S. & Sawant, S.V. -1996 ) nik N Pik =  nik  nik Nik –no. of times the AA of type (i) occurs in state k=1-3; N -total no. of residues Pik -potential values of AA of type (i) in state k Potential values in pdf
  • 12.
  • 15. ACF –Stat Vs. Non-stationary Stationary Non-stationary
  • 16. Time Series ACF plot Stationary Non- stationary Stationary
  • 18. TS model building….. AR (p) ARMA(p,q) ARIMA (p,q)
  • 19. Best model Selection AR (p) ARMA (p, q) ARIMA (p, q) AIC
  • 20. Forecasting of AA states for best models
  • 21. Forecasting of AA states for best models…. e.g. for AR(1) process, X t = φ X (t-1) + Z (t), t=0,± 1,…. Where {Z t}~ WN (0, s2) & | φ | <1 1st observed potential for AA with index given as data points & t respectively, prediction starts from 2nd position up to last index using forecast() “itsmr”
  • 22. Similarly for ARMA (1,1) /ARIMA (1,1) X t = φ X (t-1) + Z (t) + θ Z (t-1), θ+φ Forecasting Quality by coefficient of determination (R2) using formula R =1 2  (Yi  Fi )2  (Yi  Y )2 Yi =True value /Observed value Fi = Forecasted/predicted value
  • 23. Clustering Dataset-II SCOP Domain specific PDB-style files(ATOM & HETATM records ) downloaded from ASTRAL Compendium for Sequence and Structure Analysis - release 1.75 (June 2009) Scan for chain breaks & presence of CA atoms only, breaked files kept aside
  • 24. Length of AA residues(100-110) e.g. 10gsa1_a_133_pot.txt File
  • 25. Potential values (Time series),each domain divided into stationary (506) & non-stationary process (1692) Non-stationary data kept aside for further transformations AR,ARMA & ARIMA models Best model (minimum AIC criteria) Best-AR(22),ARMA(484),ARIMA(No model) AR(p), ARMA(p,q) -distance matrix (Euclidean distance ) Dendrogram-Neighbour-joing ( Phylip packages)
  • 28. Results & Discussion: For each AA of all the proteins, the 3D Cartesian coordinates were transformed into 2D information, i.e. the conformational states of the AAs; potential values were then computed and used to build a statistical model dependent on time-distance (index of the AA), treated as a time series, for forecasting purposes.
  • 29. AR values: autoregressive order (p) → range 1-18. Short- & long-range dependence → variations in protein structural arrangements. These variations point to the diversity exhibited by structural components.
  • 30. Table No. II: Forecasting results for AR models (44) out of the best 90 models (Note: for 46 models, class information was not found in the SCOP database). All values are % accuracy.
        Class (SCOP id)-no. of models    AA seq (%) Max / Min    States (%) Max / Min
        All α (a)-12                     26.82 / 2.41            55.68 / 21.77
        All β (b)-5                      16.30 / 8.88            51.11 / 44.76
        α/β (c)-9                        27.77 / 1.47            54.76 / 30.64
        α+β (d)-13                       28.57 / 7.04            51.70 / 19.04
        Small proteins (g)-1             19.51                   48.78
        Coiled-coil proteins (h)-3       22.5 / 5.88             26 / 15
        Designed proteins (k)-1          29.03                   26.88
      Conformational-state accuracy > AA-residue accuracy, due to the low resolution of the potential values (forecasted values).
  • 31. Table No. III: Forecasting results for ARMA models (557) out of the best 1239 models (Note: for 682 models, class information was not found in the SCOP database). All values are % accuracy.
        Class (SCOP id)-no. of models              AA seq (%) Max / Min    States (%) Max / Min
        All α (a)-123                              32.55 / 2.63            65.77 / 8.06
        All β (b)-146                              32.81 / 3.96            65.01 / 17.94
        α/β (c)-120                                43.47 / 5               62.89 / 8.97
        α+β (d)-127                                37.96 / 2.70            68.15 / 11.11
        Multi domains (e)-13                       24.39 / 6.034           50 / 17.80
        Membrane & cell surface proteins (f)-3     12.65 / 7.01            34.33 / 11.42
        Small proteins (g)-17                      30.64 / 6.60            64.51 / 14.28
      Due to the non-representative dataset & inadequate class information, it cannot be said that for any particular class i) prediction accuracy ↑ or ↓, or ii) the series mostly follow an ARMA process.
  • 32. Discussion: TS graphs open a new door in the scientific visualization of proteins (without 3D structure info), i.e. a specific AA can be visualized on a line plot with its value proportional to its frequency of occurrence in the allowed regions of the Ramachandran plot. The potential value of each AA adds a new feature for selection in machine learning techniques. The order of an AR model tells how the current value is linearly related to the past p values. Intra-dependency of AAs is shown using TS models, e.g. AR(4), ARMA(1,3).
  • 33. CONCLUSIONS: A new way of looking at protein structure prediction was found. Application of the TS technique to predict conformational states, based on conformational state potentials instead of secondary structure, has been attempted. The accuracy of predicting conformational states of AAs using time series is higher than that of predicting the AA residues. To increase prediction accuracy, a multivariate time series approach may be useful instead of a univariate one. Intra-fluctuations inside proteins, due to AA arrangement, can be traced through the stationary & non-stationary groups.
  • 34. FUTURE WORK: AR and MA orders of TS models could serve as points of genetic information (distances) to predict evolutionary relationships between different proteins. The TS concept can be used to predict conformational states of missing residues in PDB data files. Hierarchical clustering/classification of protein TS gives birth to a new concept of time-dependent clustering (pseudo-clustering) & pseudo-phylogeny. Development of synthetic proteins to combat seasonal diseases & to tackle chemical warfare attacks. TS fluctuations for a specific class of proteins can be used as a “pattern” for data analysis and pattern-dependent classification of proteins.
  • 35. References
        Blundell TL, Sibanda BL, Sternberg MJ, Thornton JM. Knowledge-based prediction of protein structures and the design of novel molecules. Nature. 1987 Mar 26-Apr 1;326(6111):347-52. Review.
        Kolaskar, A.S., Sawant, S.V. (1996). Prediction of conformational states of amino acids using a Ramachandran plot. Int. J. Peptide Protein Res. 110-116.
        Alessandro G., Romualdo B. (2000). Nonlinear Methods in the Analysis of Protein Sequences: A Case Study in Rubredoxins. Biophysical Journal. 136-148.