Pro-Gyan is a software tool that builds binary classifiers from protein sequences in FASTA format. It calculates thousands of properties from the protein sequences and selects a subset of relevant features to build an SVM classifier through cross-validation. Users can build their own classifiers by providing positive and negative training data in FASTA format, running the self-learning process, and evaluating the classifier's performance on independent test data. Built classifiers can be exported and used to classify novel protein sequences.
2. What is Pro-Gyan
o It builds binary classifiers directly from protein sequences:
• Calculates ~5000 different properties from the protein sequence.
• Selects a “maximal relevant and minimal redundant feature subset” and ranks the features using information theory.
• The top-ranked features are selected to build the final SVM classifier by 5-fold cross-validation.
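The feature-ranking idea above can be illustrated with a minimal sketch of greedy "maximal relevance, minimal redundancy" selection. This is not Pro-Gyan's own code: the slides do not state the exact scoring rule, so the sketch assumes the common "relevance minus mean redundancy" criterion over discrete features, and the function names, feature names, and toy data are all made up for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(features, labels, k):
    """Greedily pick k feature names: high MI with the labels,
    low mean MI with the features already picked (assumed criterion)."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, made-up data: f_a_copy duplicates f_a, f_b is weakly informative.
features = {
    "f_a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "f_a_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "f_b":      [0, 0, 1, 1, 0, 0, 1, 1],
}
labels = [0, 0, 0, 1, 1, 1, 1, 1]

print(mrmr_rank(features, labels, 2))  # ['f_a', 'f_b']
```

The duplicated feature is skipped in favour of the weaker but non-redundant one, which is the behaviour the "minimal redundant" part of the criterion is meant to produce.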
3. How to use Pro-Gyan
• Download “Pro_Gyan_1.0.zip” from https://code.google.com/p/pro-gyan/downloads/list
• Extract all the files.
• Double-click Pro-Gyan.jar, which will open the main window of “Pro-Gyan”.
• Let us build a classifier.
4. How to build a protein classifier
1. To build a classifier we need two sets of proteins (such as mitochondrial and non-mitochondrial) in FASTA format.
2. Now press “Create Classifier”.
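Before loading the two protein sets, it can help to check that both files parse as FASTA. The following is a minimal sketch, independent of Pro-Gyan itself; the function names, record names, and sequences are hypothetical, and the 1/0 labels are only an illustration of a positive and a negative set.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
        else:
            raise ValueError("sequence data found before the first '>' header")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def labeled_dataset(positive_text, negative_text):
    """Combine two FASTA texts into (header, sequence, label) rows."""
    rows = [(h, s, 1) for h, s in parse_fasta(positive_text)]
    rows += [(h, s, 0) for h, s in parse_fasta(negative_text)]
    return rows


# Made-up example records, not real proteins.
positive = ">mito_1 example\nMKTAY\nIAKQR\n>mito_2\nMLSRA\n"
negative = ">other_1\nGGSSA\n"

for row in labeled_dataset(positive, negative):
    print(row)
```

A file that raises `ValueError` here, or yields an empty list, is unlikely to be usable as training input.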
6. How to build a protein classifier
• Give a name to your classifier.
• Add a description of the classifier and data set.
• Label the positive and negative input data appropriately.
• Browse to the FASTA files and press the “Save” button.
7. How to build a protein classifier
Now Pro-Gyan is ready to build a new classifier. Press “Self Learn”; this will take some time, depending on the data size.
13. Export the classifier
The classifier can be exported/saved in “Pro-Gyan classifier” (pgc) format and then uploaded to a web server, e-mailed, etc. The name and description of the classifier can be updated at the time of export.
17. Classify novel proteins
• Copy and paste into the text area, or upload (“Fasta File” button), multiple protein sequences in FASTA format and “Classify” them.
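If the novel sequences are held in a script rather than a file, they can be written out as multi-record FASTA text ready to paste or save. This is a small sketch with a hypothetical helper name and made-up sequences; the 60-character line width is a common convention, not a Pro-Gyan requirement stated in the slides.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as multi-record FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        # split the sequence into fixed-width lines
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up query sequences for illustration only.
queries = [("query_1", "MKTAYIAKQRQISFVK"), ("query_2", "MLSRAVCGTSRQ")]
print(to_fasta(queries, width=10))
```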
18. Prediction result
• The result is displayed in tabular format, which can be copied and pasted into any text editor or spreadsheet.
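Once the table has been pasted into a text file, it can also be summarised with a short script. The sketch below assumes the pasted text is tab-separated with a header row; that layout, the column names "Protein" and "Prediction", and the values shown are hypothetical and should be adjusted to whatever the copied table actually contains.

```python
import csv
import io
from collections import Counter


def read_pasted_table(text, delimiter="\t"):
    """Parse delimiter-separated text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Hypothetical pasted result; real column names may differ.
pasted = (
    "Protein\tPrediction\n"
    "seq_1\tpositive\n"
    "seq_2\tnegative\n"
    "seq_3\tpositive\n"
)

rows = read_pasted_table(pasted)
counts = Counter(row["Prediction"] for row in rows)
print(counts["positive"], counts["negative"])  # 2 1
```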