1. 2013 IEEE International Conference on Signal and Image Processing Applications (ICSIPA 2013)
A New Multiple Classifiers Soft Decisions Fusion Approach for Exons Prediction in DNA Sequences
Ismail M. El-Badawy, Ashraf M. Aziz, Senior Member, IEEE, Safa Gasser and Mohamed E. Khedr
Department of Electronics & Communications Engineering
Arab Academy for Science, Technology and Maritime Transport, Egypt
Presented by
Ismail M. El-Badawy
2. Outline
Introduction
DNA Structure
Predicting Exons Locations
Exons Prediction using DFT
Proposed Soft Decisions Fusion Approach
Performance Evaluation
Conclusion
3. Introduction
Digital signal processing (DSP) has proved successful in many fields, and bioinformatics is one of them.
Identification of protein-coding regions in DNA sequences is an important topic in biosignal processing and bioinformatics.
With the significant growth of sequenced genomic data, it has become important to develop computerized methods for predicting these protein-coding regions (exons) in DNA sequences.
4. DNA Structure
DNA, or deoxyribonucleic acid, is the hereditary material in humans
and almost all other organisms.
5. DNA Structure
Organisms can be categorized into prokaryotes (e.g., bacteria) and eukaryotes (e.g., humans).
In both categories, DNA consists of genes separated by intergenic regions.
In eukaryotes, genes are further divided into protein-coding regions (exons) and noncoding regions (introns).
6. DNA Structure
DNA is made up of nucleotides.
Nucleotides are identified by the four nitrogenous bases:
Adenine (A)
Thymine (T)
Cytosine (C)
Guanine (G)
Nitrogenous bases pair up with each other (A with T, C with G), forming a double helix.
The two DNA strands are complementary to each other.
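As a small illustration of complementarity, the sketch below derives the paired strand from a given strand using the standard base pairing (A with T, C with G). The function names are chosen here for illustration and do not come from the slides.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand):
    """Return the complement read in the opposite direction."""
    return complement(strand)[::-1]


print(complement("TCCGATCG"))
print(reverse_complement("TCCGATCG"))
```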
7. DNA Structure
DNA = chain of nucleotides {A, C, G and T}.
This DNA chain (exons and introns) can be represented symbolically as a character string over a four-letter alphabet.
………TCCGATCGATCGATCTCTCTAGCGTCTACGCTAT
CATCGCTCTCTATTATCGCGCGATCGTCGATCGCGCG
AGAGTATGCTACGTCGATCGAATTG …………………………
8. DNA Structure
Protein-Coding regions (Exons) are the portions in DNA that
contain the information for producing proteins.
9. Predicting Exons Locations
Accurate prediction of exon locations in DNA sequences is important to biologists, since exons are the information-bearing parts.
[Diagram: TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG …… → Exons finder → Exons Locations]
10. Predicting Exons Locations
The order of the nucleotides stored in the exons spells out a code for protein synthesis.
Triplets of nucleotides (codons) in the exonic segments of DNA specify each type of amino acid based on a genetic code.
Each amino acid is encoded by one or more codons (many-to-one mapping).
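The many-to-one mapping from codons to amino acids can be illustrated with a short sketch. The table below is only a small excerpt of the standard genetic code, and the function name `translate` is hypothetical; unknown triplets are returned as "?".

```python
# Small excerpt of the standard genetic code (DNA codons), for illustration only.
CODON_TABLE = {
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",  # four codons, one amino acid
    "AAA": "Lys", "AAG": "Lys",                              # two codons, one amino acid
    "ATG": "Met",                                            # methionine has a single codon
    "TGG": "Trp",                                            # tryptophan has a single codon
}


def translate(exon):
    """Split an exonic string into triplets and look each one up."""
    usable = len(exon) - len(exon) % 3  # ignore an incomplete trailing codon
    codons = [exon[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "?") for codon in codons]


print(translate("ATGGCTGCGAAGTGG"))
```

Here GCT and GCG are different codons, yet both map to the same amino acid.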
11. Predicting Exons Locations
It was shown in previous publications that exonic parts exhibit a
period-3 property due to the codon structure and the nonuniform usage of codons in exonic regions.
This periodicity is absent outside the exonic segments.
……… ACGTATTCCGATCGA …………… GACTCTAGCGTCTAC ………
12. Predicting Exons Locations
Predicting exon locations with a digital signal processing (DSP) tool involves three main steps.
[Diagram: …TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG …… → Symbolic to Numeric Mapping → Track the strength of the period-3 component using DSP tool → Decision Making → Exons Locations]
13. Exons Prediction using DFT
Sliding window DFT is one of various DSP methods previously proposed in the field of exons prediction based on DNA spectral analysis.
[Diagram: …TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG …… → Numerical Mapping → X[n] → Sliding Window DFT → S[L/3] → Exons Locations]
14. Exons Prediction using DFT
Calculating the power spectrum of a windowed DNA numerical sequence at k = L/3 is sufficient, as it is expected to be large in exonic regions and small outside them.
[Diagram: …TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG …… → Numerical Mapping → X[n] → Sliding Window DFT → S[L/3]]
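A minimal sketch of the sliding window computation follows, assuming S[L/3] is the squared magnitude of the single DFT bin k = L/3 of each length-L window. The function name `period3_power`, the choice to index each window by its starting position, and the optional `window` weights are assumptions made for illustration.

```python
import cmath


def period3_power(x, L=351, window=None):
    """Return |DFT bin k = L/3|^2 for every length-L window of x.

    x      : numeric sequence (real or complex) obtained from the DNA string
    L      : window length, assumed to be a multiple of 3
    window : optional list of L weights; rectangular if omitted
    """
    if L % 3 != 0:
        raise ValueError("L must be a multiple of 3 so that k = L/3 is an integer")
    w = window if window is not None else [1.0] * L
    # With k = L/3, exp(-j*2*pi*k*m/L) simplifies to exp(-j*2*pi*m/3).
    twiddle = [cmath.exp(-2j * cmath.pi * m / 3) for m in range(L)]
    powers = []
    for start in range(len(x) - L + 1):
        acc = sum(w[m] * x[start + m] * twiddle[m] for m in range(L))
        powers.append(abs(acc) ** 2)
    return powers


# A sequence with a strict period of 3 gives a large value in every window,
# while a constant sequence gives a value close to zero.
periodic = period3_power([1, 0, 0] * 10, L=9)
constant = period3_power([1] * 30, L=9)
print(max(periodic), max(constant))
```

This direct form recomputes the sum for every window position, so its cost grows with both the sequence length and L; it is meant to show the quantity being tracked, not an efficient implementation.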
15. Exons Prediction using DFT
A hard decision for each nucleotide (exonic or intronic) is made according to whether the corresponding S[L/3] value is above or below a decision threshold.
[Diagram: S[L/3] → Exons Locations]
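The hard decision rule can be sketched in a few lines. The function name and the sample values below are hypothetical; only the comparison against a threshold comes from the slide.

```python
def hard_decisions(s_values, threshold):
    """Label each position 1 (exonic) if its S[L/3] value is above the threshold, else 0."""
    return [1 if s > threshold else 0 for s in s_values]


# Hypothetical S[L/3] values for four nucleotide positions.
print(hard_decisions([0.01, 0.05, 0.20, 0.03], threshold=0.04))
```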
16. Exons Prediction using DFT
In our work, we selected two symbolic-to-numeric mapping schemes from among the schemes that previously showed reasonable performance.
[Diagram: …TATTCCGATCGATCGATCTCTCTAGCGTCTACGCTATCATCGCTCTCTATTATCGCGCG …… → Numerical Mapping → X[n]]
Nucleotide      EIIP      CIS
Adenine (A)     0.1260     1
Cytosine (C)    0.1340    -j
Guanine (G)     0.0806    -1
Thymine (T)     0.1335     j
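The two mapping schemes can be written as lookup tables, using the values listed on this slide. The names `EIIP`, `CIS` and `map_sequence` are chosen here for illustration.

```python
# Values taken from the slide; j is the imaginary unit (1j in Python).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
CIS = {"A": 1, "C": -1j, "G": -1, "T": 1j}


def map_sequence(dna, scheme):
    """Convert a DNA character string into the numeric sequence X[n]."""
    return [scheme[base] for base in dna.upper()]


print(map_sequence("TATTCC", EIIP))
print(map_sequence("TATTCC", CIS))
```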
18. Exons Prediction using DFT
Gene F56F11.4 contains five exons.
Each mapping scheme pronounces the peaks in some exonic segments better than the other scheme.
The peaks in the exonic segments are not always consistently large, while those in the intronic segments are not always consistently low.
[Figure: two plots, each with horizontal axis "Nucleotide Positions" (0 to 8000) and vertical axis from 0 to 1]
19. Proposed Soft Decisions Fusion Approach
[Diagram: …TATTCCGATCGATCGAT…CTCTC…TAGCGTCTACGCTATCATCGCTCTCT…ATTATCGCGCG …… feeds two parallel branches:
EIIP Mapping → X[n] → Sliding Window DFT → S[L/3]
CIS Mapping → X[n] → Sliding Window DFT → S[L/3]
Both branches → Soft Decisions]
20. Proposed Soft Decisions Fusion Approach
Hard Decision (0 or 1)
Soft Decision (0 to 1)
21. Proposed Soft Decisions Fusion Approach
Each nucleotide belongs to exonic regions with a partial membership value (i.e., the possibility of being an exonic nucleotide).
22. Proposed Soft Decisions Fusion Approach
[Diagram: …TATTCCGATCGATCGAT…CTCTC…TAGCGTCTACGCTATCATCGCTCTCT…ATTATCGCGCG …… feeds two parallel branches:
EIIP Mapping → X[n] → Sliding Window DFT → S[L/3]
CIS Mapping → X[n] → Sliding Window DFT → S[L/3]
Both branches → DFC → Exons Locations]
23. Proposed Soft Decisions Fusion Approach
The DFC averages the two local soft decisions.
If the average exceeds 0.5 (i.e., the average possibility of being an exonic nucleotide exceeds 50%), the final decision is '1'; otherwise it is '0'.
The combined decision helps in making a more reliable decision than a hard decision that depends on only one classifier.
[Diagram: Soft Decisions → DFC → Exons Locations]
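The averaging rule at the DFC can be sketched as follows. The soft decisions are taken as given inputs in the range 0 to 1, since the way they are derived from S[L/3] is not spelled out here; the function name and the sample values are hypothetical.

```python
def fuse_soft_decisions(soft_eiip, soft_cis):
    """Average two per-nucleotide soft decisions and threshold the mean at 0.5."""
    if len(soft_eiip) != len(soft_cis):
        raise ValueError("both classifiers must score the same positions")
    averages = [(a + b) / 2 for a, b in zip(soft_eiip, soft_cis)]
    return [1 if avg > 0.5 else 0 for avg in averages]


# Hypothetical soft decisions for four nucleotide positions.
eiip_scores = [0.9, 0.4, 0.2, 0.7]
cis_scores = [0.3, 0.5, 0.1, 0.9]
print(fuse_soft_decisions(eiip_scores, cis_scores))
```

At the first position the two classifiers disagree (0.9 versus 0.3), and the average of 0.6 settles the final decision as '1'.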
24. Performance Evaluation Metrics
[Table: confusion matrix relating the prediction decision (Positive or Negative) to its correctness (True or False)]
25. Performance Evaluation Metrics
[Table: confusion matrix relating the prediction decision (Positive or Negative) to its correctness (True or False)]
26. Performance Evaluation Metrics
27. Performance Evaluation Metrics
28. Performance Evaluation Metrics
The area under the ROC curve (AUC) is a good indicator of prediction performance.
29. Performance Evaluation Metrics
The F_measure versus decision threshold curve is also a good indicator.
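The metrics named on these slides can be sketched under the usual textbook definitions, which are assumed here: true positive rate TP/(TP+FN), false positive rate FP/(FP+TN), and F_measure as the harmonic mean of precision and true positive rate. All function names and sample values are hypothetical.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when positions with score > threshold are called exonic."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score > threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Return (true positive rate, false positive rate, F_measure)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, fpr, f_measure


def auc(scores, labels):
    """Area under the ROC curve: sweep the threshold, then apply the trapezoidal rule."""
    thresholds = sorted(set(scores), reverse=True) + [float("-inf")]
    points = []
    for t in thresholds:
        tpr, fpr, _ = rates(*confusion_counts(scores, labels, t))
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores and ground-truth labels (1 = exonic) for six positions.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(rates(*confusion_counts(scores, labels, 0.5)))
print(auc(scores, labels))
```

Evaluating `rates` over a range of thresholds and keeping the largest F_measure gives the kind of F_measure versus threshold curve mentioned above.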
30. Performance Evaluation
A MATLAB simulation is conducted on real data (the HMR195 dataset), which is available online.
It contains 195 mammalian sequences consisting of 43 single-exon and 152 multi-exon genes.
Traditional and proposed approaches are simulated using different window shapes with a constant length (L = 351), as reported in previous publications.
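Two of the window shapes used in the simulation can be generated as below, assuming the common textbook definitions of the Bartlett (triangular, zero at both ends) and Hamming windows. This is an illustrative Python sketch, not the MATLAB code used in the work.

```python
import math


def bartlett(L):
    """Triangular window rising from 0 to 1 at the centre and back to 0."""
    return [1.0 - abs(2.0 * n / (L - 1) - 1.0) for n in range(L)]


def hamming(L):
    """Raised-cosine window with coefficients 0.54 and 0.46."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (L - 1)) for n in range(L)]


weights = bartlett(351)
print(len(weights), weights[0], weights[175], weights[350])
```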
31. Performance Evaluation
AUC values for the HMR195 dataset, and ROC curves plotted for the Bartlett window.
Window Shape    Single Classifier (EIIP)    Single Classifier (CIS)    Multiple Classifier
Rectangular     0.7280                      0.7398                     0.7862
Nuttall         0.7264                      0.7439                     0.7972
Parzen          0.7281                      0.7457                     0.7989
Bohman          0.7314                      0.7490                     0.8021
Blackman        0.7331                      0.7504                     0.8035
Hanning         0.7387                      0.7553                     0.8079
Hamming         0.7425                      0.7580                     0.8106
Bartlett        0.7438                      0.7589                     0.8115
32. Performance Evaluation
Percentage of exonic nucleotides detected as true positives:
Numerical Scheme used by the classifier    Number of Classifiers    at 10% FPR    at 20% FPR    at 30% FPR
EIIP                                       1                        43.5          56.9          66.4
CIS                                        1                        46.8          59.9          68.7
Both                                       2                        54.1          67.3          76.0
At 10% FPR, the multiple-classifier approach improves the detection rate by 24.4% over the single classifier using EIIP and by 15.6% over the single classifier using CIS.
33. Performance Evaluation
Maximum F_measures achieved and corresponding decision thresholds for the HMR195 dataset.
                      Single Classifier (EIIP)    Single Classifier (CIS)    Multiple Classifier
Maximum F_measure     0.4287                      0.4562                     0.5086
Decision Threshold    0.029                       0.048                      0.037
The multiple-classifier approach improves the maximum F_measure by 18.6% over the single classifier using EIIP and by 11.5% over the single classifier using CIS.
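The improvement percentages quoted on this slide and the previous one are relative gains, which can be checked from the tabulated values. The helper name `relative_gain` is hypothetical.

```python
def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return 100.0 * (new - old) / old


# True positive percentages at 10% FPR (two classifiers versus one).
print(round(relative_gain(54.1, 43.5), 1))      # over EIIP
print(round(relative_gain(54.1, 46.8), 1))      # over CIS

# Maximum F_measure (two classifiers versus one).
print(round(relative_gain(0.5086, 0.4287), 1))  # over EIIP
print(round(relative_gain(0.5086, 0.4562), 1))  # over CIS
```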
34. Conclusion
In our work, a new approach based on multiple DFT-based classifiers has been proposed for exons prediction.
Making soft decisions instead of hard decisions and depending on two classifiers instead of one helps in making more reliable decisions.
The prediction accuracy is enhanced at the expense of increased computational time and complexity.
Although the proposed approach has been analyzed for only two classifiers for simplicity, it can easily be extended to more than two classifiers.