1. About optimal sequence alignment
A short glimpse into bioinformatics
April 24, 2010
2. Pairwise sequence alignment
Assumptions:
sequences S1 and S2 are homologous, i.e. they share a common
ancestor;
the differences between them are due to only two kinds of events:
substitutions and insertions/deletions (indels).
Strategy:
choose a scoring matrix (reward for a match, penalties for a
mismatch and for a gap);
compute the edit distance (counting matches,
mismatches and gaps) needed to go from one sequence to the other;
keep the alignment with the highest score.
3. Needleman-Wunsch algorithm
Aim: find the optimal global alignment of sequences S1 and S2
Recursion rule:
\[
D(i, j) = \max
\begin{cases}
D(i-1,\, j-1) + \mathrm{score}(S_1[i], S_2[j]) \\
D(i-1,\, j) + \mathrm{gap} \\
D(i,\, j-1) + \mathrm{gap}
\end{cases}
\tag{1}
\]
Scoring scheme: identity = 0, transition = -2, transversion = -5, gap = -10
Sequences: S1 = TTGT, S2 = CTAGG
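A minimal Python sketch of recursion (1), using the scoring scheme and sequences given on this slide. The function names (`score`, `needleman_wunsch`), the boundary condition D(i, 0) = i × gap and D(0, j) = j × gap, and the tie-breaking order in the traceback are assumptions made for illustration; they are not taken from the slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
GAP = -10  # linear gap penalty from the scoring scheme


def score(a, b):
    """Substitution score: identity 0, transition -2, transversion -5."""
    if a == b:
        return 0
    # A transition stays within purines or within pyrimidines.
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return -2
    return -5


def needleman_wunsch(s1, s2, gap=GAP):
    """Return (optimal score, aligned s1, aligned s2) for a global alignment."""
    n, m = len(s1), len(s2)
    # D[i][j] is the best score for aligning s1[:i] with s2[:j].
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap  # assumed boundary: prefix of s1 against gaps only
    for j in range(1, m + 1):
        D[0][j] = j * gap  # assumed boundary: prefix of s2 against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(
                D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
    # Traceback from the bottom-right cell; ties prefer the diagonal move.
    a1, a2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1])
            a2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            a1.append(s1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(s2[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(a1)), "".join(reversed(a2))


best, aligned_s1, aligned_s2 = needleman_wunsch("TTGT", "CTAGG")
print(best)        # -17
print(aligned_s2)  # CTAGG
print(aligned_s1)  # TT-GT
```

The table D has (n + 1) × (m + 1) cells and each is filled in constant time, so both the running time and the memory use grow with the product of the two sequence lengths.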
22. Output
Plot the optimal alignment:
CTAGG
*| |*
TT-GT
Score: -17
Time complexity: O(nm), where n and m are the lengths of S1 and S2
Memory complexity: O(nm)
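The score of the plotted alignment can be checked column by column: C/T is a transition (-2), T/T an identity (0), A/- a gap (-10), G/G an identity (0), and G/T a transversion (-5), which sums to -17. The sketch below does this arithmetic and rebuilds the marker line. The helper name `score_alignment` is hypothetical, and the marker convention (`|` for a match, `*` for a mismatch, a blank for a gap) is an assumption inferred from the plot.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def score(a, b):
    """Substitution score: identity 0, transition -2, transversion -5."""
    if a == b:
        return 0
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return -2
    return -5


def score_alignment(top, bottom, gap=-10):
    """Return (total score, marker line) for two gapped rows of equal length."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    markers = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
            markers.append(" ")  # gap column
        elif a == b:
            total += score(a, b)
            markers.append("|")  # match
        else:
            total += score(a, b)
            markers.append("*")  # mismatch (transition or transversion)
    return total, "".join(markers)


total, markers = score_alignment("CTAGG", "TT-GT")
print("CTAGG")
print(markers)
print("TT-GT")
print("Score:", total)
```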
23. Acknowledgments
Bellman; Levenshtein; Needleman and Wunsch; Sankoff and Sellers;
Hirschberg; Smith and Waterman; Gotoh; Ukkonen, Myers and
Fickett; and many others...
Want to know more? Start reading!
http://lectures.molgen.mpg.de/online_lectures.html