Dynamic programming

About optimal sequence alignment
A short glimpse into bioinformatics

April 24, 2010


Pairwise sequence alignment

Assumptions:
sequences S1 and S2 are homologous, i.e. they share a common ancestor;
differences between them are due to only two kinds of events: substitutions and insertions/deletions (indels).
Strategy:
choose a scoring matrix (reward for a match, penalties for a mismatch and for a gap);
compute the edit distance (counting matches, mismatches and gaps) needed to go from one sequence to the other;
keep the alignment with the highest score.


Needleman-Wunsch algorithm

Aim: find the optimal global alignment of sequences S1 and S2
Recursion rule:

\[
D(i, j) = \max
\begin{cases}
D(i-1, j-1) + \mathrm{score}(S_1[i], S_2[j]) \\
D(i-1, j) + \mathrm{gap} \\
D(i, j-1) + \mathrm{gap}
\end{cases}
\tag{1}
\]

Scoring scheme: identity = 0, transition = -2, transversion = -5, gap = -10
Sequences: S1 = TTGT, S2 = CTAGG
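
The scoring scheme can be written as a small function. This is a minimal sketch, assuming the usual definitions (a transition swaps two purines, A and G, or two pyrimidines, C and T; anything else is a transversion) and uppercase DNA letters only. The names `score`, `PURINES` and `PYRIMIDINES` are illustrative and do not come from the slides.

```python
# Values taken from the scoring scheme on this slide.
IDENTITY, TRANSITION, TRANSVERSION, GAP = 0, -2, -5, -10

# Assumption: input is restricted to the four uppercase DNA letters.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def score(a, b):
    """Substitution score for aligning nucleotide a with nucleotide b."""
    if a == b:
        return IDENTITY
    pair = {a, b}
    # Same chemical class on both sides means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return TRANSITION
    return TRANSVERSION


print(score("T", "C"), score("T", "A"), score("G", "G"))
```

Gaps are handled separately by the recursion rule, so `GAP` is not used inside `score`.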


Fill the matrix

          C     T     A     G     G

T

T

G

T


Fill the matrix

          C     T     A     G     G

    0

T

T

G

T


Fill the matrix

          C     T     A     G     G

    0 → -10

T

T

G

T


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20

T

T

G

T


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50

T

T

G

T


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50
    ↓
T -10
    ↓
T -20
    ↓
G -30
    ↓
T -40
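
The first row and first column filled so far are just multiples of the gap penalty: reaching cell (0, j) or (i, 0) means aligning a prefix against nothing but gaps. A minimal sketch of that initialization follows; the name `init_matrix` and the use of `None` for cells not yet computed are illustrative choices, not part of the slides.

```python
GAP = -10  # gap penalty from the scoring scheme


def init_matrix(s1, s2, gap=GAP):
    """Build a (len(s1)+1) x (len(s2)+1) matrix with only the borders filled."""
    rows, cols = len(s1) + 1, len(s2) + 1
    # None marks cells that the recursion has not computed yet.
    D = [[None] * cols for _ in range(rows)]
    for j in range(cols):
        D[0][j] = j * gap  # first row: j leading gaps
    for i in range(rows):
        D[i][0] = i * gap  # first column: i leading gaps
    return D


D = init_matrix("TTGT", "CTAGG")
for row in D:
    print(row)
```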


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50
    ↓     ↓
T -10 →
    ↓
T -20
    ↓
G -30
    ↓
T -40


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50
    ↓     ↓
T -10 →  -2
    ↓
T -20
    ↓
G -30
    ↓
T -40
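
As a worked instance of the recursion rule, the value -2 in the first inner cell (row T, column C) can be checked by hand. T and C are both pyrimidines, so the substitution is a transition and scores -2:

```latex
\[
D(1, 1) = \max
\begin{cases}
D(0, 0) + \mathrm{score}(T, C) = 0 + (-2) = -2 \\
D(0, 1) + \mathrm{gap} = -10 + (-10) = -20 \\
D(1, 0) + \mathrm{gap} = -10 + (-10) = -20
\end{cases}
= -2
\]
```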


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50
    ↓     ↓     ↓     ↓     ↓     ↓
T -10 →  -2 → -10 → -20 → -30 → -40
    ↓
T -20
    ↓
G -30
    ↓
T -40


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50
    ↓     ↓     ↓     ↓     ↓     ↓
T -10 →  -2 → -10 → -20 → -30 → -40
    ↓     ↓     ↓     ↓     ↓     ↓
T -20 →     →     →     →     →
    ↓     ↓     ↓     ↓     ↓     ↓
G -30 →     →     →     →     →
    ↓     ↓     ↓     ↓     ↓     ↓
T -40 →     →     →     →     →


Fill the matrix

          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50
    ↓     ↓     ↓     ↓     ↓     ↓
T -10 →  -2 → -10 → -20 → -30 → -40
    ↓     ↓     ↓     ↓     ↓     ↓
T -20 → -12 →  -2 → -12 → -22 → -32
    ↓     ↓     ↓     ↓     ↓     ↓
G -30 → -22 → -12 →  -4 → -12 → -22
    ↓     ↓     ↓     ↓     ↓     ↓
T -40 → -32 → -22 → -14 →  -9 → -17
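
The completed matrix can be reproduced by applying the recursion rule row by row. The following is a sketch under the same assumptions as the slides' example (uppercase DNA letters, linear gap penalty, S1 on the rows and S2 on the columns); `score` and `fill_matrix` are illustrative names, not taken from the slides.

```python
def score(a, b, identity=0, transition=-2, transversion=-5):
    """Substitution score: purine<->purine or pyrimidine<->pyrimidine is a transition."""
    if a == b:
        return identity
    purines = {"A", "G"}
    same_class = (a in purines) == (b in purines)
    return transition if same_class else transversion


def fill_matrix(s1, s2, gap=-10):
    """Needleman-Wunsch score matrix with s1 on the rows and s2 on the columns."""
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        D[0][j] = j * gap
    for i in range(1, rows):
        D[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            D[i][j] = max(
                D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # align the two letters
                D[i - 1][j] + gap,                              # gap in s2
                D[i][j - 1] + gap,                              # gap in s1
            )
    return D


for row in fill_matrix("TTGT", "CTAGG"):
    print(row)
```

Strings are 0-indexed in Python, so cell (i, j) compares `s1[i - 1]` with `s2[j - 1]`.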


Traceback

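          C     T     A     G     G

    0 → -10 → -20 → -30 → -40 → -50
    ↓     ↓     ↓     ↓     ↓     ↓
T -10 →  -2 → -10 → -20 → -30 → -40
    ↓     ↓     ↓     ↓     ↓     ↓
T -20 → -12 →  -2 → -12 → -22 → -32
    ↓     ↓     ↓     ↓     ↓     ↓
G -30 → -22 → -12 →  -4 → -12 → -22
    ↓     ↓     ↓     ↓     ↓     ↓
T -40 → -32 → -22 → -14 →  -9 → -17

Traceback path, read from the bottom-right cell back to the top-left: -17 ↖ -12 ↖ -12 ← -2 ↖ -2 ↖ 0 (↖ = aligned pair, ← = gap in S1).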
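
The traceback can be sketched as a walk from the bottom-right cell to the top-left, at each step asking which of the three terms of the recursion rule produced the current value. This is an illustrative version, not the slides' own code: the function names are hypothetical, and when several moves tie it prefers the diagonal, then the vertical move, so it returns one optimal alignment only.

```python
def score(a, b, identity=0, transition=-2, transversion=-5):
    if a == b:
        return identity
    purines = {"A", "G"}
    return transition if (a in purines) == (b in purines) else transversion


def fill_matrix(s1, s2, gap=-10):
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        D[0][j] = j * gap
    for i in range(1, rows):
        D[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            D[i][j] = max(D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)
    return D


def traceback(D, s1, s2, gap=-10):
    """Recover one optimal alignment as (aligned_s1, aligned_s2)."""
    i, j = len(s1), len(s2)
    a1, a2 = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]):
            a1.append(s1[i - 1]); a2.append(s2[j - 1])  # diagonal: aligned pair
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("-")        # vertical: gap in s2
            i -= 1
        else:
            a1.append("-"); a2.append(s2[j - 1])        # horizontal: gap in s1
            j -= 1
    # The walk runs backwards, so reverse both strings at the end.
    return "".join(reversed(a1)), "".join(reversed(a2))


s1, s2 = "TTGT", "CTAGG"
aligned_s1, aligned_s2 = traceback(fill_matrix(s1, s2), s1, s2)
print(aligned_s2)
print(aligned_s1)
```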


Output

Plot the optimal alignment:

CTAGG
*| |*
TT-GT

Score: -17
Time complexity: O(nm), where n and m are the lengths of S1 and S2
Memory complexity: O(nm)
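
The reported score can be checked directly from the alignment by summing the score of each column. A small sketch, using the same scoring scheme and the marker convention of the plot above (`|` for identity, `*` for mismatch, blank for gap); the function names are illustrative.

```python
def column_score(a, b, identity=0, transition=-2, transversion=-5, gap=-10):
    """Score of one alignment column; '-' denotes a gap."""
    if a == "-" or b == "-":
        return gap
    if a == b:
        return identity
    purines = {"A", "G"}
    return transition if (a in purines) == (b in purines) else transversion


def alignment_score(top, bottom):
    """Total score of two aligned strings of equal length."""
    return sum(column_score(a, b) for a, b in zip(top, bottom))


def marker_line(top, bottom):
    """Middle line of the plot: '|' identity, '*' mismatch, ' ' gap."""
    marks = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            marks.append(" ")
        else:
            marks.append("|" if a == b else "*")
    return "".join(marks)


top, bottom = "CTAGG", "TT-GT"
print(top)
print(marker_line(top, bottom))
print(bottom)
print("Score:", alignment_score(top, bottom))
```

Column by column this gives -2 + 0 - 10 + 0 - 5 = -17, matching the bottom-right cell of the matrix.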


Acknowledgments

Bellman; Levenshtein; Needleman and Wunsch; Sankoff and Sellers;
Hirschberg; Smith and Waterman; Gotoh; Ukkonen, Myers and
Fickett; and many others...

Want to know more? Start reading!
http://lectures.molgen.mpg.de/online_lectures.html
