GARY BENSON, YOZEN HERNANDEZ, &
JOSHUA LOVING
BIOINFORMATICS PROGRAM
BOSTON UNIVERSITY
JLOVING@BU.EDU
A Bit-Parallel,
General Integer-Scoring
Sequence Alignment Algorithm
Introduction: Problem Description
Input:
• Sequences X and Y
• Integer weights M, I, G (match, mismatch, indel or gap) that define a similarity or distance scoring function S
Output:
• Calculate the global alignment score for X and Y
Introduction
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
Global Alignment – Needleman-Wunsch
Alignment Scoring Matrix
Sequence X (across the top)
Sequence Y (down the side)
Introduction
Two features of the scoring matrix above:
• Integer Scores
• Penalty from beginning
Introduction
- A C T G C A A
- 0 0 0 0 0 0 0 0
A 0 2 -3 -3 -3 -3 2 2
G 0 -3 -1 -6 -1 -6 -3 -1
T 0 -3 -6 1 -4 -4 -8 -6
C 0 -3 -1 -4 -2 -2 -7 -11
A 0 2 -3 -4 -7 -5 0 -5
A 0 2 -1 -6 -7 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
No initial Penalty
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Needleman-Wunsch Alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Cells are filled one at a time, left to right and row by row.
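The cell-by-cell fill shown above can be sketched in Python as follows. This is a minimal illustration of the standard Needleman-Wunsch recurrence with the slide's weights; the function name `needleman_wunsch` and its parameter names are assumptions made for this example, not part of the original slides.

```python
def needleman_wunsch(x, y, match=2, mismatch=-3, indel=-5):
    """Fill the global alignment scoring matrix one cell at a time.

    x runs across the top (columns), y runs down the side (rows).
    """
    rows, cols = len(y) + 1, len(x) + 1
    s = [[0] * cols for _ in range(rows)]
    # First row: each step is one more indel ("penalty from beginning").
    for j in range(1, cols):
        s[0][j] = j * indel
    for i in range(1, rows):
        s[i][0] = i * indel  # first column, same boundary rule
        for j in range(1, cols):
            sub = match if x[j - 1] == y[i - 1] else mismatch
            s[i][j] = max(
                s[i - 1][j - 1] + sub,  # diagonal: match or mismatch
                s[i - 1][j] + indel,    # from above: gap
                s[i][j - 1] + indel,    # from the left: gap
            )
    return s


matrix = needleman_wunsch("ACTGCAA", "AGTCAA")
for label, row in zip("-AGTCAA", matrix):
    print(label, *row)
print(matrix[-1][-1])  # global alignment score: 2
```

Each cell depends on its left, upper, and upper-left neighbours, which is why this version has to visit cells one by one.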
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5
G -10
T -15
C -20
A -25
A -30
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
Bit-parallel alignment
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
Integer Scores
Each row is filled as a whole, one row at a time.
Motivation
• Cheaper sequencing of DNA means that larger datasets are being generated
• Sequence analysis of such large datasets can be accelerated by faster alignment algorithms
Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at: www.genome.gov/sequencingcosts. Accessed June 10, 2013.
Earlier Bit-parallel Pattern Matching Algorithms
• Longest Common Subsequence (LCS) (Allison & Dix, 1986; Crochemore et al., 2001; Hyyro, 2004)
• Unit-cost edit-distance (Myers, 1999; Hyyro et al., 2005)
• K-differences (agrep; Wu & Manber, 1992)
• Regular expression search (Navarro, 2004)
• Arbitrary weights edit-distance (Bergeron & Hamel, 2002)
Algorithm Foundation
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match = 2, Mismatch = -3, Indel = -5
Zoom on four adjacent cells (rows A and G, columns C and T):
-3 -8
-1 -6
Differences between adjacent cells:
∆H = -8 - (-3) = -5 (top edge)
∆V = -1 - (-3) = 2 (left edge)
∆VNEXT = -6 - (-8) = 2 (right edge)
∆HNEXT = -6 - (-1) = -5 (bottom edge)
Along the first row and first column: ∆H = -5, ∆V = -5
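The relation between the four differences follows from the Needleman-Wunsch recurrence: measured from the top-left cell of a 2 x 2 block, the bottom-right score is the best of the diagonal step, the step down from the top-right cell, and the step right from the bottom-left cell. The sketch below works this out for one cell. The function name `next_deltas` is hypothetical, and this is a scalar illustration rather than the bit-parallel form.

```python
def next_deltas(dv, dh, is_match, M=2, I=-3, G=-5):
    """Return (dv_next, dh_next) for one cell, given its input differences.

    dv: left edge  (bottom-left minus top-left)
    dh: top edge   (top-right minus top-left)
    """
    s = M if is_match else I
    # Bottom-right score relative to the top-left cell.
    best = max(s, dh + G, dv + G)
    dv_next = best - dh  # right edge: bottom-right minus top-right
    dh_next = best - dv  # bottom edge: bottom-right minus bottom-left
    return dv_next, dh_next


# The block from the slide: cell (G, T) is a mismatch, with dv = 2 and dh = -5.
print(next_deltas(dv=2, dh=-5, is_match=False))  # (2, -5)
```

Only the two input differences and the match flag are needed; the absolute scores never enter the computation.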
Function Table
∆VNEXT output values given ∆V and ∆H input values
[Table not recovered: inputs ∆V and ∆H, output ∆VNEXT]
What is the range of differences?
Match = 2, Mismatch = -3, Indel = -5
Minimum Value = Indel = -5
Maximum Value = Match – Indel = 2 - (-5) = 7
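As a quick check of this range on the running example, the sketch below takes every horizontal and vertical difference in the scoring matrix from the slides and reports the smallest and largest. The matrix is copied from the example; the variable names are illustrative only.

```python
# Scoring matrix from the example (Match = 2, Mismatch = -3, Indel = -5).
matrix = [
    [0, -5, -10, -15, -20, -25, -30, -35],
    [-5, 2, -3, -8, -13, -18, -23, -28],
    [-10, -3, -1, -6, -6, -11, -16, -21],
    [-15, -8, -6, 1, -4, -9, -14, -19],
    [-20, -13, -6, -4, -2, -2, -7, -12],
    [-25, -18, -11, -9, -7, -5, 0, -5],
    [-30, -23, -16, -14, -12, -10, -3, 2],
]

# Horizontal differences: each cell minus its left neighbour.
delta_h = [row[j + 1] - row[j] for row in matrix for j in range(len(row) - 1)]
# Vertical differences: each cell minus the cell above it.
delta_v = [
    matrix[i + 1][j] - matrix[i][j]
    for i in range(len(matrix) - 1)
    for j in range(len(matrix[0]))
]

differences = delta_h + delta_v
print(min(differences), max(differences))  # -5 7
```

The bounds match Indel = -5 and Match - Indel = 7, so 13 distinct values suffice to describe every difference in this example.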
Generalized Function Table
M = match score
I = mismatch score
G = indel (gap) penalty
Zones in Example Function Table
Bit-parallel Representation
• Bit-vectors: computer words 64 bits long
• A bit-vector for each possible difference, both horizontally and vertically (∆H and ∆V)
• A set of Match vectors (MatchA, MatchC, MatchG, MatchT in the DNA case)
• We keep track of match positions because they are a special case in the function table.
Example ∆H Bit-vector Storage
- A C T G C A A
C -20 -13 -6 -4 -2 -2 -7 -12
∆H values
7 7 2 2 0 -5 -5
∆H Bit-Vectors
7 1 1 0 0 0 0 0
6 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0
2 0 0 1 1 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0
-1 0 0 0 0 0 0 0
-2 0 0 0 0 0 0 0
-3 0 0 0 0 0 0 0
-4 0 0 0 0 0 0 0
-5 0 0 0 0 0 1 1
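The storage scheme above can be sketched as one integer per possible difference value, with bit p set when column p holds that difference. The function name `delta_h_bitvectors` and the choice of bit 0 for the leftmost column are assumptions of this example; the slides do not state the bit order.

```python
def delta_h_bitvectors(row, lo, hi):
    """Encode the horizontal differences of one matrix row as bit-vectors.

    Returns the list of differences and a dict mapping each value in
    [lo, hi] to an integer whose bit p is 1 when column p has that value.
    """
    deltas = [b - a for a, b in zip(row, row[1:])]
    vectors = {d: 0 for d in range(lo, hi + 1)}
    for pos, d in enumerate(deltas):
        vectors[d] |= 1 << pos  # assumption: bit 0 is the leftmost column
    return deltas, vectors


row_c = [-20, -13, -6, -4, -2, -2, -7, -12]  # row C of the example matrix
deltas, vectors = delta_h_bitvectors(row_c, lo=-5, hi=7)
print(deltas)
for d in range(7, -6, -1):
    bits = ["1" if (vectors[d] >> p) & 1 else "0" for p in range(len(deltas))]
    print(f"{d:3d} {' '.join(bits)}")
```

Exactly one vector has a 1 in any given column, since each column has a single difference value.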
Example Match Vectors
- A C T G C A A
- 0 -5 -10 -15 -20 -25 -30 -35
A -5 2 -3 -8 -13 -18 -23 -28
G -10 -3 -1 -6 -6 -11 -16 -21
T -15 -8 -6 1 -4 -9 -14 -19
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
A -30 -23 -16 -14 -12 -10 -3 2
Match Vectors
MatchesA 1 0 0 0 0 1 1
MatchesC 0 1 0 0 1 0 0
MatchesT 0 0 1 0 0 0 0
MatchesG 0 0 0 1 0 0 0
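Match vectors like these can be built in one pass over Sequence X. The sketch below uses a hypothetical helper `match_vectors` and, as an assumption of this example, puts the leftmost position in bit 0.

```python
def match_vectors(x, alphabet="ACGT"):
    """Return one bit-vector per letter; bit p is 1 when x[p] is that letter."""
    vectors = {letter: 0 for letter in alphabet}
    for pos, letter in enumerate(x):
        vectors[letter] |= 1 << pos  # assumption: bit 0 is the leftmost position
    return vectors


x = "ACTGCAA"
matches = match_vectors(x)
for letter in "ACTG":
    bits = ["1" if (matches[letter] >> p) & 1 else "0" for p in range(len(x))]
    print(f"Matches{letter} {' '.join(bits)}")
```

When a row for character c of Sequence Y is processed, the vector for c marks every column where the diagonal step is a match.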
Algorithm
• Start with ∆H values
• Compute ∆V values
• Then compute the new ∆H values
Algorithm: Example
- A C T G C A A
C -20 -13 -6 -4 -2 -2 -7 -12
A -25 -18 -11 -9 -7 -5 0 -5
∆H values (row C)
7 7 2 2 0 -5 -5
∆V values (row C to row A, starting with the first column)
-5 -5 -5 -5 -5 -3 7 7
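The row update in this example can be reproduced with a scalar, cell-by-cell sketch that works only on differences. It is not the bit-parallel form, which would apply the same rule to all columns at once using bit-vectors; the function name `next_row_deltas` is an assumption of this example.

```python
def next_row_deltas(x, y_char, dh_row, M=2, I=-3, G=-5):
    """Given the dH values of one row, return the dV values leading to the
    next row and the dH values of that next row."""
    dv = G  # first column: the boundary difference is the indel score
    dv_values = [dv]
    dh_next_row = []
    for x_char, dh in zip(x, dh_row):
        s = M if x_char == y_char else I
        best = max(s, dh + G, dv + G)  # new cell relative to the top-left cell
        dh_next_row.append(best - dv)  # bottom edge
        dv = best - dh                 # right edge, input to the next column
        dv_values.append(dv)
    return dv_values, dh_next_row


# Row C to row A of the example, with Sequence X = ACTGCAA.
dv_values, dh_new = next_row_deltas("ACTGCAA", "A", [7, 7, 2, 2, 0, -5, -5])
print(dv_values)  # [-5, -5, -5, -5, -5, -3, 7, 7]
print(dh_new)     # [7, 7, 2, 2, 2, 5, -5]
```

The new ∆H values agree with the differences along row A of the matrix (-25 -18 -11 -9 -7 -5 0 -5).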
Time Complexity
$O\left(z\,n\,\frac{m}{w}\right)$
where
n = |Sequence Y|
m = |Sequence X|
w = length of computer word
$z = \frac{(M - 2G + 1)^2 - (I - 2G)^2}{2}$
Implementation
• Python script that generates C code based on input parameters (M, I, G)
• Will eventually have web page for download of code
Experimental Analysis
• Compared BHL with
  • Wu-Manber K-differences algorithm
  • Unit cost edit distance bit-parallel algorithm
  • Longest Common Subsequence bit-parallel algorithm
  • Needleman-Wunsch dynamic programming algorithm
• Computed 25 million alignments with each program
• Each DNA sequence was 63 bases long
• All programs compiled using GCC, optimization level O3
• Computation done on a typical desktop computer
Results: Comparison to NW algorithm
Results: comparison to bit-parallel algorithms
Current and Future Work
• Implementation for sequences longer than one word
• Single Instruction Multiple Data (SIMD) implementation
• BLOSUM and PAM type substitution matrix support
• General Purpose Graphics Processing Unit (GPGPU) implementation
• New bit-parallel representations for greater speed and compactness of data
Acknowledgements
My advisor, Dr. Gary Benson
Lab members
Yevgeniy Gelfand Yozen Hernandez
Funded by the National Science Foundation (NSF)
Questions

AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

A bit parallel, general integer-scoring

  • 1. GARY BENSON, YOZEN HERNANDEZ, & JOSHUA LOVING BIOINFORMATICS PROGRAM BOSTON UNIVERSITY JLOVING@BU.EDU A Bit-Parallel, General Integer-Scoring Sequence Alignment Algorithm
  • 2. Introduction: Problem Description Input: • Sequences X and Y • Integer weights M, I, G (match, mismatch, indel or gap) that define a similarity or distance scoring function S Output: • Calculate the global alignment score for X and Y
  • 3. Introduction - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 Global Alignment – Needleman-Wunsch Alignment Scoring Matrix Sequence X Sequence Y
  • 4. Introduction - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 5. Introduction - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 Penalty from beginning
  • 6. Introduction - A C T G C A A - 0 0 0 0 0 0 0 0 A 0 2 -3 -3 -3 -3 2 2 G 0 -3 -1 -6 -1 -6 -3 -1 T 0 -3 -6 1 -4 -4 -8 -6 C 0 -3 -1 -4 -2 -2 -7 -11 A 0 2 -3 -9 -7 -5 0 -5 A 0 2 -1 -6 -7 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 No initial Penalty
  • 7. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 8. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 9. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 10. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 11. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 12. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 13. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 14. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 15. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 16. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 17. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 18. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 19. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 20. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 21. Needleman-Wunsch Alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5
  • 22. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 23. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 24. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 25. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 26. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 27. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 28. Bit-parallel alignment - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 Integer Scores
  • 29. Motivation • Cheaper sequencing of DNA means that larger datasets are being generated • Sequence analysis of such large datasets can be accelerated by faster alignment algorithms Wetterstrand KA. DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). Available at: www.genome.gov/sequencingcosts. Accessed June 10, 2013.
  • 30. Earlier Bit-parallel Pattern Matching Algorithms • Longest Common Subsequence (LCS) (Allison & Dix, 1986; Crochemore et al, 2001; Hyyro, 2004) • Unit-cost edit-distance (Myers, 99; Hyyro et al, 2005) • K-differences (agrep; Wu-Manber, 92) • Regular expression search (Navarro, 04) • Arbitrary weights edit-distance (Bergeron & Hamel, 02)
  • 31. - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5
  • 32. Algorithm Foundation - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match = 2, Mismatch = -3, Indel = -5 -3 -8 -1 -6
  • 33. - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 -3 -8 -1 -6 -5 2 2 -5 Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5
  • 34. - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 -3 -8 -1 -6 ∆H ∆V ∆VNEXT ∆HNEXT -5 2 2 -5 Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5
  • 35. ∆H -5 ∆V - -5 - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 -3 -8 -1 -6 -5 2 2 -5 ∆H ∆V ∆VNEXT ∆HNEXT Algorithm Foundation Match = 2, Mismatch = -3, Indel = -5 Output Input
  • 36. Function Table ∆VNEXT output values given ∆V and ∆H input values
  • 37. What is the range of differences?
  • 38. What is the range of differences? Match = 2, Mismatch = -3, Indel = -5
  • 39. What is the range of differences? Match = 2, Mismatch = -3, Indel = -5 Minimum Value = Indel = -5 Maximum Value = Match – Indel = 2 - (- 5) = 7
  • 40. Generalized Function Table M = match score I = mismatch score G = indel (gap) penalty
  • 41. Zones in Example Function Table
  • 42. Bit-parallel Representation • Bit-vectors: computer words 64 bits long • A bit-vector for each possible difference, both horizontally and vertically (∆V and ∆H) • A set of Match vectors (MatchA, MatchC, MatchG, MatchT in the DNA case) • We keep track of match positions because they are a special case in the function table.
  • 43. Example ∆H Bit-vector Storage - A C T G C A A C -20 -13 -6 -4 -2 -2 -7 -12 ∆H values 7 7 2 2 0 -5 -5 ∆H Bit-Vectors 7 1 1 0 0 0 0 0 6 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 2 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 -1 0 0 0 0 0 0 0 -2 0 0 0 0 0 0 0 -3 0 0 0 0 0 0 0 -4 0 0 0 0 0 0 0 -5 0 0 0 0 0 1 1
  • 44. Example Match Vectors - A C T G C A A - 0 -5 -10 -15 -20 -25 -30 -35 A -5 2 -3 -8 -13 -18 -23 -28 G -10 -3 -1 -6 -6 -11 -16 -21 T -15 -8 -6 1 -4 -9 -14 -19 C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 A -30 -23 -16 -14 -12 -10 -3 2 Match Vectors MatchesA 1 0 0 0 0 1 1 MatchesC 0 1 0 0 1 0 0 MatchesT 0 0 1 0 0 0 0 MatchesG 0 0 0 1 0 0 0
  • 45. Algorithm • Start with ∆H values • Compute ∆V values • Then compute the new ∆H values
  • 46. Algorithm: Example - A C T G C A A C -20 -13 -6 -4 -2 -2 -7 -12 A -25 -18 -11 -9 -7 -5 0 -5 ∆H values 7 7 2 2 0 -5 -5 ∆V values -5 -5 -5 -5 -5 -3 7 7
  • 47. Time Complexity $O\left(z\,n\,\frac{m}{w}\right)$ where n = |Sequence Y|, m = |Sequence X|, w = length of computer word, and $z = \frac{(M - 2G + 1)^2 - (I - 2G)^2}{2}$
  • 48. Implementation • Python script that generates C code based on input parameters (M, I, G) • Will eventually have web page for download of code
  • 49. Experimental Analysis • Compared BHL with • Wu-Manber K-differences algorithm • Unit cost edit distance bit-parallel algorithm • Longest Common Subsequence bit-parallel algorithm • Needleman-Wunsch dynamic programming algorithm • Computed 25 million alignments with each program • Each DNA sequence was 63 bases long • All programs compiled using GCC, optimization level O3 • Computation done on a typical desktop computer
  • 50. Results: Comparison to NW algorithm
  • 51. Results: comparison to bit-parallel algorithms
  • 52. Current and Future Work • Implementation for sequences longer than one word • Single Instruction Multiple Data (SIMD) implementation • BLOSUM and PAM type substitution matrix support • General Purpose Graphics Processing Unit (GPGPU) implementation • New bit-parallel representations for greater speed and compactness of data
  • 53. Acknowledgements My advisor, Dr. Gary Benson Lab members Yevgeniy Gelfand Yozen Hernandez Funded by the National Science Foundation (NSF)
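
The cell-by-cell Needleman-Wunsch fill shown in slides 7 to 21 can be sketched in a few lines of Python. This is only an illustrative baseline, not the authors' code: the function name `nw_matrix` and the choice to put Sequence X on the columns and Sequence Y on the rows are assumptions made to match the example matrix.

```python
def nw_matrix(x, y, M=2, I=-3, G=-5):
    """Fill the global alignment scoring matrix (hypothetical helper).

    x labels the columns, y labels the rows; M, I, G are the match,
    mismatch and indel weights from the slides.
    """
    # First row: a prefix of x aligned against nothing costs one indel per base.
    matrix = [[G * j for j in range(len(x) + 1)]]
    for i, yc in enumerate(y, start=1):
        prev = matrix[-1]
        curr = [G * i]  # first column: a prefix of y aligned against nothing
        for j, xc in enumerate(x, start=1):
            diag = prev[j - 1] + (M if xc == yc else I)  # match or mismatch
            up = prev[j] + G                             # indel (cell above)
            left = curr[j - 1] + G                       # indel (cell to the left)
            curr.append(max(diag, up, left))
        matrix.append(curr)
    return matrix


if __name__ == "__main__":
    scores = nw_matrix("ACTGCAA", "AGTCAA")
    print(scores[-1][-1])  # 2, the bottom-right cell of the example matrix
```

Each cell is visited once and depends on its left, diagonal and upper neighbours, which is why the plain algorithm has to proceed in order.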

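The difference representation from slides 31 to 46 can be illustrated without any bit tricks. The sketch below derives ∆VNEXT and ∆HNEXT directly from the Needleman-Wunsch recurrence; it is a plain integer version intended only to show that the differences carry all the information needed, and it is not the authors' bit-parallel formulation. The names `step_cell`, `next_row` and `score_by_differences` are hypothetical.

```python
def step_cell(dh, dv, is_match, M=2, I=-3, G=-5):
    """Return (dv_next, dh_next) for one cell.

    dh is the horizontal difference in the row above, dv the vertical
    difference in the column to the left.
    """
    s = M if is_match else I
    best = max(s, dh + G, dv + G)  # cell score minus the diagonal neighbour
    return best - dh, best - dv


def next_row(dh_row, x, yc, M=2, I=-3, G=-5):
    """Given the ∆H values of one row, return (∆V values, new ∆H values)."""
    dv = G  # the first column always drops by one indel
    dv_values, new_dh = [], []
    for dh, xc in zip(dh_row, x):
        dv, dh_next = step_cell(dh, dv, xc == yc, M, I, G)
        dv_values.append(dv)
        new_dh.append(dh_next)
    return dv_values, new_dh


def score_by_differences(x, y, M=2, I=-3, G=-5):
    """Global alignment score recovered from differences alone."""
    dh_row = [G] * len(x)  # differences along the initial penalty row
    for yc in y:
        _, dh_row = next_row(dh_row, x, yc, M, I, G)
    # Walk down the first column, then along the last row.
    return G * len(y) + sum(dh_row)


if __name__ == "__main__":
    print(step_cell(-5, 2, False))                    # (2, -5), the block on slide 33
    print(score_by_differences("ACTGCAA", "AGTCAA"))  # 2
```

With Match = 2, Mismatch = -3, Indel = -5, every difference produced for this example stays between -5 and 7, in line with the range given on slide 39.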
Editor's Notes

  1. In computational biology, sequence alignment forms an important part of biological sequence analysis. To formally state the problem we are trying to solve: given two input sequences, we want to find their global alignment score, using integer parameters to weight the alignment. We aren’t computing the alignment itself, which takes extra steps; we are solely interested in computing the alignment score.
  2. The Needleman-Wunsch algorithm is the standard method of global alignment. Here is the alignment scoring matrix for an example alignment of two sequences with a particular set of alignment scoring weights.
  3. We restrict the scores that we are using to be only integer valued.
  4. In this alignment example, we want to align the strings from their beginnings, so we impose a gap penalty for starting from a place other than the beginning in either string.
  5. However, if this constraint is not present, we can allow alignment to start at any point by initializing the first row and column to be all zero. Our method allows either case, but we will focus on the case using an initial penalty.
  6. As a brief overview, the NW algorithm proceeds by initializing the scoring matrix’s first row and column.
  7. To fill in the scoring matrix, we iterate over it, with each cell’s score being determined by the three cells to the left, diagonally, and above.
  8. To fill in the entire scoring matrix, each cell must be visited once.
  9. Because cells depend on the cells that preceded them, they must be done in order.
  10. Bit-parallel alignment allows us to represent multiple cells by bits in a computer word. Rather than computing each cell individually, we compute values for entire rows at once. The benefit of bit-parallel algorithms lies in their speed and efficiency.
  11. In this figure, we have cost, in dollars, shown logarithmically on the y-axis against time on the x-axis. The white line represents how Moore’s Law shrinks the cost of computation over time. In green, we have the actual cost of sequencing a human genome, so it is clear that the cost of sequencing is dropping much faster than the cost of computing. This makes implementing faster sequence alignment algorithms important, which is what we are doing.
  12. Say that Hyyro achieved 4 bit operations per word for LCS, and 15 bit operations per word for unit-cost edit distance. Arbitrary weights edit-distance: does it solve the same problem? They gave no implementation and were only able to guess at the time complexity. We have an implementation that uses a different methodology.
  13. As we recalled, each cell in the NW alignment matrix is derived from the 3 adjacent cells to its left and above.
  14. As an example, we will consider a small block of the scoring matrix.
  15. We know that the scores in the scoring matrix have unbounded values. However, the differences between adjacent cells are bounded. In our bit-parallel algorithm and others, these differences are used as the problem representation. This is similar to how the Four Russians technique uses differences between cells. (Possibly refer to the Four Russians technique if questions are asked.)
  16. We will call these differences Delta H and Delta V.
  17. Of course, it is obvious that DeltaVnext and DeltaHnext can be derived from Delta H and Delta V, but what was not obvious was the regular pattern that emerges if you look at all possible inputs and outputs.
  18. In fact, the output values produced by input values of Delta H and Delta V follow a very regular pattern. This is shown here in this function table for Delta Vnext output values.
  19. As an example, we can look at the function table shown before. The input Delta V and Delta H values range from -5 to positive 7.
  20. These values derive from the input parameters of 2, -3, and -5
  21. The minimum value is equal to the indel penalty, and the maximum value is equal to the match score minus the indel penalty. 2 minus (negative 5)
  22. Similar relationships determine important boundaries between these zones in this function table. The algorithm needs to compute the values in these zones separately. Thus, using a set of scoring parameters, we can generate the corresponding function table.
  23. Here are the same zones shown in the example function table we were looking at.
  24. What are bit-vectors? They are computer words, in current machines generally 64 bits long. We use a bit-vector to represent each possible difference horizontally and vertically. We end up with two sets, one that represents Delta H, and another set representing the same integer values of difference for Delta V. We also need a set of Match vectors. For each character in Sequence Y, we wish to know whether there are matches in Sequence X. In the case of DNA, this means that we must store 4 vectors for matches with A, C, G or T. These match vectors are necessary because matches represent a special case in the function table.
  25. Here we have a single row in the scoring matrix. The bit-vectors that represent the horizontal changes are shown directly below the values they represent.
  26. Similarly, this is an example set of Match vectors used to store the matches corresponding to each base in Sequence X
  27. Point out the squared values, note that these tend to be small, and mention that we will be showing an actual runtime demonstration of how the parameters affect efficiency.
  28. As the scoring parameters change, the code changes as well. Since it would be very difficult to write a program for each possible input parameter set, I wrote a script that generates the C code for each set of scores. We will host it on a website.
  29. For all experiments, we used human DNA and did a total of 25 million alignments. All sequences were 63 bases long, to allow the bit-parallel representations to fit in single computer words. (If they ask: we are working on the multiple-word implementation, but were unable to finish in time for this paper.) Single core of a 3 GHz computer (Intel Core 2 Duo).
  30. We implemented our algorithm and compared it to several other pattern matching algorithms. This figure compares our method with Needleman-Wunsch. Say what the Y-axis is: minutes to compute 25 million alignments. Say what the X-axis is: number of bit operations that each of our versions of the algorithm uses. Say what the labels on the BHL algorithms mean. Note that as the parameters increase, the time increases. Note that NW takes over 7 minutes; our 2-3-5 version takes under 2.
  31. Here, we compare the unit-cost edit distance bit-parallel algorithm, the LCS bit-parallel algorithm, Wu-Manber’s K-difference algorithm, and the version of our algorithm equivalent to unit-cost edit-distance. While our algorithm has slightly more operations, 23, than the unit-cost edit distance algorithm with 15, our algorithm is not optimized for any particular parameter set. Even though our algorithm is general purpose, we come quite close to the best known solution.
  32. I’d like to thank my advisor, Dr. Gary Benson, and the other members of my lab, Yevgeniy Gelfand and Yozen Hernandez, as well as the NSF for funding this work.
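
The bit-vector storage described in notes 24 to 26 can be mimicked with Python integers used as bitmasks. The sketch below only builds the Match vectors and the one-vector-per-difference encoding of a row; it does not implement the bitwise update itself. The helper names and the convention that bit j corresponds to position j of Sequence X are assumptions made for illustration.

```python
def match_vectors(x, alphabet="ACGT"):
    """One bitmask per character; bit j is set when x[j] equals that character."""
    return {c: sum(1 << j for j, xc in enumerate(x) if xc == c) for c in alphabet}


def delta_bit_vectors(deltas, lo, hi):
    """One bitmask per possible difference value in [lo, hi].

    Bit j of vectors[v] is set exactly when deltas[j] == v, so every
    position has its bit set in exactly one vector.
    """
    vectors = {v: 0 for v in range(lo, hi + 1)}
    for j, d in enumerate(deltas):
        vectors[d] |= 1 << j
    return vectors


def show(mask, width):
    """Render a mask in sequence order (bit 0 first), as laid out on the slides."""
    return " ".join(str((mask >> j) & 1) for j in range(width))


if __name__ == "__main__":
    x = "ACTGCAA"
    print(show(match_vectors(x)["A"], len(x)))  # 1 0 0 0 0 1 1

    # ∆H values of the row labelled C in the example, range [Indel, Match - Indel].
    dh = [7, 7, 2, 2, 0, -5, -5]
    vectors = delta_bit_vectors(dh, lo=-5, hi=7)
    print(show(vectors[7], len(dh)))   # 1 1 0 0 0 0 0
    print(show(vectors[-5], len(dh)))  # 0 0 0 0 0 1 1
```

Because a 63-base sequence fits in one 64-bit word, each of these masks would correspond to a single machine word in a C implementation.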