Algorithms for Alignment of
Genomic Sequences
Michael Brudno
Department of Computer Science
Stanford University
PGA Workshop 07/16/2004
Conservation Implies Function
[Figure: conservation plot; legend: Exon, Gene, CNS (Other Conserved)]
Edit Distance Model (1)
Weighted sum of insertions, deletions &
mutations to transform one string into
another
AGGCACA--CA        AGGCACACA
|  ||||  ||   or   |  ||  ||
A--CACATTCA        ACACATTCA
Edit Distance Model (2)
Given: x, y
Define: F(i,j) = Score of best alignment of
x_1…x_i to y_1…y_j
Recurrence: F(i,j) = max( F(i-1,j) - GAP_PENALTY,
                          F(i,j-1) - GAP_PENALTY,
                          F(i-1,j-1) + SCORE(x_i, y_j) )
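The recurrence can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function name, the match/mismatch values standing in for SCORE, and the gap penalty of 2 are all assumptions chosen for the example.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap_penalty=2):
    """Fill the F(i,j) table with the recurrence above and return F(n,m).

    The scoring values are illustrative assumptions, not the talk's parameters.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundary: aligning a prefix against the empty string costs one gap per base.
    for i in range(1, n + 1):
        F[i][0] = -i * gap_penalty
    for j in range(1, m + 1):
        F[0][j] = -j * gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j] - gap_penalty,   # x_i aligned to a gap
                F[i][j - 1] - gap_penalty,   # y_j aligned to a gap
                F[i - 1][j - 1] + score,     # x_i aligned to y_j
            )
    return F[n][m]


if __name__ == "__main__":
    print(global_alignment_score("AGGCACACA", "ACACATTCA"))
```

The table has (n+1) x (m+1) cells and each cell is filled in constant time, which is where the quadratic running time for two sequences comes from.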
Edit Distance Model (3)
F(i,j) = Score of best
alignment ending at i,j
Time O(n^2) for two seqs,
O(n^k) for k seqs
[Figure: DP matrix cell F(i,j) with arrows from F(i-1,j), F(i,j-1) and F(i-1,j-1)]
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
Overview
• Local Alignment (CHAOS)
• Multiple Global Alignment (LAGAN)
- Whole Genome Alignment
• Glocal Alignment (Shuffle-LAGAN)
• Biological Story
Local Alignment
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGAACCCTGGGTCACAAAACTC
F(i,j) = max (F(i,j), 0)
Return all paths with a position i,j where F(i,j) > C
Time O(n^2) for two seqs,
O(n^k) for k seqs
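A minimal sketch of the local variant follows, assuming the same illustrative scoring as a simple match/mismatch/gap scheme (the values and the function name are assumptions). It clamps every cell at 0 and reports the cells whose score exceeds the cutoff C; recovering the full paths through those cells would need a traceback that is left out here.

```python
def local_alignment_hits(x, y, threshold, match=1, mismatch=-1, gap_penalty=2):
    """Return (i, j, score) for every cell with F(i,j) > threshold.

    Scoring values are illustrative assumptions. Only end positions are
    reported; the traceback that recovers each path is omitted.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    hits = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                0,                            # restart: never carry a negative score
                F[i - 1][j] - gap_penalty,
                F[i][j - 1] - gap_penalty,
                F[i - 1][j - 1] + score,
            )
            if F[i][j] > threshold:
                hits.append((i, j, F[i][j]))
    return hits


if __name__ == "__main__":
    print(local_alignment_hits("TTACGTT", "GGACGGG", threshold=2))
```

In the example the only shared word is ACG, so the single reported cell is the one where that word ends in both sequences.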
Heuristic Local Alignment
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGAACCCTGGGTCACAAAACTC
[Figure: two panels over this sequence pair, labeled BLAST and FASTA]
CHAOS: CHAins Of Seeds
1. Find short matching words (seeds)
2. Chain them
3. Rescore chain
CHAOS: Chaining the Seeds
• Find seeds at current location in seq1
• Find the previous seeds that fall into the search box
• Do a range query: seeds are indexed by their diagonal
• Pick a previous seed that maximizes the score of chain
[Figure: seeds plotted on seq1 vs. seq2; the search box behind the current location in seq1 is bounded by the distance cutoff and the gap cutoff, and the gap cutoff sets the range of search over diagonals]
Time O(n log n), where n is number of seeds.
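The chaining step can be sketched as below. This is an illustration under stated assumptions, not the CHAOS implementation: seeds are taken to be exact word matches given as (pos1, pos2, length), the chain score is matched bases minus the diagonal shift between consecutive seeds, and the diagonal index is a plain sorted Python list searched with bisect. A sorted list makes insertion and deletion linear, so reaching the O(n log n) bound on the slide would need a balanced search structure instead.

```python
import bisect


def chain_seeds(seeds, distance_cutoff, gap_cutoff):
    """Chain seeds (pos1, pos2, length); return (best_score, chain).

    Assumed scoring: matched bases minus the change in diagonal between
    consecutive seeds. All names here are hypothetical.
    """
    seeds = sorted(seeds)
    if not seeds:
        return 0, []
    score, back = [], []
    active = []   # (diagonal, seed index) for seeds inside the distance cutoff
    oldest = 0    # oldest seed index still present in `active`
    for k, (p1, p2, length) in enumerate(seeds):
        # Drop seeds that now lie farther back in seq1 than the distance cutoff.
        while oldest < k and seeds[oldest][0] < p1 - distance_cutoff:
            o1, o2, _ = seeds[oldest]
            active.pop(bisect.bisect_left(active, (o1 - o2, oldest)))
            oldest += 1
        # Range query over diagonals: only seeds within the gap cutoff qualify.
        diag = p1 - p2
        lo = bisect.bisect_left(active, (diag - gap_cutoff, -1))
        hi = bisect.bisect_right(active, (diag + gap_cutoff, len(seeds)))
        best, best_prev = length, None
        for d, q in active[lo:hi]:
            q1, q2, qlen = seeds[q]
            # The previous seed must end before this one starts in both sequences.
            if q1 + qlen <= p1 and q2 + qlen <= p2:
                candidate = score[q] + length - abs(diag - d)
                if candidate > best:
                    best, best_prev = candidate, q
        score.append(best)
        back.append(best_prev)
        bisect.insort(active, (diag, k))
    # Trace back from the best-scoring seed.
    k = max(range(len(seeds)), key=score.__getitem__)
    total, chain = score[k], []
    while k is not None:
        chain.append(seeds[k])
        k = back[k]
    return total, chain[::-1]


if __name__ == "__main__":
    demo = [(0, 0, 4), (6, 6, 4), (12, 13, 4), (50, 3, 4)]
    print(chain_seeds(demo, distance_cutoff=10, gap_cutoff=2))
```

In the demo the first three seeds form one chain, with the third costing one unit for its diagonal shift, while the fourth seed is too far away in seq1 and on a distant diagonal, so it stays on its own.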
CHAOS Scoring
• Initial score = # matching bp - gaps
• Rapid rescoring: extend all seeds to find
optimal location for gaps
Overview
• Local Alignment (CHAOS)
• Multiple Global Alignment (LAGAN)
- Whole Genome Alignment
• Glocal Alignment (Shuffle-LAGAN)
• Biological Story
Global Alignment
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
x
y
z
LAGAN
1. Find Local Alignments
2. Chain Local Alignments
3. Restricted DP
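One simple way to picture step 3 is sketched below. The slides do not spell out the exact shape of the restricted region, so this sketch makes a stronger assumption than the talk states: the chained anchors are treated as fixed exact matches, and full dynamic programming is run only inside the boxes between consecutive anchors. The function names and scoring values are hypothetical.

```python
def nw_score(x, y, match=1, mismatch=-1, gap=2):
    """Global alignment score of x and y, kept to two rows of the DP table."""
    prev = [-j * gap for j in range(len(y) + 1)]
    for i, a in enumerate(x, 1):
        cur = [-i * gap]
        for j, b in enumerate(y, 1):
            s = match if a == b else mismatch
            cur.append(max(prev[j] - gap, cur[j - 1] - gap, prev[j - 1] + s))
        prev = cur
    return prev[-1]


def anchored_score(x, y, anchors, match=1):
    """Score an alignment forced through anchors (i, j, length).

    Anchors are assumed to be exact matches, sorted and non-overlapping
    in both sequences. DP runs only in the boxes between them.
    """
    total, pi, pj = 0, 0, 0
    for i, j, length in anchors:
        total += nw_score(x[pi:i], y[pj:j])   # box before this anchor
        total += length * match               # the anchor itself
        pi, pj = i + length, j + length
    total += nw_score(x[pi:], y[pj:])         # box after the last anchor
    return total


if __name__ == "__main__":
    x, y = "AAACGTCCC", "AAACGTCC"
    print(anchored_score(x, y, [(3, 3, 3)]), nw_score(x, y))
```

The work drops from the full n x m table to the sum of the box areas, at the price of being only as good as the anchors: a wrong anchor cannot be repaired by the DP.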
MLAGAN: 1. Progressive Alignment
Given N sequences and a phylogenetic tree,
align pairwise (LAGAN), in the order of the tree
[Figure: tree with leaves Human, Baboon, Mouse, Rat]
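The order of pairwise steps follows from a post-order walk of the tree. The sketch below only computes that order; it does not align anything. The nested-tuple tree encoding, the function name, and the grouping of Human with Baboon and Mouse with Rat are assumptions made for the example.

```python
def progressive_order(tree):
    """Return the pairwise alignment steps implied by a binary guide tree.

    `tree` is a species name or a 2-tuple of subtrees (an assumed encoding).
    Each step pairs two already-built alignments, children before parents.
    """
    steps = []

    def visit(node):
        if isinstance(node, str):
            return node
        left, right = node
        a, b = visit(left), visit(right)
        steps.append((a, b))
        return f"({a}/{b})"

    visit(tree)
    return steps


if __name__ == "__main__":
    guide_tree = (("Human", "Baboon"), ("Mouse", "Rat"))
    for a, b in progressive_order(guide_tree):
        print("align", a, "with", b)
```

The last step pairs two alignments rather than two sequences, which is the case the multi-anchoring slide addresses.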
MLAGAN: 2. Multi-anchoring
To anchor the (X/Y) and (Z) alignments:
[Figure: three anchor plots, X vs. Z, Y vs. Z, and X/Y vs. Z]
Cystic Fibrosis (CFTR), 12 species
• Human sequence length: 1.8 Mb
• Total genomic sequence: 13 Mb
• Species: Human, Baboon, Cat, Dog, Cow, Pig, Mouse, Rat, Chimp, Chicken, Fugufish, Zebrafish
CFTR (cont'd)
Method   Species            % Exons Aligned   TIME (sec)   MAX MEMORY (Mb)
LAGAN    Mammals            99.7%             550          90
LAGAN    Chicken & Fishes   96%               862          90
MLAGAN   Mammals            99.8%             4547         670
MLAGAN   Chicken & Fishes   98%
Automatic computational system for
comparative analysis of pairs of genomes
http://pipeline.lbl.gov
Alignments (all pair combinations):
Human Genome (Golden Path Assembly)
Mouse assemblies: Arachne, Phusion (2001) MGSC v3 (2002)
Rat assemblies: January 2003, February 2003
----------------------------------------------------------
D. melanogaster vs D. pseudoobscura, February 2003
Tandem Local/Global Approach
•Finding a likely mapping for a contig (BLAT)
Progressive Alignment Scheme
[Flowchart; box labels in reading order, connected by yes/no branches:]
• Human, Mouse and Rat genomes
• Pairwise M/R mapping
• Aligned M&R fragments / Unaligned M&R sequences
• Map to Human Genome
• Mapping aligned fragments by union of M&R local BLAT hits on the human genome
• H/M/R MLAGAN alignment
• M/R pairwise alignment
• M/H and R/H pairwise alignment
• Unassigned M&R DNA fragments
Computational Time
PC cluster of 23 dual-processor 2.2 GHz Intel Xeon nodes.
Pair-wise rat/mouse – 4 hours
Pair-wise rat/human and mouse/human – 2 hours
Multiple human/mouse/rat – 9 hours
Total wall time: ~ 15 hours
Distribution of Large Indels
[Histogram: x-axis Indel length (100 to 550), y-axis Count (0 to 200)]
Evolution Over a Chromosome
Overview
• Local Alignment (CHAOS)
• Multiple Global Alignment (LAGAN)
- Whole Genome Alignment
• Glocal Alignment (Shuffle-LAGAN)
• Biological Story
Evolution at the DNA level
…ACGGTGCAGTTACCA…
…AC----CAGTCCACCA…
SEQUENCE EDITS: Mutation, Deletion
REARRANGEMENTS: Inversion, Translocation, Duplication
Local & Global Alignment
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGAACCCTGGGTCACAAAACTC
[Figure: the same sequence pair shown in two panels, labeled Local and Global]
Glocal Alignment Problem
Find least cost transformation of one
sequence into another using new operations
• Sequence edits
• Inversions
• Translocations
• Duplications
• Combinations of above
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGAACCCTGGGTCACAAAACTC
Shuffle-LAGAN
A glocal aligner for long DNA
sequences
S-LAGAN
1. Find Local Alignments
2. Build Rough Homology Map
3. Globally Align Consistent Parts
Building the Homology Map
Chain (using Eppstein-Galil); each alignment gets a score which is MAX over 4 possible chains.
Penalties are affine (event and distance components)
Penalties:
a) regular
b) translocation
c) inversion
d) inverted translocation
[Figure: local alignments with the four transition types labeled a-d]
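A rough sketch of such a chain is given below. It is an illustration only: it uses a plain quadratic loop rather than the Eppstein-Galil sparse dynamic programming named on the slide, and the way transitions are classified, the event penalties, and the use of the seq1 gap as the distance component are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    s1: int      # start in seq1
    e1: int      # end in seq1
    s2: int      # start in seq2
    e2: int      # end in seq2
    strand: str  # '+' or '-'
    score: int


# Hypothetical event penalties for the four transition types.
EVENT = {"regular": 0, "translocation": 50, "inversion": 30,
         "inverted translocation": 60}


def transition(p, q):
    """Classify the step from hit p to hit q and return (kind, penalty).

    Assumed rule: same strand and forward in seq2 is regular; same strand
    but backward is a translocation; a strand change is an inversion, or an
    inverted translocation when q also lies before p in seq2.
    """
    same_strand = p.strand == q.strand
    in_order = q.s2 >= p.e2
    if same_strand:
        kind = "regular" if in_order else "translocation"
    else:
        kind = "inversion" if in_order else "inverted translocation"
    distance = q.s1 - p.e1   # assumed distance component: gap along seq1
    return kind, EVENT[kind] + distance


def build_homology_map(hits):
    """Best chain that is ordered in seq1 but free to jump or flip in seq2."""
    if not hits:
        return 0, []
    hits = sorted(hits, key=lambda h: h.s1)
    best = [h.score for h in hits]
    back = [None] * len(hits)
    for k, q in enumerate(hits):
        for i in range(k):
            p = hits[i]
            if p.e1 > q.s1:          # overlapping in seq1: not chainable
                continue
            _, penalty = transition(p, q)
            candidate = best[i] - penalty + q.score
            if candidate > best[k]:
                best[k], back[k] = candidate, i
    k = max(range(len(hits)), key=best.__getitem__)
    total, chain = best[k], []
    while k is not None:
        chain.append(hits[k])
        k = back[k]
    return total, chain[::-1]


if __name__ == "__main__":
    a = Hit(0, 100, 200, 300, "+", 100)
    b = Hit(105, 200, 0, 95, "+", 100)
    c = Hit(205, 300, 400, 495, "-", 100)
    print(build_homology_map([c, a, b]))
```

In the demo the chain keeps all three hits: the step from a to b is scored as a translocation and the step from b to c as an inversion, each paying its event penalty plus the small gap along seq1.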
S-LAGAN Results (CFTR)
[Figure: panels labeled Local and Glocal]
S-LAGAN Results (CFTR)
[Figure: panels labeled Hum/Mus and Hum/Rat]
S-LAGAN Results (IGF cluster)
S-LAGAN results (HOX)
• 12 paralogous genes
• Conserved order in mammals
S-LAGAN Results (Chr 20)
• Human Chr 20 v. homologous Mouse Chr 2.
• 270 Segments of conserved synteny
• 70 Inversions
S-LAGAN Results (Whole Genome)
            LAGAN      S-LAGAN
Total       37%        38%
Exon        93%        96%
Ups200      78%        81%
CPU Time    350 Hrs    450 Hrs
• Used Berkeley Genome Pipeline
• % Human genome aligned with mouse sequence
• Evaluation criteria from Waterston et al. (Nature 2002)
Rearrangements in Human v. Mouse
Preliminary conclusions:
• Rearrangements come in all sizes
• Duplications are less well conserved than other rearranged regions
• Simple inversions tend to be the most common and the most conserved
What is next? (Shuffle)
• Better algorithm and scoring
• Whole genome synteny mapping
• Multiple Glocal Alignment(!?)
Overview
• Local Alignment (CHAOS)
• Multiple Global Alignment (LAGAN)
- Whole Genome Alignment
• Glocal Alignment (Shuffle-LAGAN)
• Biological Story
Biological Story
• Math1 (Mouse Atonal Homologue 1,
also ATOH) is a gene that is
responsible for nervous system
development
Align Human, Mouse, Rat & Fugu
Detailed Alignment
hum_a : CAATAGAGGGTCTGGCAGAGGCTC---------------------CTGGC @ 57336/400001
mus_a : CAATAGAGGGGCTGGCAGAGGCTC---------------------CTGGC @ 78565/400001
rat_a : CAATAGAGGGGCTGGCAGAGACTC---------------------CTGGC @ 112663/369938
fug_a : TGATGGGGAGCGTGCATTAATTTCAGGCTATTGTTAACAGGCTCGTGGGC @ 36013/68174
hum_a : CGCGGTGCGGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGC @ 57386/400001
mus_a : CCCGGTGCGGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGC @ 78615/400001
rat_a : CCCGGTGCGGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGC @ 112713/369938
fug_a : CGAGGTGTTGGATGGCCTGAGTGAAGCACGCGCTGTCAGCTGGCGAGCGC @ 36063/68174
Can we align human & fly???
CGCGGTGC-GGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGCACTCTCCTTTCAGGCAGCTCCCCGGGGAG
CCCGGTGC-GGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGCACTCG-CTTTCAGGCAGCTCCCCGGGGAG
CCCGGTGC-GGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGCACTCG-CTTTCAGGCAGCTCCCCGGGGAG
GAGGTGTTGGATGGCCTGAGTGA-AGCACGCGCTGTCAGCTGGCGAGCGCTCGCG-AGTCCCTGCCGTGTCCCCG
Melan GCTACTCCAGCT-ACCACCTGCATGCAGCTGCACAGC
Pseudo GCCACTGAGACT-GCCACCTGCATGCAGCTGCACAGA
Putting it all together
CGCGGTGC-GGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGCACTCTCCTTTCAGGCAGCTCCCCGGGGAG
CCCGGTGC-GGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGCACTCG-CTTTCAGGCAGCTCCCCGGGGAG
CCCGGTGC-GGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTGAGCGCACTCG-CTTTCAGGCAGCTCCCCGGGGAG
GAGGTGTTGGATGGCCTGAGTGA-AGCACGCGCTGTCAGCTGGCGAGCGCTCGCG-AGTCCCTGCCGTGTCCCCG
Melan GCTACTCCAGCT-ACCACCTGCATGCAGCTGCACAGC
Pseudo GCCACTGAGACT-GCCACCTGCATGCAGCTGCACAGA
Overview
• Local Alignment (CHAOS)
• Multiple Global Alignment (LAGAN)
- Whole Genome Alignment
• Glocal Alignment (Shuffle-LAGAN)
• Biological Story
Acknowledgments
Stanford:
Serafim Batzoglou
Arend Sidow
Matt Scott
Gregory Cooper
Chuong (Tom) Do
Sanket Malde
Kerrin Small
Mukund Sundararajan
Berkeley:
Inna Dubchak
Alexander Poliakov
Göttingen:
Burkhard Morgenstern
Rat Genome Sequencing
Consortium
http://lagan.stanford.edu/
