FBW 10-11-2011 Wim Van Criekinge
Course Contents: Bioinformatics
Database Searching
Needleman-Wunsch-edu.pl
The Score Matrix
----------------
Seq1(j)        1   2   3   4   5   6   7   8   9  10
Seq2(i)    *   C   K   H   V   F   C   R   V   C   I
     *     0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
  1  C    -1   1   0  -1  -2  -3  -4  -5  -6  -7  -8
  2  K    -2   0   2   1   0  -1  -2  -3  -4  -5  -6
  3  K    -3  -1   1   1   0  -1  -2  -3  -4  -5  -6
  4  C    -4  -2   0   0   0  -1   0  -1  -2  -3  -4
  5  F    -5  -3  -1  -1  -1   1   0  -1  -2  -3  -4
  6  C    -6  -4  -2  -2  -2   0   2   1   0  -1  -2
  7  K    -7  -5  -3  -3  -3  -1   1   1   0  -1  -2
  8  C    -8  -6  -4  -4  -4  -2   0   0   0   1   0
  9  V    -9  -7  -5  -5  -3  -3  -1  -1   1   0   0
A: matrix(i,j) = matrix(i-1,j-1) + (MIS)MATCH, with MATCH if substr(seq1, j-1, 1) eq substr(seq2, i-1, 1)
B: up_score = matrix(i-1,j) + GAP
C: left_score = matrix(i,j-1) + GAP
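The three candidate scores A, B and C above can be sketched in Python as follows. This is a minimal illustration, not the Needleman-Wunsch-edu.pl script itself: the function name nw_score_matrix is hypothetical, each cell is assumed to take the maximum of the three candidates, and the values MATCH = +1, MISMATCH = -1 and GAP = -1 are assumptions inferred from the numbers in the matrix.

```python
def nw_score_matrix(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Fill a global-alignment score matrix; rows follow seq2 (i), columns seq1 (j)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]

    # First row and first column: cumulative gap penalties.
    for j in range(1, cols):
        matrix[0][j] = j * gap
    for i in range(1, rows):
        matrix[i][0] = i * gap

    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[j - 1] == seq2[i - 1]
            diag_score = matrix[i - 1][j - 1] + (match if same else mismatch)  # A
            up_score = matrix[i - 1][j] + gap                                  # B
            left_score = matrix[i][j - 1] + gap                                # C
            matrix[i][j] = max(diag_score, up_score, left_score)
    return matrix


if __name__ == "__main__":
    for row in nw_score_matrix("CKHVFCRVCI", "CKKCFCKCV"):
        print(" ".join(f"{value:3d}" for value in row))
```

With the two sequences from the slide, the printed rows should correspond to the score matrix shown above under these assumed scores.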
FASTA (Pearson 1995)
Uses heuristics to avoid calculating the full dynamic programming matrix
Speeds up searches by an order of magnitude compared to full Smith-Waterman
The statistical side of FASTA is still stronger than that of BLAST
FASTA Stages
 
 
FastA (http://www.ebi.ac.uk/fasta33/)
Blosum50 is the default matrix.
Use a lower PAM or a higher BLOSUM matrix to detect closely related sequences.
Use a higher PAM or a lower BLOSUM matrix to detect distant sequences.
Gap opening penalty: -12 (proteins) and -16 (DNA) by default.
Gap extension penalty: -2 (proteins) and -4 (DNA) by default.
The larger the word length, the faster but less sensitive the search will be.
The maximum number of scores and alignments is 100.
FastA Output
Database code, hyperlinked to the SRS database at EBI
Accession number
Description
Length
Initn, init1, opt and z-score, calculated during the run
E score: the expectation value, i.e. how many hits with such a score are expected to be found by chance when comparing this query to this database
E() does not represent the % similarity
FastA is a family of programs
Query: DNA or protein
Database: DNA or protein
BLAST - Basic Local Alignment Search Tool
What does BLAST do?
The big red button: Do My Job
It is dangerous to hide too much of the underlying complexity from the scientists.
Neighborhood.pl
PAM200, word RGD, threshold 13
Word  Score
RGD   18
RGE   17
RGN   16
KGD   15
RGQ   15
KGE   14
HGD   13
KGN   13
RAD   13
RGA   13
RGG   13
RGH   13
RGK   13
RGS   13
RGT   13
RSD   13
WGD   13
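The neighborhood list for RGD above can be reproduced in spirit with a short Python sketch. It is not Neighborhood.pl: the function names are hypothetical, and toy_score is an assumed stand-in scoring scheme (6 for identity, -1 otherwise) rather than PAM200, so the scores will differ from the slide. The idea it illustrates is the same: enumerate every candidate word, score it against the query word, and keep those at or above a threshold.

```python
from itertools import product


def toy_score(a, b):
    """Hypothetical residue score: NOT PAM200, just an identity-style stand-in."""
    return 6 if a == b else -1


def neighborhood(word, alphabet, score, threshold):
    """Return (candidate, score) pairs scoring >= threshold against word."""
    hits = []
    for candidate in product(alphabet, repeat=len(word)):
        total = sum(score(a, b) for a, b in zip(word, candidate))
        if total >= threshold:
            hits.append(("".join(candidate), total))
    # Highest score first, ties broken alphabetically.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


if __name__ == "__main__":
    # A reduced five-letter alphabet keeps the example small.
    for candidate, total in neighborhood("RGD", "DEGKR", toy_score, threshold=11):
        print(candidate, total)
```

Passing a real substitution matrix lookup as the score argument, with the full 20-letter alphabet, would give a list of the kind shown on the slide.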
 
[Figure: score plotted against length of extension, with threshold S; the extension is trimmed to its maximum score. *Two non-overlapping HSPs on a diagonal within distance A.]
The BLAST algorithm
Query: MCGPFILGTYC
Words of length 3: MCG, CGP, GPF, PFI, FIL, ILG, LGT, GTY, TYC
Neighborhood words for MCG: MCG, MCT, MCN, …
Neighborhood words for CGP: CGP, MGP, CTP, …
This list can be computed in linear time
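The word list for MCGPFILGTYC can be produced with a sliding window, as in the sketch below. The function names query_words and index_words are hypothetical, and the word length of 3 is taken from the three-letter words on the slide. The second function shows one plausible way to index word positions in a sequence so that word hits can be looked up directly; it is an illustration, not a description of how BLAST stores its index.

```python
from collections import defaultdict


def query_words(query, w=3):
    """All overlapping words of length w, in order of appearance."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def index_words(sequence, w=3):
    """Map each word of length w to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - w + 1):
        index[sequence[i:i + w]].append(i)
    return dict(index)


if __name__ == "__main__":
    print(", ".join(query_words("MCGPFILGTYC")))
    print(index_words("MCGPFILGTYC")["GPF"])
```

Both functions make a single pass over the sequence, which is consistent with the remark that the list can be computed in linear time.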
The Blast Algorithm (2)
BLAST parameters
Critical parameters: T, W and scoring matrix
Mathematical Basis of BLAST
AATCAT
ATTCAG
HTHHHT
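The third row above reads as a coin-toss string for the two sequences: assuming H marks a matching position and T a mismatching one, it can be derived mechanically, and the longest run of H is then the longest ungapped exact match. The sketch below illustrates this; the function names are hypothetical.

```python
def match_string(a, b):
    """Write H where the aligned characters agree and T where they differ."""
    return "".join("H" if x == y else "T" for x, y in zip(a, b))


def longest_run(tosses, symbol="H"):
    """Length of the longest consecutive run of symbol."""
    best = current = 0
    for toss in tosses:
        current = current + 1 if toss == symbol else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    tosses = match_string("AATCAT", "ATTCAG")
    print(tosses, longest_run(tosses))
```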
Analytical derivation
Karlin-Altschul Statistics
$R = \log_{1/p}(mn)$
$E = k\,m\,n\,e^{-\lambda S}$
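The two formulas above are easy to evaluate numerically. The sketch below assumes the usual reading of the symbols: p is the per-position match probability, m and n are the sequence lengths, S is the alignment score, and k and lambda are the free parameters. The function names and the numeric values used in the example are hypothetical and chosen only for illustration.

```python
import math


def expected_longest_run(m, n, p):
    """R = log base 1/p of (m * n): expected longest run of matches."""
    return math.log(m * n, 1 / p)


def e_value(score, m, n, k, lam):
    """E = k * m * n * exp(-lambda * S): expected number of chance hits."""
    return k * m * n * math.exp(-lam * score)


if __name__ == "__main__":
    # Illustrative values only; k and lam are not taken from any real matrix.
    print(expected_longest_run(m=100, n=100, p=0.25))
    for score in (30, 40, 50):
        print(score, e_value(score, m=250, n=1_000_000, k=0.1, lam=0.3))
```

Because S sits in the exponent, a modest increase in score lowers E sharply, while doubling either sequence length only doubles it.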
Scoring alignments
Assessing significance requires a distribution
[Figure: histogram of frequency versus diameter (m)]
 
 
Normal Distribution does NOT Fit Alignment Scores!!
Comparing distributions: Extreme Value and Gaussian
Alignment of unrelated/random sequences results in scores following an extreme value distribution.
$P(x \ge S) = 1 - \exp(-k\,m\,n\,e^{-\lambda S})$
m, n: sequence lengths. k, λ: free parameters.
This can be shown analytically for ungapped alignments and has been found empirically to also hold for gapped alignments under commonly used conditions.
$P = 1 - e^{-E}$
$E = -\ln(1 - P)$
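The relation between P and E stated above can be checked with a few lines of Python. The function names are hypothetical; expm1 and log1p are used only to keep precision when E or P is very small.

```python
import math


def p_from_e(e):
    """P = 1 - exp(-E)."""
    return -math.expm1(-e)


def e_from_p(p):
    """E = -ln(1 - P)."""
    return -math.log1p(-p)


if __name__ == "__main__":
    for e in (1e-6, 0.01, 0.1, 1.0, 10.0):
        print(e, p_from_e(e))
```

For small values the two are nearly identical, so E and P can be read interchangeably for strong hits; they diverge only once E approaches 1, where P saturates below 1.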
Alignment algorithms will always produce alignments, regardless of whether they are meaningful or not => it is important to have a way of selecting significant alignments from a large set of database hits.
Solution: fit the distribution of scores from the database search to an extreme value distribution; determine the p-value of a hit from this fitted distribution.
Example: scores fitted to an extreme value distribution. 99.9% of this distribution is located below score = 112 => a hit with score = 112 has a p-value of 0.1%.
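The fitting step described above can be sketched as follows. The slide does not say how the fit is done, so this example swaps in a simple method-of-moments fit of a Gumbel (type I extreme value) distribution, which is an assumption and not necessarily what FASTA or BLAST do. The function names and the sample scores are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Estimate Gumbel location mu and scale beta from the sample mean and stdev."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """P(X >= score) = 1 - exp(-exp(-(score - mu) / beta))."""
    return -math.expm1(-math.exp(-(score - mu) / beta))


if __name__ == "__main__":
    # Made-up background scores, for illustration only.
    background = [38, 41, 42, 44, 44, 45, 46, 47, 48, 50, 52, 55, 59, 63, 70]
    mu, beta = fit_gumbel_moments(background)
    for hit in (60, 90, 112):
        print(hit, gumbel_p_value(hit, mu, beta))
```

A hit whose score lies far in the right tail of the fitted distribution gets a small p-value, which is the selection criterion the slide describes.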
BLAST uses precomputed extreme value distributions to calculate E-values from alignment scores.
For this reason BLAST only allows certain combinations of substitution matrices and gap penalties.
This also means that the fit is based on a different data set than the one you are working on.
A word of caution: BLAST tends to overestimate the significance of its matches.
E-values from BLAST are fine for identifying sure hits.
One should be careful using BLAST's E-values to judge whether a marginal hit can be trusted (e.g., you may want to use E-value cutoffs of 10^-4 to 10^-5).
Determining P-values
Bit Scores
Needleman-wunsch-Monte-Carlo.pl (Average around -64 !)
-74
-73
-72 *
-71 *****
-70 *******
-69 **********
-68 ***************
-67 *************************
-66 *************************
-65 ************************************
-64 *****************************************
-63 ************************************************************
-61 ************************
-60 *****************************
-59 *******************
-58 **************
-57 *********
-56 ********
-55 *****
-54 ****
-53 *
-52 *
-51 *
-50
-49
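A histogram of this kind can be generated by scoring many shuffled sequence pairs. The sketch below is not Needleman-wunsch-Monte-Carlo.pl: the function names are hypothetical, the scoring values (+1, -1, -1) are assumed, and the input sequences are the short ones from the score matrix slide, so the scores will not centre on -64 as in the histogram above.

```python
import random
from collections import Counter


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score using two rows of the dynamic programming matrix."""
    prev = [j * gap for j in range(len(a) + 1)]
    for i in range(1, len(b) + 1):
        cur = [i * gap]
        for j in range(1, len(a) + 1):
            diag = prev[j - 1] + (match if a[j - 1] == b[i - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def shuffled_scores(a, b, trials=200, seed=0):
    """Score randomly shuffled copies of a and b; composition is preserved."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        sa, sb = list(a), list(b)
        rng.shuffle(sa)
        rng.shuffle(sb)
        scores.append(nw_score(sa, sb))
    return scores


def text_histogram(scores):
    """One line per score value, one asterisk per observation."""
    counts = Counter(scores)
    return "\n".join(
        f"{s:4d} {'*' * counts[s]}" for s in range(min(counts), max(counts) + 1)
    )


if __name__ == "__main__":
    print(text_histogram(shuffled_scores("CKHVFCRVCI", "CKKCFCKCV")))
```

Comparing the score of the real alignment with this shuffled background gives a rough, empirical sense of whether the real score is unusual.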
opt  observed  expected
< 20  222  0 :*
22  30  0 :*
24  18  1 :*
26  18  15 :*
28  46  159 :*
30  207  963 :*
32  1016  3724 :=  *
34  4596  10099 :====  *
36  9835  20741 :=========  *
38  23408  34278 :====================  *
40  41534  47814 :===================================  *
42  53471  58447 :============================================  *
44  73080  64473 :====================================================*=======
46  70283  65667 :=====================================================*====
48  64918  62869 :===================================================*==
50  65930  57368 :===============================================*=======
52  47425  50436 :=======================================  *
54  36788  43081 :===============================  *
56  33156  35986 :============================ *
58  26422  29544 :======================  *
60  21578  23932 :================== *
62  19321  19187 :===============*
64  15988  15259 :============*=
66  14293  12060 :=========*==
68  11679  9486 :=======*==
70  10135  7434 :======*==
72  8957  5809 :====*===
74  7728  4529 :===*===
76  6176  3525 :==*===
78  5363  2740 :==*==
80  4434  2128 :=*==
82  3823  1628 :=*==
84  3231  1289 :=*=
86  2474  998 :*==
88  2197  772 :*=
90  1716  597 :*=
92  1430  462 :*=  :===============*========================
94  1250  358 :*=  :============*===========================
96  954  277 :*  :=========*=======================
98  756  214 :*  :=======*===================
100  678  166 :*  :=====*==================
102  580  128 :*  :====*===============
104  476  99 :*  :===*=============
106  367  77 :*  :==*==========
108  309  59 :*  :==*========
110  287  46 :*  :=*========
112  206  36 :*  :=*======
114  161  28 :*  :*=====
116  144  21 :*  :*====
118  127  16 :*  :*====
>120  886  13 :*  :*==============================  (Related)
Statistics summary
P-values
E-values
Tips: Low-complexity and Gapped Blast Algorithm
Tips
Pattern
PSSM (Position-Specific Scoring Matrix)
PSI-BLAST
From: http://bioweb.pasteur.fr/seqanal/blast/intro-uk.html
PSI-BLAST pitfalls
PHI-BLAST (Pattern-Hit Initiated BLAST)
From: http://bioweb.pasteur.fr/seqanal/blast/intro-uk.html
Installing Blast Locally
Main database: BLAT
BLAT Human Genome Browser
BLAT method
 
Weblems

Más contenido relacionado

La actualidad más candente

La actualidad más candente (9)

2013 - Andrei Zmievski: Clínica Regex
2013 - Andrei Zmievski: Clínica Regex2013 - Andrei Zmievski: Clínica Regex
2013 - Andrei Zmievski: Clínica Regex
 
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
Introduction to R for Data Science :: Session 7 [Multiple Linear Regression i...
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
Introduction to R for Data Science :: Session 6 [Linear Regression in R]
Introduction to R for Data Science :: Session 6 [Linear Regression in R] Introduction to R for Data Science :: Session 6 [Linear Regression in R]
Introduction to R for Data Science :: Session 6 [Linear Regression in R]
 
BDACA - Tutorial5
BDACA - Tutorial5BDACA - Tutorial5
BDACA - Tutorial5
 
BDACA - Lecture5
BDACA - Lecture5BDACA - Lecture5
BDACA - Lecture5
 
Data structure Unit-I Part-C
Data structure Unit-I Part-CData structure Unit-I Part-C
Data structure Unit-I Part-C
 
Language Technology Enhanced Learning
Language Technology Enhanced LearningLanguage Technology Enhanced Learning
Language Technology Enhanced Learning
 
Post-editese: an Exacerbated Translationese (presentation at MT Summit 2019)
Post-editese: an Exacerbated Translationese (presentation at MT Summit 2019)Post-editese: an Exacerbated Translationese (presentation at MT Summit 2019)
Post-editese: an Exacerbated Translationese (presentation at MT Summit 2019)
 

Destacado

Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Prof. Wim Van Criekinge
 
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vwebProf. Wim Van Criekinge
 
2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter
2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter
2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitterProf. Wim Van Criekinge
 
2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciences2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciencesProf. Wim Van Criekinge
 
2015 09 imec_wim_vancriekinge_v42_to_present
2015 09 imec_wim_vancriekinge_v42_to_present2015 09 imec_wim_vancriekinge_v42_to_present
2015 09 imec_wim_vancriekinge_v42_to_presentProf. Wim Van Criekinge
 
Bioinformatica 29-09-2011-p1-introduction
Bioinformatica 29-09-2011-p1-introductionBioinformatica 29-09-2011-p1-introduction
Bioinformatica 29-09-2011-p1-introductionProf. Wim Van Criekinge
 
Van criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlilleVan criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlilleProf. Wim Van Criekinge
 

Destacado (20)

Bioinformatica 06-10-2011-t2-databases
Bioinformatica 06-10-2011-t2-databasesBioinformatica 06-10-2011-t2-databases
Bioinformatica 06-10-2011-t2-databases
 
Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014Bioinformatics t7-proteinstructure v2014
Bioinformatics t7-proteinstructure v2014
 
December 2012 drylab
December 2012 drylabDecember 2012 drylab
December 2012 drylab
 
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
2014 03 28_next_generation_epigenetic_profling_v_les_epigenetica_vweb
 
Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014Bioinformatics t8-go-hmm v2014
Bioinformatics t8-go-hmm v2014
 
Thesis2014
Thesis2014Thesis2014
Thesis2014
 
Thesis bio bix_2014
Thesis bio bix_2014Thesis bio bix_2014
Thesis bio bix_2014
 
2012 12 12_adam_v_final
2012 12 12_adam_v_final2012 12 12_adam_v_final
2012 12 12_adam_v_final
 
2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter
2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter
2011 10 26_quantitative_cell_biology_molecular_profiling_v_twitter
 
Bioinformatica p1-perl-introduction
Bioinformatica p1-perl-introductionBioinformatica p1-perl-introduction
Bioinformatica p1-perl-introduction
 
2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciences2012 12 02_epigenetic_profiling_environmental_health_sciences
2012 12 02_epigenetic_profiling_environmental_health_sciences
 
2015 bioinformatics bio_python_part3
2015 bioinformatics bio_python_part32015 bioinformatics bio_python_part3
2015 bioinformatics bio_python_part3
 
Bioinformatica t8-go-hmm
Bioinformatica t8-go-hmmBioinformatica t8-go-hmm
Bioinformatica t8-go-hmm
 
Bioinformatica t3-scoring matrices
Bioinformatica t3-scoring matricesBioinformatica t3-scoring matrices
Bioinformatica t3-scoring matrices
 
2015 09 imec_wim_vancriekinge_v42_to_present
2015 09 imec_wim_vancriekinge_v42_to_present2015 09 imec_wim_vancriekinge_v42_to_present
2015 09 imec_wim_vancriekinge_v42_to_present
 
2015 bioinformatics bio_python_part4
2015 bioinformatics bio_python_part42015 bioinformatics bio_python_part4
2015 bioinformatics bio_python_part4
 
Bioinformatica t7-protein structure
Bioinformatica t7-protein structureBioinformatica t7-protein structure
Bioinformatica t7-protein structure
 
Bioinformatica 29-09-2011-p1-introduction
Bioinformatica 29-09-2011-p1-introductionBioinformatica 29-09-2011-p1-introduction
Bioinformatica 29-09-2011-p1-introduction
 
Van criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlilleVan criekinge next_generation_epigenetic_profling_vlille
Van criekinge next_generation_epigenetic_profling_vlille
 
2015 04 22_time_labs_shared
2015 04 22_time_labs_shared2015 04 22_time_labs_shared
2015 04 22_time_labs_shared
 

Similar a Bioinformatica 10-11-2011-t5-database searching

Bioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekingeBioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekingeProf. Wim Van Criekinge
 
Presentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informaticePresentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informaticezahid6
 
2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekinge2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekingeProf. Wim Van Criekinge
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadfalizain9604
 
Sequence comparison techniques
Sequence comparison techniquesSequence comparison techniques
Sequence comparison techniquesruchibioinfo
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
BLAST_CSS2.ppt
BLAST_CSS2.pptBLAST_CSS2.ppt
BLAST_CSS2.pptSilpa87
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformaticsatmapandey
 
Blast gp assignment
Blast  gp assignmentBlast  gp assignment
Blast gp assignmentbarathvaj
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignmentKubuldinho
 

Similar a Bioinformatica 10-11-2011-t5-database searching (20)

Bioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekingeBioinformatics t5-database searching-v2013_wim_vancriekinge
Bioinformatics t5-database searching-v2013_wim_vancriekinge
 
Presentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informaticePresentation for blast algorithm bio-informatice
Presentation for blast algorithm bio-informatice
 
Blast Algorithm
Blast AlgorithmBlast Algorithm
Blast Algorithm
 
2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekinge2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekinge
 
Blast 2013 1
Blast 2013 1Blast 2013 1
Blast 2013 1
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
_BLAST.ppt
_BLAST.ppt_BLAST.ppt
_BLAST.ppt
 
Sequence comparison techniques
Sequence comparison techniquesSequence comparison techniques
Sequence comparison techniques
 
Database Searching
Database SearchingDatabase Searching
Database Searching
 
blast and fasta
 blast and fasta blast and fasta
blast and fasta
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
Seq alignment
Seq alignment Seq alignment
Seq alignment
 
BLAST_CSS2.ppt
BLAST_CSS2.pptBLAST_CSS2.ppt
BLAST_CSS2.ppt
 
Blast bioinformatics
Blast bioinformaticsBlast bioinformatics
Blast bioinformatics
 
Mayank
MayankMayank
Mayank
 
Blast gp assignment
Blast  gp assignmentBlast  gp assignment
Blast gp assignment
 
Bioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmmBioinformatica 08-12-2011-t8-go-hmm
Bioinformatica 08-12-2011-t8-go-hmm
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignment
 

Más de Prof. Wim Van Criekinge

2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_uploadProf. Wim Van Criekinge
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_uploadProf. Wim Van Criekinge
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_uploadProf. Wim Van Criekinge
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Prof. Wim Van Criekinge
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_uploadProf. Wim Van Criekinge
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_uploadProf. Wim Van Criekinge
 

Más de Prof. Wim Van Criekinge (20)

2020 02 11_biological_databases_part1
2020 02 11_biological_databases_part12020 02 11_biological_databases_part1
2020 02 11_biological_databases_part1
 
2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload2019 03 05_biological_databases_part5_v_upload
2019 03 05_biological_databases_part5_v_upload
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
 
2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload2019 03 05_biological_databases_part3_v_upload
2019 03 05_biological_databases_part3_v_upload
 
2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload2019 02 21_biological_databases_part2_v_upload
2019 02 21_biological_databases_part2_v_upload
 
2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload2019 02 12_biological_databases_part1_v_upload
2019 02 12_biological_databases_part1_v_upload
 
P7 2018 biopython3
P7 2018 biopython3P7 2018 biopython3
P7 2018 biopython3
 
P6 2018 biopython2b
P6 2018 biopython2bP6 2018 biopython2b
P6 2018 biopython2b
 
P4 2018 io_functions
P4 2018 io_functionsP4 2018 io_functions
P4 2018 io_functions
 
P3 2018 python_regexes
P3 2018 python_regexesP3 2018 python_regexes
P3 2018 python_regexes
 
T1 2018 bioinformatics
T1 2018 bioinformaticsT1 2018 bioinformatics
T1 2018 bioinformatics
 
P1 2018 python
P1 2018 pythonP1 2018 python
P1 2018 python
 
Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]Bio ontologies and semantic technologies[2]
Bio ontologies and semantic technologies[2]
 
2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql
 
2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload2018 03 27_biological_databases_part4_v_upload
2018 03 27_biological_databases_part4_v_upload
 
2018 03 20_biological_databases_part3
2018 03 20_biological_databases_part32018 03 20_biological_databases_part3
2018 03 20_biological_databases_part3
 
2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload2018 02 20_biological_databases_part2_v_upload
2018 02 20_biological_databases_part2_v_upload
 
2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload2018 02 20_biological_databases_part1_v_upload
2018 02 20_biological_databases_part1_v_upload
 
P7 2017 biopython3
P7 2017 biopython3P7 2017 biopython3
P7 2017 biopython3
 
P6 2017 biopython2
P6 2017 biopython2P6 2017 biopython2
P6 2017 biopython2
 

Bioinformatica 10-11-2011-t5-database searching

  • 2. FBW 10-11-2011 Wim Van Criekinge
  • 5. Needleman-Wunsch-edu.pl: The Score Matrix
         Seq1 (j)        1   2   3   4   5   6   7   8   9  10
         Seq2 (i)    *   C   K   H   V   F   C   R   V   C   I
             *       0  -1  -2  -3  -4  -5  -6  -7  -8  -9 -10
           1 C      -1   1   0  -1  -2  -3  -4  -5  -6  -7  -8
           2 K      -2   0   2   1   0  -1  -2  -3  -4  -5  -6
           3 K      -3  -1   1   1   0  -1  -2  -3  -4  -5  -6
           4 C      -4  -2   0   0   0  -1   0  -1  -2  -3  -4
           5 F      -5  -3  -1  -1  -1   1   0  -1  -2  -3  -4
           6 C      -6  -4  -2  -2  -2   0   2   1   0  -1  -2
           7 K      -7  -5  -3  -3  -3  -1   1   1   0  -1  -2
           8 C      -8  -6  -4  -4  -4  -2   0   0   0   1   0
           9 V      -9  -7  -5  -5  -3  -3  -1  -1   1   0   0
       Each cell matrix(i,j) takes the largest of three candidate scores:
       A (diagonal): matrix(i-1,j-1) + MATCH if substr(seq1, j-1, 1) eq substr(seq2, i-1, 1), otherwise matrix(i-1,j-1) + MISMATCH
       B (up): up_score = matrix(i-1,j) + GAP
       C (left): left_score = matrix(i,j-1) + GAP
  • 14. FastA (http://www.ebi.ac.uk/fasta33/): BLOSUM50 is the default matrix. Use a lower PAM or higher BLOSUM matrix to detect close sequences, and a higher PAM or lower BLOSUM matrix to detect distant sequences. The default gap opening penalty is -12 for proteins and -16 for DNA. The default gap extension penalty is -2 for proteins and -4 for DNA. The larger the word length, the faster but less sensitive the search will be. The maximum number of scores and alignments is 100.
  • 15. FastA Output: each hit lists the database code (hyperlinked to the SRS database at EBI), the accession number, the description, and the length, followed by the initn, init1, opt, and z-score values calculated during the run. The E score is the expectation value: how many hits are expected to be found by chance with such a score when comparing this query to this database. E() does not represent the % similarity.
  • 20. BLAST - Basic Local Alignment Search Tool
  • 22. The big red button: "Do My Job". It is dangerous to hide too much of the underlying complexity from the scientists.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.  
  • 29. [Figure: score plotted against length of extension; labels: "S", "Trim to max", "indexed *"] * Two non-overlapping HSPs on a diagonal within distance A
  • 31.
  • 32.
  • 33.  
  • 34.
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.  
  • 51.  
  • 52.
  • 53. Comparing distributions   Extreme Value: Gaussian:
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59. Needleman-wunsch-Monte-Carlo.pl (average around -64!)
        -74
        -73
        -72 *
        -71 *****
        -70 *******
        -69 **********
        -68 ***************
        -67 *************************
        -66 *************************
        -65 ************************************
        -64 *****************************************
        -63 ************************************************************
        -61 ************************
        -60 *****************************
        -59 *******************
        -58 **************
        -57 *********
        -56 ********
        -55 *****
        -54 ****
        -53 *
        -52 *
        -51 *
        -50
        -49
  • 98. PSSM (Position-Specific Scoring Matrix)
  • 111. PHI-Blast (Pattern-Hit Initiated BLAST); Local Blast
  • 112. PHI-Blast; Local Blast. From: http://bioweb.pasteur.fr/seqanal/blast/intro-uk.html
  • 120. BLAT Human Genome Browser
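The matrix fill on slide 5 can be sketched in Python as follows. This is a minimal illustration, not the original Needleman-Wunsch-edu.pl script. The scoring values (match +1, mismatch -1, gap -1) are not stated on the slide; they are an assumption, chosen because they reproduce the matrix printed there. The function name nw_matrix is hypothetical.

```python
def nw_matrix(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Fill a Needleman-Wunsch score matrix.

    Rows follow seq2 (index i), columns follow seq1 (index j), as on slide 5.
    The default scores are an assumption, not taken from the slide.
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # First row and first column: cumulative gap penalties.
    for j in range(1, cols):
        matrix[0][j] = matrix[0][j - 1] + gap
    for i in range(1, rows):
        matrix[i][0] = matrix[i - 1][0] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[j - 1] == seq2[i - 1]
            diag = matrix[i - 1][j - 1] + (match if same else mismatch)  # A
            up = matrix[i - 1][j] + gap                                  # B
            left = matrix[i][j - 1] + gap                                # C
            matrix[i][j] = max(diag, up, left)
    return matrix


slide5 = nw_matrix("CKHVFCRVCI", "CKKCFCKCV")
for row in slide5:
    print(" ".join(f"{value:3d}" for value in row))
```

The bottom-right cell is the global alignment score, which is 0 for these two sequences under the assumed scores.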
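The word-length trade-off on slide 14 can be illustrated with a small word-lookup sketch. FASTA begins by finding identical words of length ktup shared between query and database sequence, and a longer word yields fewer candidate hits to examine. The code below is an illustration under that description only, not the FASTA program's code; the function name ktup_diagonals is hypothetical, and the toy sequences are borrowed from slide 5.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query, subject, ktup=2):
    """Count identical words of length ktup per diagonal.

    A diagonal is identified by (subject position - query position).
    """
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for q in range(len(query) - ktup + 1):
        index[query[q:q + ktup]].append(q)
    # Look up every word of the subject and record the diagonal of each hit.
    hits = Counter()
    for s in range(len(subject) - ktup + 1):
        for q in index.get(subject[s:s + ktup], ()):
            hits[s - q] += 1
    return hits


for k in (1, 2, 3):
    found = ktup_diagonals("CKHVFCRVCI", "CKKCFCKCV", ktup=k)
    print(k, sum(found.values()), dict(found))
```

With these toy sequences the number of word hits drops as ktup grows, and at ktup=3 no shared word remains: less work, but the similarity is missed entirely.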
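Slide 53 compares the extreme value and Gaussian distributions, but its formulas did not survive extraction. The sketch below uses the standard textbook densities (the Gumbel form of the extreme value distribution and the normal density); whether the slide used exactly this parameterisation is an assumption, and the function names are hypothetical.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gumbel_pdf(x, mu=0.0, beta=1.0):
    """Extreme value (Gumbel, maximum) density with location mu and scale beta."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


# Compare the right tails for the same location and scale.
for x in (0, 2, 4, 6):
    print(f"x={x}  gaussian={gaussian_pdf(x):.6f}  extreme value={gumbel_pdf(x):.6f}")
```

The extreme value density has a much heavier right tail, which is why high alignment scores are less surprising than a Gaussian model would suggest.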
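The histogram on slide 59 comes from Needleman-wunsch-Monte-Carlo.pl, which is not reproduced in the deck. A plausible reading is that it scores many randomised sequence pairs and tabulates the global alignment scores; the sketch below does that under stated assumptions: shuffling one sequence, scores of +1/-1/-1 for match/mismatch/gap, and the toy sequences from slide 5. It will not reproduce the slide's numbers (average around -64), which came from different inputs. All function names are hypothetical.

```python
import random


def nw_score(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Global alignment score, keeping only two rows of the matrix."""
    prev = [j * gap for j in range(len(seq1) + 1)]
    for i in range(1, len(seq2) + 1):
        curr = [i * gap]
        for j in range(1, len(seq1) + 1):
            same = seq1[j - 1] == seq2[i - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffled_scores(seq1, seq2, trials=200, seed=0):
    """Score seq1 against random shuffles of seq2 (same composition)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    letters = list(seq2)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(nw_score(seq1, "".join(letters)))
    return scores


scores = shuffled_scores("CKHVFCRVCI", "CKKCFCKCV")
# Text histogram in the style of slide 59: one star per trial.
for value in sorted(set(scores)):
    print(f"{value:4d} {'*' * scores.count(value)}")
```

Comparing the score of the real pair with this background distribution indicates how unusual the real score is.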