SlideShare una empresa de Scribd logo
1 de 63
BLAST and FASTA


                  1
Pairwise Alignment

          Global                        Local
• Best score from among       • Best score from among
  alignments of full-length     alignments of partial
  sequences                     sequences
• Needelman-Wunch             • Smith-Waterman
  algorithm                     algorithm




                                                        2
Why do we need local alignments?

 •   To compare a short sequence to a large one.

 •   To compare a single sequence to an entire
     database

 •   To compare a partial sequence to the whole.



                                                   3
Why do we need local alignments?
 • Identify newly determined sequences
 • Compare new genes to known ones
 • Guess functions for entire genomes full of
   ORFs of unknown function




                                                4
Mathematical Basis
for Local Alignment
• Model matches as a sequence of coin
  tosses
• Let p be the probability of “head”
   – For a “fair” coin, p = 0.5
• According to Paul Erdös-Alfréd Rényi
  law:
  If there are n throws, then the expected
  length, R, of the longest run of “heads”
  is
               R = log1/p (n).               Paul Erdös
                                                          5
Mathematical Basis
for Local Alignment

• Example: Suppose n = 20 for a “fair” coin
             R=log2(20)=4.32
• Problem: How does one model DNA (or
  amino acid) alignments as coin tosses.




                                              6
Modeling Sequence Alignments
• To model random sequence alignments, replace a match by
  “head” (H) and mismatch by “tail” (T).

             AATCAT
                              HTHHHT
             ATTCAG

• For ungapped DNA alignments, the probability of a “head”
  is 1/4.

• For ungapped amino acid alignments, the probability of a
  “head” is 1/20.
                                                             7
Modeling Sequence Alignments
• Thus, for any one particular alignment, the Erdös-
  Rényi law can be applied
• What about for all possible alignments?
   – Consider that sequences can being shifted back and
     forth in the dot matrix plot
• The expected length of the longest match is
                   R = log1/p(mn)
  where m and n are the lengths of the two
  sequences.
                                                          8
Modeling Sequence Alignments
• Suppose m = n = 10, and we deal with DNA
  sequences
            R = log4(100) = 3.32
• This analysis assumes that the base
  composition is uniform and the alignment is
  ungapped. The result is approximate, but
  not bad.

                                            9
10
Heuristic Methods: FASTA and BLAST

FASTA
• First fast sequence searching algorithm for
  comparing a query sequence against a database.

BLAST
• Basic Local Alignment Search Technique
  improvement of FASTA: Search speed, ease of
  use, statistical rigor.
                                              11
FASTA and BLAST
• Basic idea: a good alignment contains
  subsequences of absolute identity (short lengths
  of exact matches):

  – First, identify very short exact matches.
  – Next, the best short hits from the first step are
    extended to longer regions of similarity.
  – Finally, the best hits are optimized.


                                                        12
FASTA
Derived from logic of the dot plot
  – compute best diagonals from all frames of
    alignment
The method looks for exact matches between
 words in query and test sequence
  – DNA words are usually 6 nucleotides long
  – protein words are 2 amino acids long



                                                13
FASTA Algorithm




                  14
Makes Longest Diagonal
After all diagonals are found, tries to join
 diagonals by adding gaps

Computes alignments in regions of best
 diagonals


                                           15
FASTA Alignments




                   16
FASTA Results - Histogram
!!SEQUENCE_LIST 1.0
(Nucleotide) FASTA of: b2.seq from: 1 to: 693 December 9, 2002 14:02
TO: /u/browns02/Victor/Search-set/*.seq Sequences:     2,050 Symbols:
913,285 Word Size: 6
 Searching with both strands of the query.
 Scoring matrix: GenRunData:fastadna.cmp
 Constant pamfactor used
 Gap creation penalty: 16 Gap extension penalty: 4

Histogram Key:
 Each histogram symbol represents 4 search set sequences
 Each inset symbol represents 1 search set sequences
 z-scores computed from opt scores
z-score obs    exp
        (=)    (*)
< 20      0      0:
  22      0      0:
  24      3      0:=
  26      2      0:=
  28      5      0:==
  30     11      3:*==
  32     19     11:==*==
  34     38     30:=======*==
  36     58     61:===============*
  38     79    100:====================    *
  40    134    140:==================================*
  42    167    171:==========================================*
  44    205    189:===============================================*====
  46    209    192:===============================================*=====   17
  48    177    184:=============================================*
FASTA Results - List
The best scores are:                   init1 initn      opt     z-sc E(1018780)..

SW:PPI1_HUMAN    Begin: 1 End: 269
! Q00169 homo sapiens (human). phosph... 1854   1854   1854   2249.3   1.8e-117
SW:PPI1_RABIT    Begin: 1 End: 269
! P48738 oryctolagus cuniculus (rabbi... 1840   1840   1840   2232.4   1.6e-116
SW:PPI1_RAT    Begin: 1 End: 270
! P16446 rattus norvegicus (rat). pho... 1543   1543   1837   2228.7   2.5e-116
SW:PPI1_MOUSE    Begin: 1 End: 270
! P53810 mus musculus (mouse). phosph... 1542   1542   1836   2227.5   2.9e-116
SW:PPI2_HUMAN    Begin: 1 End: 270
! P48739 homo sapiens (human). phosph... 1533   1533   1533   1861.0   7.7e-96
SPTREMBL_NEW:BAC25830    Begin: 1 End: 270
! Bac25830 mus musculus (mouse). 10, ... 1488   1488   1522   1847.6   4.2e-95
SP_TREMBL:Q8N5W1    Begin: 1 End: 268
! Q8n5w1 homo sapiens (human). simila... 1477   1477   1522   1847.6   4.3e-95
SW:PPI2_RAT    Begin: 1 End: 269
! P53812 rattus norvegicus (rat). pho... 1482   1482   1516   1840.4   1.1e-94




                                                                                    18
FASTA Results - Alignment
SCORES   Init1: 1515 Initn: 1565 Opt: 1687 z-score: 1158.1 E(): 2.3e-58
>>GB_IN3:DMU09374                                         (2038 nt)
 initn: 1565 init1: 1515 opt: 1687 Z-score: 1158.1 expect(): 2.3e-58
  66.2% identity in 875 nt overlap
 (83-957:151-1022)

                   60        70        80         90      100       110
u39412.gb_pr CCCTTTGTGGCCGCCATGGACAATTCCGGGAAGGAAGCGGAGGCGATGGCGCTGTTGGCC
                                            || ||| | ||||| |    ||| |||||
DMU09374     AGGCGGACATAAATCCTCGACATGGGTGACAACGAACAGAAGGCGCTCCAACTGATGGCC
                    130       140       150        160      170       180

                  120       130       140       150       160       170
u39412.gb_pr GAGGCGGAGCGCAAAGTGAAGAACTCGCAGTCCTTCTTCTCTGGCCTCTTTGGAGGCTCA
             |||||||||   || |||    |   | || ||| |         || || ||||| ||
DMU09374     GAGGCGGAGAAGAAGTTGACCCAGCAGAAGGGCTTTCTGGGATCGCTGTTCGGAGGGTCC
                    190       200       210       220       230       240

                  180       190       200       210       220       230
u39412.gb_pr TCCAAAATAGAGGAAGCATGCGAAATCTACGCCAGAGCAGCAAACATGTTCAAAATGGCC
               ||| | ||||| ||    |||   ||||    | || | |||||||| || ||| ||
DMU09374     AACAAGGTGGAGGACGCCATCGAGTGCTACCAGCGGGCGGGCAACATGTTTAAGATGTCC
                    250       260       270       280       290       300

                  240       250       260       270       280       290
u39412.gb_pr AAAAACTGGAGTGCTGCTGGAAACGCGTTCTGCCAGGCTGCACAGCTGCACCTGCAGCTC
             ||||||||||     ||||| |     |||||| |||| |||   || ||| || |
DMU09374     AAAAACTGGACAAAGGCTGGGGAGTGCTTCTGCGAGGCGGCAACTCTACACGCGCGGGCT   19
                    310       320       330       340       350       360
FASTA on the Web

• Many websites offer
  FASTA searches
• Each server has its limits
• Be aware that you
  depend “on the kindness
  of strangers.”

                               20
Institut de Génétique Humaine, Montpellier France, GeneStream server
         http://www2.igh.cnrs.fr/bin/fasta-guess.cgi
Oak Ridge National Laboratory GenQuest server
         http://avalon.epm.ornl.gov/
European Bioinformatics Institute, Cambridge, UK
         http://www.ebi.ac.uk/htbin/fasta.py?request
EMBL, Heidelberg, Germany
         http://www.embl-heidelberg.de/cgi/fasta-wrapper-free
Munich Information Center for Protein Sequences (MIPS)
at Max-Planck-Institut, Germany
         http://speedy.mips.biochem.mpg.de/mips/programs/fasta.html
Institute of Biology and Chemistry of Proteins Lyon, France
         http://www.ibcp.fr/serv_main.html
Institute Pasteur, France
         http://central.pasteur.fr/seqanal/interfaces/fasta.html
GenQuest at The Johns Hopkins University
         http://www.bis.med.jhmi.edu/Dan/gq/gq.form.html
National Cancer Center of Japan
         http://bioinfo.ncc.go.jp

                                                                       21
FASTA Format
• simple format used by almost all programs
• >header line with a [return] at end
• Sequence (no specific requirements for line
  length, characters, etc)
>URO1 uro1.seq   Length: 2018   November 9, 2000 11:50   Type: N   Check: 3854   ..
CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACA
ACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTT
GCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACC
CACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTG
TGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCA
GGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCAT
CTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGA
TGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG
GAGTCACCAAAACCTGGGACAGGCTCATGCTCCAGGACAATTGCTGTGGCGTAAATGGTC
CATCAGACTGGCAAAAATACACATCTGCCTTCCGGACTGAGAATAATGATGCTGACTATC
CCTGGCCTCGTCAATGCTGTGTTATGAACAATCTTAAAGAACCTCTCAACCTGGAGGCTT                          22
Assessing Alignment Significance
• Generate random alignments and
calculate their scores
• Compute the mean and the standard
deviation (SD) for random scores
• Compute the deviation of the actual score
from the mean of random scores
               Z = (meanX)/SD
• Evaluate the significance of the alignment
• The probability of a Z value is called the E
score
                                            23
E scores or E values
E scores are not equivalent to p
values where
             p < 0.05
are generally considered
statistically significant.
                               24
E values (rules of thumb)
E values below 10-6 are most probably
statistically significant.
E values above 10-6 but below 10-3
deserve a second look.
E values above 10-3 should not be
tossed aside lightly; they should be
thrown out with great force.           25
BLAST
• Basic Local Alignment Search Tool
  – Altschul et al. 1990,1994,1997
• Heuristic method for local alignment
• Designed specifically for database searches
• Based on the same assumption as FASTA
  that good alignments contain short lengths
  of exact matches
                                            26
BLAST
• Both BLAST and FASTA search for local
  sequence similarity - indeed they have exactly
  the same goals, though they use somewhat
  different algorithms and statistical approaches.

• BLAST benefits
  – Speed
  – User friendly
  – Statistical rigor
  – More sensitive
                                                27
Input/Output
• Input:
  – Query sequence Q
  – Database of sequences DB
  – Minimal score S

• Output:
  – Sequences from DB (Seq), such that Q and Seq
    have scores > S

                                               28
BLAST Searches GenBank
[BLAST= Basic Local Alignment Search Tool]
The NCBI BLAST web server lets you compare your
  query sequence to various sections of GenBank:
        –   nr = non-redundant (main sections)
        –   month = new sequences from the past few weeks
        –   refseq_rna
        –   RNA entries from NCBI's Reference Sequence project
        –   refseq_genomic
        –   Genomic entries from NCBI's Reference Sequence project
        –   ESTs
        –   Taxon = e.g., human, Drososphila, yeast, E. coli
        –   proteins (by automatic translation)
        –   pdb = Sequences derived from the 3-dimensional structure
            from Brookhaven Protein Data Bank
                                                                  29
BLAST
• Uses word matching like FASTA
• Similarity matching of words (3 amino acids, 11
  bases)
  – does not require identical words.
• If no words are similar, then no alignment
  – Will not find matches for very short sequences

• Does not handle gaps well
• “gapped BLAST” is somewhat better
                                                     30
BLAST Algorithm




                  31
BLAST Word Matching
MEAAVKEEISVEDEAVDKNI
MEA
 EAA
  AAV        Break query
    AVK
     VKE     into words:
      KEE
       EEI
         EIS
          ISV
          ...         Break database
                        sequences
                        into words:


                                       32
Find locations of matching words
       in database sequences

      ELEPRRPRYRVPDVLVADPPIARLSVSGRDENSVELT MEAT
MEA
EAA     TDVRWMSETGIIDVFLLLGPSISDVFRQYASLTGTQALPPLFSLGYHQSRWNY
AAV        IWLDIEEIHADGKRYFTWDPSRFPQPRTMLERLASKRRV KLVAIVDPH
AVK
KLV
KEE
EEI
EIS
ISV




                                                         33
Extend hits one base at a time




                                 34
Seq_XYZ:      HVTGRSAF_FSYYGYGCYCGLGTGKGLPVDATDRCCWA
Query:           QSVFDYIYYGCYCGWGLG_GK__PRDA

E-val=10-13




  •Use two word matches as anchors to build an alignment
  between the query and a database sequence.

  •Then score the alignment.
                                                     35
HSPs are Aligned Regions
• The results of the word matching and
  attempts to extend the alignment are
  segments
   - called HSPs (High-Scoring Segment
     Pairs)
• BLAST often produces several short HSPs
  rather than a single aligned region

                                            36
•   >gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
•             Length = 369
•    Score =    272 bits (137),   Expect = 4e-71
•    Identities = 258/297 (86%), Gaps = 1/297 (0%)
•    Strand = Plus / Plus
•
•   Query: 17    aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
•                |||||||||||||||| | ||| | ||| || ||| | |||| ||||| |||||||||
•   Sbjct: 1     aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59
•
•   Query: 77    agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
•                |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
•   Sbjct: 60    agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119
•
•   Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
•               |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
•   Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179
•
•   Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
•              ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
•   Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239
•
•   Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
•              || || ||||| || ||||||||||| | |||||||||||||||||| ||||||||
•   Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296




                                                                                    37
BLAST variants




                 38
39
40
41
42
43
Understanding BLAST output




                         44
45
46
47
48
49
50
51
52
53
Choosing the right parameters




                            54
55
56
57
Controlling the output




                         58
59
60
61
62
More on BLAST

NCBI Blast Glossary
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html

Education: Blast Information
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html

Steve Altschul's Blast Course
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html




                                                             63

Más contenido relacionado

La actualidad más candente (20)

Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Distance based method
Distance based method Distance based method
Distance based method
 
Phylogenetic tree construction
Phylogenetic tree constructionPhylogenetic tree construction
Phylogenetic tree construction
 
GENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICSGENOMICS AND BIOINFORMATICS
GENOMICS AND BIOINFORMATICS
 
Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
 
Gene prediction methods vijay
Gene prediction methods  vijayGene prediction methods  vijay
Gene prediction methods vijay
 
Tools of bioinforformatics by kk
Tools of bioinforformatics by kkTools of bioinforformatics by kk
Tools of bioinforformatics by kk
 
Data Retrieval Systems
Data Retrieval SystemsData Retrieval Systems
Data Retrieval Systems
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
FASTA
FASTAFASTA
FASTA
 
Structural databases
Structural databases Structural databases
Structural databases
 
In silico structure prediction
In silico structure predictionIn silico structure prediction
In silico structure prediction
 
SEQUENCE ANALYSIS
SEQUENCE ANALYSISSEQUENCE ANALYSIS
SEQUENCE ANALYSIS
 
Open Reading Frames
Open Reading FramesOpen Reading Frames
Open Reading Frames
 
Phylogenetic analysis
Phylogenetic analysisPhylogenetic analysis
Phylogenetic analysis
 
Scoring schemes in bioinformatics (blosum)
Scoring schemes in bioinformatics (blosum)Scoring schemes in bioinformatics (blosum)
Scoring schemes in bioinformatics (blosum)
 
Chromosome walking
Chromosome walkingChromosome walking
Chromosome walking
 
Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 

Similar a Blast fasta 4

Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014Prof. Wim Van Criekinge
 
Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013Prof. Wim Van Criekinge
 
2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekinge2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekingeProf. Wim Van Criekinge
 
MSc Thesis Presentation
MSc Thesis PresentationMSc Thesis Presentation
MSc Thesis PresentationReem Sherif
 
2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekinge2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekingeProf. Wim Van Criekinge
 
De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015Torsten Seemann
 
(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...
(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...
(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...Hao Jin
 
Presentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop schedulingPresentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop schedulingAntonio Maria Fiscarelli
 
Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...David Rosenblum
 
Representations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data MiningRepresentations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data MiningVijay Raghavan
 
Automated Generation of High-accuracy Interatomic Potentials Using Quantum Data
Automated Generation of High-accuracy Interatomic Potentials Using Quantum DataAutomated Generation of High-accuracy Interatomic Potentials Using Quantum Data
Automated Generation of High-accuracy Interatomic Potentials Using Quantum Dataaimsnist
 

Similar a Blast fasta 4 (20)

Similarity
SimilaritySimilarity
Similarity
 
Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014Bioinformatics t5-databasesearching v2014
Bioinformatics t5-databasesearching v2014
 
Bioinformatica t4-alignments
Bioinformatica t4-alignmentsBioinformatica t4-alignments
Bioinformatica t4-alignments
 
Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013Bioinformatics t4-alignments wim_vancriekingev2013
Bioinformatics t4-alignments wim_vancriekingev2013
 
2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekinge2016 bioinformatics i_database_searching_wimvancriekinge
2016 bioinformatics i_database_searching_wimvancriekinge
 
Ch06 alignment
Ch06 alignmentCh06 alignment
Ch06 alignment
 
Arom fold
Arom foldArom fold
Arom fold
 
Phylogenetics1
Phylogenetics1Phylogenetics1
Phylogenetics1
 
BLAST
BLASTBLAST
BLAST
 
MSc Thesis Presentation
MSc Thesis PresentationMSc Thesis Presentation
MSc Thesis Presentation
 
2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekinge2015 bioinformatics database_searching_wimvancriekinge
2015 bioinformatics database_searching_wimvancriekinge
 
De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015De novo genome assembly - IMB Winter School - 7 July 2015
De novo genome assembly - IMB Winter School - 7 July 2015
 
Tree building 2
Tree building 2Tree building 2
Tree building 2
 
(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...
(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...
(SAC2020 SVT-2) Constrained Detecting Arrays for Fault Localization in Combin...
 
Presentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop schedulingPresentation_Parallel GRASP algorithm for job shop scheduling
Presentation_Parallel GRASP algorithm for job shop scheduling
 
Lecture6.pptx
Lecture6.pptxLecture6.pptx
Lecture6.pptx
 
Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...
 
_BLAST.ppt
_BLAST.ppt_BLAST.ppt
_BLAST.ppt
 
Representations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data MiningRepresentations for large-scale (Big) Sequence Data Mining
Representations for large-scale (Big) Sequence Data Mining
 
Automated Generation of High-accuracy Interatomic Potentials Using Quantum Data
Automated Generation of High-accuracy Interatomic Potentials Using Quantum DataAutomated Generation of High-accuracy Interatomic Potentials Using Quantum Data
Automated Generation of High-accuracy Interatomic Potentials Using Quantum Data
 

Último

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 

Último (20)

Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 

Blast fasta 4

  • 2. Pairwise Alignment Global Local • Best score from among • Best score from among alignments of full-length alignments of partial sequences sequences • Needelman-Wunch • Smith-Waterman algorithm algorithm 2
  • 3. Why do we need local alignments? • To compare a short sequence to a large one. • To compare a single sequence to an entire database • To compare a partial sequence to the whole. 3
  • 4. Why do we need local alignments? • Identify newly determined sequences • Compare new genes to known ones • Guess functions for entire genomes full of ORFs of unknown function 4
  • 5. Mathematical Basis for Local Alignment • Model matches as a sequence of coin tosses • Let p be the probability of “head” – For a “fair” coin, p = 0.5 • According to Paul Erdös-Alfréd Rényi law: If there are n throws, then the expected length, R, of the longest run of “heads” is R = log1/p (n). Paul Erdös 5
  • 6. Mathematical Basis for Local Alignment • Example: Suppose n = 20 for a “fair” coin R=log2(20)=4.32 • Problem: How does one model DNA (or amino acid) alignments as coin tosses. 6
  • 7. Modeling Sequence Alignments • To model random sequence alignments, replace a match by “head” (H) and mismatch by “tail” (T). AATCAT HTHHHT ATTCAG • For ungapped DNA alignments, the probability of a “head” is 1/4. • For ungapped amino acid alignments, the probability of a “head” is 1/20. 7
  • 8. Modeling Sequence Alignments • Thus, for any one particular alignment, the Erdös- Rényi law can be applied • What about for all possible alignments? – Consider that sequences can being shifted back and forth in the dot matrix plot • The expected length of the longest match is R = log1/p(mn) where m and n are the lengths of the two sequences. 8
  • 9. Modeling Sequence Alignments • Suppose m = n = 10, and we deal with DNA sequences R = log4(100) = 3.32 • This analysis assumes that the base composition is uniform and the alignment is ungapped. The result is approximate, but not bad. 9
  • 10. 10
  • 11. Heuristic Methods: FASTA and BLAST FASTA • First fast sequence searching algorithm for comparing a query sequence against a database. BLAST • Basic Local Alignment Search Technique improvement of FASTA: Search speed, ease of use, statistical rigor. 11
  • 12. FASTA and BLAST • Basic idea: a good alignment contains subsequences of absolute identity (short lengths of exact matches): – First, identify very short exact matches. – Next, the best short hits from the first step are extended to longer regions of similarity. – Finally, the best hits are optimized. 12
  • 13. FASTA Derived from logic of the dot plot – compute best diagonals from all frames of alignment The method looks for exact matches between words in query and test sequence – DNA words are usually 6 nucleotides long – protein words are 2 amino acids long 13
  • 15. Makes Longest Diagonal After all diagonals are found, tries to join diagonals by adding gaps Computes alignments in regions of best diagonals 15
  • 17. FASTA Results - Histogram !!SEQUENCE_LIST 1.0 (Nucleotide) FASTA of: b2.seq from: 1 to: 693 December 9, 2002 14:02 TO: /u/browns02/Victor/Search-set/*.seq Sequences: 2,050 Symbols: 913,285 Word Size: 6 Searching with both strands of the query. Scoring matrix: GenRunData:fastadna.cmp Constant pamfactor used Gap creation penalty: 16 Gap extension penalty: 4 Histogram Key: Each histogram symbol represents 4 search set sequences Each inset symbol represents 1 search set sequences z-scores computed from opt scores z-score obs exp (=) (*) < 20 0 0: 22 0 0: 24 3 0:= 26 2 0:= 28 5 0:== 30 11 3:*== 32 19 11:==*== 34 38 30:=======*== 36 58 61:===============* 38 79 100:==================== * 40 134 140:==================================* 42 167 171:==========================================* 44 205 189:===============================================*==== 46 209 192:===============================================*===== 17 48 177 184:=============================================*
  • 18. FASTA Results - List The best scores are: init1 initn opt z-sc E(1018780).. SW:PPI1_HUMAN Begin: 1 End: 269 ! Q00169 homo sapiens (human). phosph... 1854 1854 1854 2249.3 1.8e-117 SW:PPI1_RABIT Begin: 1 End: 269 ! P48738 oryctolagus cuniculus (rabbi... 1840 1840 1840 2232.4 1.6e-116 SW:PPI1_RAT Begin: 1 End: 270 ! P16446 rattus norvegicus (rat). pho... 1543 1543 1837 2228.7 2.5e-116 SW:PPI1_MOUSE Begin: 1 End: 270 ! P53810 mus musculus (mouse). phosph... 1542 1542 1836 2227.5 2.9e-116 SW:PPI2_HUMAN Begin: 1 End: 270 ! P48739 homo sapiens (human). phosph... 1533 1533 1533 1861.0 7.7e-96 SPTREMBL_NEW:BAC25830 Begin: 1 End: 270 ! Bac25830 mus musculus (mouse). 10, ... 1488 1488 1522 1847.6 4.2e-95 SP_TREMBL:Q8N5W1 Begin: 1 End: 268 ! Q8n5w1 homo sapiens (human). simila... 1477 1477 1522 1847.6 4.3e-95 SW:PPI2_RAT Begin: 1 End: 269 ! P53812 rattus norvegicus (rat). pho... 1482 1482 1516 1840.4 1.1e-94 18
  • 19. FASTA Results - Alignment SCORES Init1: 1515 Initn: 1565 Opt: 1687 z-score: 1158.1 E(): 2.3e-58 >>GB_IN3:DMU09374 (2038 nt) initn: 1565 init1: 1515 opt: 1687 Z-score: 1158.1 expect(): 2.3e-58 66.2% identity in 875 nt overlap (83-957:151-1022) 60 70 80 90 100 110 u39412.gb_pr CCCTTTGTGGCCGCCATGGACAATTCCGGGAAGGAAGCGGAGGCGATGGCGCTGTTGGCC || ||| | ||||| | ||| ||||| DMU09374 AGGCGGACATAAATCCTCGACATGGGTGACAACGAACAGAAGGCGCTCCAACTGATGGCC 130 140 150 160 170 180 120 130 140 150 160 170 u39412.gb_pr GAGGCGGAGCGCAAAGTGAAGAACTCGCAGTCCTTCTTCTCTGGCCTCTTTGGAGGCTCA ||||||||| || ||| | | || ||| | || || ||||| || DMU09374 GAGGCGGAGAAGAAGTTGACCCAGCAGAAGGGCTTTCTGGGATCGCTGTTCGGAGGGTCC 190 200 210 220 230 240 180 190 200 210 220 230 u39412.gb_pr TCCAAAATAGAGGAAGCATGCGAAATCTACGCCAGAGCAGCAAACATGTTCAAAATGGCC ||| | ||||| || ||| |||| | || | |||||||| || ||| || DMU09374 AACAAGGTGGAGGACGCCATCGAGTGCTACCAGCGGGCGGGCAACATGTTTAAGATGTCC 250 260 270 280 290 300 240 250 260 270 280 290 u39412.gb_pr AAAAACTGGAGTGCTGCTGGAAACGCGTTCTGCCAGGCTGCACAGCTGCACCTGCAGCTC |||||||||| ||||| | |||||| |||| ||| || ||| || | DMU09374 AAAAACTGGACAAAGGCTGGGGAGTGCTTCTGCGAGGCGGCAACTCTACACGCGCGGGCT 19 310 320 330 340 350 360
  • 20. FASTA on the Web • Many websites offer FASTA searches • Each server has its limits • Be aware that you depend “on the kindness of strangers.” 20
  • 21. Institut de Génétique Humaine, Montpellier France, GeneStream server http://www2.igh.cnrs.fr/bin/fasta-guess.cgi Oak Ridge National Laboratory GenQuest server http://avalon.epm.ornl.gov/ European Bioinformatics Institute, Cambridge, UK http://www.ebi.ac.uk/htbin/fasta.py?request EMBL, Heidelberg, Germany http://www.embl-heidelberg.de/cgi/fasta-wrapper-free Munich Information Center for Protein Sequences (MIPS) at Max-Planck-Institut, Germany http://speedy.mips.biochem.mpg.de/mips/programs/fasta.html Institute of Biology and Chemistry of Proteins Lyon, France http://www.ibcp.fr/serv_main.html Institute Pasteur, France http://central.pasteur.fr/seqanal/interfaces/fasta.html GenQuest at The Johns Hopkins University http://www.bis.med.jhmi.edu/Dan/gq/gq.form.html National Cancer Center of Japan http://bioinfo.ncc.go.jp 21
  • 22. FASTA Format • simple format used by almost all programs • >header line with a [return] at end • Sequence (no specific requirements for line length, characters, etc) >URO1 uro1.seq Length: 2018 November 9, 2000 11:50 Type: N Check: 3854 .. CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACA ACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTT GCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACC CACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTG TGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCA GGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCAT CTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGA TGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG GAGTCACCAAAACCTGGGACAGGCTCATGCTCCAGGACAATTGCTGTGGCGTAAATGGTC CATCAGACTGGCAAAAATACACATCTGCCTTCCGGACTGAGAATAATGATGCTGACTATC CCTGGCCTCGTCAATGCTGTGTTATGAACAATCTTAAAGAACCTCTCAACCTGGAGGCTT 22
  • 23. Assessing Alignment Significance • Generate random alignments and calculate their scores • Compute the mean and the standard deviation (SD) for random scores • Compute the deviation of the actual score from the mean of random scores Z = (meanX)/SD • Evaluate the significance of the alignment • The probability of a Z value is called the E score 23
  • 24. E scores or E values E scores are not equivalent to p values where p < 0.05 are generally considered statistically significant. 24
  • 25. E values (rules of thumb) E values below 10-6 are most probably statistically significant. E values above 10-6 but below 10-3 deserve a second look. E values above 10-3 should not be tossed aside lightly; they should be thrown out with great force. 25
  • 26. BLAST • Basic Local Alignment Search Tool – Altschul et al. 1990,1994,1997 • Heuristic method for local alignment • Designed specifically for database searches • Based on the same assumption as FASTA that good alignments contain short lengths of exact matches 26
  • 27. BLAST • Both BLAST and FASTA search for local sequence similarity - indeed they have exactly the same goals, though they use somewhat different algorithms and statistical approaches. • BLAST benefits – Speed – User friendly – Statistical rigor – More sensitive 27
  • 28. Input/Output • Input: – Query sequence Q – Database of sequences DB – Minimal score S • Output: – Sequences from DB (Seq), such that Q and Seq have scores > S 28
  • 29. BLAST Searches GenBank [BLAST= Basic Local Alignment Search Tool] The NCBI BLAST web server lets you compare your query sequence to various sections of GenBank: – nr = non-redundant (main sections) – month = new sequences from the past few weeks – refseq_rna – RNA entries from NCBI's Reference Sequence project – refseq_genomic – Genomic entries from NCBI's Reference Sequence project – ESTs – Taxon = e.g., human, Drososphila, yeast, E. coli – proteins (by automatic translation) – pdb = Sequences derived from the 3-dimensional structure from Brookhaven Protein Data Bank 29
  • 30. BLAST • Uses word matching like FASTA • Similarity matching of words (3 amino acids, 11 bases) – does not require identical words. • If no words are similar, then no alignment – Will not find matches for very short sequences • Does not handle gaps well • “gapped BLAST” is somewhat better 30
  • 32. BLAST Word Matching MEAAVKEEISVEDEAVDKNI MEA EAA AAV Break query AVK VKE into words: KEE EEI EIS ISV ... Break database sequences into words: 32
  • 33. Find locations of matching words in database sequences ELEPRRPRYRVPDVLVADPPIARLSVSGRDENSVELT MEAT MEA EAA TDVRWMSETGIIDVFLLLGPSISDVFRQYASLTGTQALPPLFSLGYHQSRWNY AAV IWLDIEEIHADGKRYFTWDPSRFPQPRTMLERLASKRRV KLVAIVDPH AVK KLV KEE EEI EIS ISV 33
  • 34. Extend hits one base at a time 34
  • 35. Seq_XYZ: HVTGRSAF_FSYYGYGCYCGLGTGKGLPVDATDRCCWA Query: QSVFDYIYYGCYCGWGLG_GK__PRDA E-val=10-13 •Use two word matches as anchors to build an alignment between the query and a database sequence. •Then score the alignment. 35
  • 36. HSPs are Aligned Regions • The results of the word matching and attempts to extend the alignment are segments - called HSPs (High-Scoring Segment Pairs) • BLAST often produces several short HSPs rather than a single aligned region 36
  • 37. >gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'. • Length = 369 • Score = 272 bits (137), Expect = 4e-71 • Identities = 258/297 (86%), Gaps = 1/297 (0%) • Strand = Plus / Plus • • Query: 17 aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76 • |||||||||||||||| | ||| | ||| || ||| | |||| ||||| ||||||||| • Sbjct: 1 aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59 • • Query: 77 agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136 • |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| || • Sbjct: 60 agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119 • • Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196 • |||||||| | || | ||||||||||||||| ||||||||||| || |||||||||||| • Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179 • • Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256 • ||||||||| | |||||||| |||||||||||||||||| |||||||||||||||||||| • Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239 • • Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313 • || || ||||| || ||||||||||| | |||||||||||||||||| |||||||| • Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296 37
  • 39. 39
  • 40. 40
  • 41. 41
  • 42. 42
  • 43. 43
  • 45. 45
  • 46. 46
  • 47. 47
  • 48. 48
  • 49. 49
  • 50. 50
  • 51. 51
  • 52. 52
  • 53. 53
  • 54. Choosing the right parameters 54
  • 55. 55
  • 56. 56
  • 57. 57
  • 59. 59
  • 60. 60
  • 61. 61
  • 62. 62
  • 63. More on BLAST NCBI Blast Glossary http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html Education: Blast Information http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html Steve Altschul's Blast Course http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html 63

Notas del editor

  1. 27
  2. 29