Nuttachat Wisittipanit, PhD
School of Science
Mae Fah Luang University
BLAST
Introduction
• Suppose you have acquired a DNA/protein sequence derived from a sample taken from an environment such as a lake, a pond, or a plant.
[Figure: cell samples → sequencing process → your sequence (KLMNTRARLIVHISGLTRK…). Img Src: http://www.austincc.edu]
Introduction
• Or you might get a DNA/protein sequence from a database such as NCBI, EMBL, or Swiss-Prot, or find an interesting gene/sequence in a journal.
[Figure: your sequence (KLMNTRARLIVHISGLTRK…)]
Introduction
• In that case, you might want to know whether your sequence already exists in a database or is similar to sequences in it, perhaps narrowing the search to the database of a particular organism.
• Why would you want to know that?
• Because from such matches you can infer structural, functional, and evolutionary relationships for your query sequence.
[Figure: your sequence compared against a database. Already in here? Similar?]
Your sequence is an unknown sequence: KLMNTRARLIVHISGLTRK
What is this sequence? Where does it come from?
Introducing BLAST (Basic Local Alignment Search Tool)
• The BLAST tool is used to compare a query sequence with a library or database of sequences.
• It uses a heuristic search algorithm based on statistical methods. The algorithm was introduced by Stephen Altschul and his co-workers in 1990.
• BLAST programs were designed for fast database searching.
BLAST Algorithm
BLAST Algorithm (Protein)
Query sequence: Y A N C L E H K M G S (length 11)
Word size: W = 3
Sliding a window of size 3 along the query generates 11 – 3 + 1 = 9 words:
Y A N, A N C, N C L, C L E, L E H, E H K, H K M, K M G, M G S
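The word-generation step can be sketched in a few lines of Python. This is only an illustration of the sliding window described on the slide; the function name `query_words` is made up for this example.

```python
def query_words(query: str, w: int = 3) -> list[str]:
    """Return every overlapping word of length w in the query, left to right."""
    if w <= 0 or w > len(query):
        return []
    return [query[i:i + w] for i in range(len(query) - w + 1)]


# The 11-residue query used on the slides.
query = "YANCLEHKMGS"
words = query_words(query, w=3)

# A query of length L yields L - w + 1 words.
print(len(words), words)
```

The word L E H used in the following slides is the fifth word in this list.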
BLAST Algorithm Example
For each word from the window (W = 3), generate neighborhood words using the BLOSUM62 matrix with score threshold = 11.
Example for the word L E H: all $20^3 = 20 \times 20 \times 20$ possible words of 3 amino acids are aligned with LEH using BLOSUM62, then sorted by score:
Word     Score
L E H    17
L K H    13
C E H    12
Q E H    11
------ score threshold (cut off here) ------
L M H    10
D E H    9
L F H    9
L E R    9
. . .
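A minimal sketch of the neighborhood step for the single word LEH follows. It assumes the standard BLOSUM62 values and, to stay short, hard-codes only the three matrix rows needed for L, E and H; the names `BLOSUM62_ROWS`, `word_score` and `neighborhood` are hypothetical helpers, not part of any BLAST distribution.

```python
from itertools import product

# Column order for the matrix rows below.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Excerpt of BLOSUM62: only the rows for L, E and H (enough to score against LEH).
BLOSUM62_ROWS = {
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],
    "E": [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],
    "H": [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],
}


def word_score(query_word: str, candidate: str) -> int:
    """Sum the substitution scores of the aligned residue pairs."""
    return sum(
        BLOSUM62_ROWS[q][AMINO_ACIDS.index(c)]
        for q, c in zip(query_word, candidate)
    )


def neighborhood(query_word: str, threshold: int) -> list[tuple[str, int]]:
    """Score all 20^w candidate words and keep those at or above the threshold."""
    kept = []
    for letters in product(AMINO_ACIDS, repeat=len(query_word)):
        candidate = "".join(letters)
        score = word_score(query_word, candidate)
        if score >= threshold:
            kept.append((candidate, score))
    # Highest score first; ties broken alphabetically.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


neighbors = neighborhood("LEH", threshold=11)
for word, score in neighbors:
    print(word, score)
```

The full neighborhood contains more words than the handful drawn on the slide; the slide only shows a sample around the cut-off.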
BLAST Algorithm Example
Word list (neighborhood words of L E H scoring at or above the threshold): L E H, C E H, L K H, Q E H
The database sequences are scanned for exact matches of words from the word list.
Example: the database sequence DAPCQEHKRGWPNDC contains an exact match to the word Q E H.
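The scanning step can be illustrated with a simple loop over every window of each database sequence. This is a sketch only: real BLAST implementations use indexed lookups rather than a plain scan, the function name `find_seeds` is an assumption, and the second database entry is made up for illustration.

```python
def find_seeds(word_list: list[str], database: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (sequence id, 0-based position, word) for each exact word match."""
    word_length = len(word_list[0])
    wanted = set(word_list)
    hits = []
    for seq_id, sequence in database.items():
        for pos in range(len(sequence) - word_length + 1):
            window = sequence[pos:pos + word_length]
            if window in wanted:
                hits.append((seq_id, pos, window))
    return hits


word_list = ["LEH", "CEH", "LKH", "QEH"]
database = {
    "slide_example": "DAPCQEHKRGWPNDC",  # sequence shown on the slide
    "hypothetical_1": "MKLEHTTLKHAA",     # made-up sequence, for illustration only
}

hits = find_seeds(word_list, database)
for hit in hits:
    print(hit)
```

Each hit is a seed from which the extension step starts.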
For each exact word match, the alignment is extended in both directions to find high-scoring segments.
Seed: Q E H in the database sequence D A P C Q E H K R G W P N D C
Query = Y A N C L E H K M G S
Extended in the right direction, max drop-off score X = 2:
Pair    Score   Accumulated score
Q-Q     5       5
E-E     5       10
H-H     8       18
K-K     5       23
M-R     -1      22    (score drop = 1 <= X)
G-G     6       28
S-W     -3      25    (score drop = 3 > X; trim to max = 28)
For each exact word match, the alignment is extended in both directions to find high-scoring segments.
Seed: Q E H in the database sequence D A P C Q E H K R G W P N D C
Query = Y A N C L E H K M G S
Extended in the left direction, max drop-off score X = 2:
Pair    Score   Accumulated score
H-H     8       8
E-E     5       13
Q-Q     5       18
C-C     9       27
N-P     -2      25    (score drop = 2 <= X)
A-A     4       29
Y-D     -3      26    (score drop = 3 > X)
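The drop-off rule used in both directions can be sketched as follows. The function takes the per-pair scores read off the slide rather than looking them up in a matrix, and the name `extend` is hypothetical; it is a simplified illustration of the rule "stop once the score falls more than X below the best score seen so far", not BLAST's actual implementation.

```python
def extend(pair_scores: list[int], start_score: int, x_drop: int) -> tuple[int, int]:
    """Extend in one direction.

    Returns (best accumulated score, number of pairs kept after trimming to the max).
    """
    best = current = start_score
    kept = 0
    for count, score in enumerate(pair_scores, start=1):
        current += score
        if current > best:
            best, kept = current, count
        elif best - current > x_drop:
            break  # dropped more than X below the best score: stop extending
    return best, kept


seed_score = 5 + 5 + 8            # Q-Q, E-E, H-H as scored on the slide
right_pairs = [5, -1, 6, -3]      # K-K, M-R, G-G, S-W
left_pairs = [9, -2, 4, -3]       # C-C, N-P, A-A, Y-D

right_best, right_kept = extend(right_pairs, seed_score, x_drop=2)
left_best, left_kept = extend(left_pairs, seed_score, x_drop=2)

# The seed is counted in both directions, so subtract it once.
segment_score = right_best + left_best - seed_score
print(right_best, left_best, segment_score)
```

With the slide's numbers, three pairs are kept on each side of the seed, and the combined score equals the pair score of the maximal segment pair computed on the next slide.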
BLAST Algorithm Example
Maximal Segment Pair (MSP), scored with the BLOSUM62 scoring matrix:
A  P  C  Q  E  H  K  R  G
A  N  C  Q  E  H  K  M  G
4 -2  9  5  5  8  5 -1  6
Pair score = 4 - 2 + 9 + 5 + 5 + 8 + 5 - 1 + 6 = 39
Maximal Segment Pairs (MSPs) from other seeds are found in the same way, and all MSPs are then sorted by alignment score, for example: 55, 51, 45, 42, 39, 37, 35, 33.
BLAST Algorithm Example
Each match has its own E-Value.
• E-Value: the number of MSPs with a similar or higher score that one can EXPECT to see by chance alone when searching a database of a particular size.
BLAST Algorithm
Expect Value (E-Value)
• For example, if the E-Value is 10 for a particular MSP with score S, about 10 MSPs with score >= S can be expected to occur by chance alone (for any query sequence).
• Such an MSP is therefore most likely not a significant match at all.
BLAST Algorithm
Expect Value (E-Value)
• If the E-Value is very small, e.g. $10^{-4}$ (very high score S), it is very unlikely that any MSP with score >= S would occur by chance alone.
• Thus, our MSP is a significant match (likely homologous).
BLAST Algorithm
Expect Value (E-Value)
• First: calculate the bit score $S'$:
$$S' = \frac{\lambda S - \ln K}{\ln 2}$$
• $S$ = score of the alignment (raw score)
• $\lambda$, $K$: values depend on the scoring scheme and the sequence composition of the database.
[The logarithm is the natural logarithm (base e).]
BLAST Algorithm
E-Value Calculation
• The lower the E-Value, the more significant the match.
• An E-Value threshold can be used to limit the number of hits shown on the results page.
BLAST Algorithm
Expect Value (E-Value)
• Second: calculate the E-Value:
$$E = m \, n \, 2^{-S'}$$
• $S'$ = bit score
• $m$ = query length
• $n$ = length of database
BLAST Algorithm
E-Value Calculation
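The two-step calculation can be sketched in Python, assuming the standard forms $S' = (\lambda S - \ln K) / \ln 2$ for the bit score and $E = m \, n \, 2^{-S'}$ for the E-Value. The function names and the input numbers below are hypothetical and chosen only to show the mechanics.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Step 1: convert a raw alignment score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score: float, query_length: int, database_length: int,
            lam: float, k: float) -> float:
    """Step 2: expected number of chance MSPs scoring at least raw_score."""
    s_prime = bit_score(raw_score, lam, k)
    return query_length * database_length * 2.0 ** (-s_prime)


# Illustrative values only (not taken from the slides).
raw = 60
m, n = 250, 1_000_000
lam, k = 0.3, 0.1

print(f"bit score = {bit_score(raw, lam, k):.2f}")
print(f"E-value   = {e_value(raw, m, n, lam, k):.3e}")
```

Substituting the bit score into the second formula gives the equivalent one-step form $E = K \, m \, n \, e^{-\lambda S}$, which is a convenient cross-check.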
• E-values of $10^{-4}$ and lower indicate a significant homology.
• E-values between $10^{-4}$ and $10^{-2}$ should be checked (similar domains, maybe non-homologous).
• E-values between $10^{-2}$ and 1 do not indicate a good homology.
BLAST Algorithm
E-Value Interpretation
Gapped BLAST
• The Gapped BLAST algorithm allows gaps to be introduced into the alignments, so similar regions are not broken into several segments.
• This method reflects biological relationships much better.
• It also results in different parameter values ($\lambda$, $K$) when calculating the E-Value.
BLAST programs
Name      Description
Blastp    Amino acid query sequence against a protein database
Blastn    Nucleotide query sequence against a nucleotide sequence database
Blastx    Nucleotide query sequence translated in all reading frames against a protein database
Tblastn   Protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
Tblastx   Six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
BLAST programs
Name Common Word Size
Blastp 3
Blastn 11
Blastx 3
Tblastn 3
Tblastx 3
BLAST Suggestion
• Where possible, use the translated (protein) sequence.
• Split a large query sequence (> 1000 for DNA, > 200 for protein) into smaller ones.
• If the query has low-complexity regions or repeated segments, remove them and repeat the search.
Example of a repeated segment (LRPV): IVLKVALRPVLRPVLRPVWQARNGS
Repeated segments can keep the program from finding the ‘real’ significant matches in a database.
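As a rough illustration of spotting repeated segments before a search, the sketch below looks for tandem repeats with a regular expression. This is not how BLAST filters queries (the NCBI tools use dedicated low-complexity filters such as SEG for proteins and DUST for nucleotides); the function name `find_tandem_repeats` and the minimum unit length are assumptions made for this example.

```python
import re


def find_tandem_repeats(sequence: str, min_unit: int = 2) -> list[tuple[int, str, int]]:
    """Return (0-based start, repeat unit, number of copies) for each tandem repeat."""
    # A unit of at least min_unit residues, immediately followed by one or more copies.
    pattern = re.compile(r"(.{%d,}?)\1+" % min_unit)
    repeats = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        repeats.append((match.start(), unit, copies))
    return repeats


query = "IVLKVALRPVLRPVLRPVWQARNGS"  # example sequence from the slide
for start, unit, copies in find_tandem_repeats(query):
    print(f"unit {unit} repeated {copies} times starting at position {start}")
```

Regions reported this way could be removed or masked by hand before repeating the search.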
Running BLAST
• Find the appropriate BLAST program
• Enter the query sequence
• Select the database to search
• Run the BLAST search
• Analyze the output
• Interpret the E-values
Documenting BLAST
• Program (Blastp, Blastn, ...)
• Name of database
• Word size
• E-Value threshold
• Substitution matrix
• Gap penalty
• BLAST results: Sequence Name, Bit Score, Raw Score, E-Value, Identities, Positives, Gaps
BLAST
• http://blast.ncbi.nlm.nih.gov/
Homework 4A
• Determine the common proteins in Domestic Cat [Felis catus], Tiger [Panthera tigris] and Snow Leopard [Uncia uncia] using this seed sequence:
>gi|145558804
MSMVYINMFLAFIMSLMGLLMYRSHLMSSLLCLEGMMLSLFIMMTVAILNNHFTLASMTPII
LLVFAACEAALGLSLLVMVSNTYGTDYVQNLNLLQC
• Report for each protein match: protein name, accession number, bit score, raw score, E-Value, Identities, Positives and Gaps.
Homework 4B
• H5N1 is a subtype of the Influenza A virus, a bird-adapted strain. This subtype can cause “avian influenza” or “bird flu”, which can be fatal to humans.
• Use the DNA sequence with GenBank accession number JX120150.1 as a seed sequence to search for TWO other matching sequences, each belonging to a different Influenza A virus subtype (HXNX). [Use Blastn]
• Report for each subtype match: subtype name, organism origin, sequence name, accession number, bit score, raw score, E-Value, Identities, Positives and Gaps.
Homework 4C
• Suppose you have acquired an unknown protein sequence:
FLWLWPYLSYIEAVPIRKVQDDTKTLIKTIVTRINDISHTQAVSSKQRVAGLDFIP
GLHPVLSLSRMDQTLAIYQQILTSLHSRNVVQISNDLENLRDLLHLLASSKS
• (1) Use a BLAST program to find out which species this sequence most likely belongs to.
• (2) Report both the scientific and the common name of the species.
• (3) This sequence matches a certain protein of that species. Report the E-Value, protein accession number [GenBank], protein name, length, full sequence and function.
Homework 4D
• Calculate the E-Value for an MSP with
• Raw score: 83
• Query length: 103
• Length of database: 48,109,873
• $\lambda$: 0.316
• $K$: 0.135
Send me an email with the subject “HW3_BINF_lastname_id” by 28 June before 5 pm.
Late submissions will NOT be accepted.

Blast 2013 1

  • 1. Nuttachat Wisittipanit, Phd. School of Science Mae Fah Luang University
  • 3. Introduction: Suppose you have acquired a DNA or protein sequence derived from an environmental sample, such as a lake, a pond, or a plant. [Figure: cell samples, sequencing process, your sequence KLMNTRARLIVHISGLTRK… Image source: http://www.austincc.edu]
  • 4. Introduction: Alternatively, you might get a DNA or protein sequence from a database such as NCBI, EMBL, or Swiss-Prot, or find an interesting gene or sequence in a journal. [Figure: your sequence KLMNTRARLIVHISGLTRK…]
  • 5. Introduction
      • In either case, you might want to know whether your sequence already exists in a database or is similar to sequences in it, perhaps narrowed down to the database of a particular organism.
      • Why do you want to know that?
      • Because similar known sequences let you infer structural, functional, and evolutionary relationships for your query sequence.
      [Figure: your sequence compared against a database. Already in here? Similar?]
  • 6. Your sequence is an unknown sequence: KLMNTRARLIVHISGLTRK. What is this sequence? Where does it come from?
  • 7. Introducing BLAST (Basic Local Alignment Search Tool)
      • BLAST is used to compare a query sequence with a library or database of sequences.
      • It uses a heuristic search algorithm based on statistical methods. The algorithm was introduced by Stephen Altschul and his co-workers in 1990.
      • BLAST programs were designed for fast database searching.
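  • 11. BLAST Algorithm (Protein): The query sequence Y A N C L E H K M G S has length 11. Sliding a window of word size W = 3 along the query, one position at a time, gives the words YAN, ANC, NCL, CLE, LEH, EHK, HKM, KMG, MGS. This generates 11 - 3 + 1 = 9 words.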
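The word-generation step on this slide can be sketched in a few lines of Python. This is only an illustration of the sliding window; the function name `query_words` is an assumption made for the example and is not part of any BLAST program.

```python
def query_words(query: str, w: int = 3) -> list[str]:
    """Return every overlapping word of length w in the query, left to right."""
    if w <= 0:
        raise ValueError("word size must be positive")
    # A query of length L yields L - w + 1 words (none if the query is shorter than w).
    return [query[i:i + w] for i in range(len(query) - w + 1)]


# Query sequence and word size taken from the slide.
words = query_words("YANCLEHKMGS", w=3)
print(len(words), words)
```

With the slide's query this yields the nine words counted above, including the word LEH used in the following slides.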
  • 12. BLAST Algorithm Example: For each word from the window (W = 3), generate neighborhood words using the BLOSUM62 matrix with score threshold = 11. Each of the 20 x 20 x 20 = 20^3 possible words of 3 amino acids is aligned with the query word (here LEH) and scored with BLOSUM62, and the words are then sorted by score:
      Word   Score
      LEH    17
      LKH    13
      CEH    12
      QEH    11   (score threshold: cut off here)
      LMH    10
      DEH    9
      LFH    9
      LER    9
      . . .
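The thresholding step can be sketched as follows. To keep the example short it scores only the eight candidate words shown on the slide rather than all 20^3 words, and it hard-codes just the BLOSUM62 entries those words need; the names `BLOSUM62_SUBSET`, `word_score`, and `neighborhood` are assumptions made for this illustration.

```python
# Only the BLOSUM62 entries needed for this example (not the full matrix).
BLOSUM62_SUBSET = {
    ("L", "L"): 4, ("E", "E"): 5, ("H", "H"): 8,
    ("L", "C"): -1, ("L", "Q"): -2, ("L", "D"): -4,
    ("E", "K"): 1, ("E", "M"): -2, ("E", "F"): -3,
    ("H", "R"): 0,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(query_word: str, candidate: str) -> int:
    """Sum the position-by-position substitution scores of two equal-length words."""
    return sum(pair_score(q, c) for q, c in zip(query_word, candidate))


def neighborhood(query_word: str, candidates: list[str], threshold: int):
    """Keep candidates scoring at least the threshold, highest score first."""
    scored = [(word_score(query_word, c), c) for c in candidates]
    kept = [sc for sc in scored if sc[0] >= threshold]
    return sorted(kept, key=lambda sc: sc[0], reverse=True)


# Candidate words from the slide, in the order they appear there.
candidates = ["LMH", "DEH", "LEH", "CEH", "LKH", "QEH", "LFH", "LER"]
word_list = neighborhood("LEH", candidates, threshold=11)
print(word_list)
```

A real search would enumerate every possible word against the full matrix; the subset here is only large enough to show where the cut-off falls.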
  • 13. BLAST Algorithm Example: The word list (LEH, CEH, LKH, QEH) is scanned against the database sequences to find exact matches of words from the word list. For example, the word QEH matches the database sequence DAPCQEHKRGWPNDC exactly; other database sequences contain exact matches to LEH, LKH, and CEH.
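A minimal sketch of the exact-match scan, assuming a plain substring search over one database sequence. Real BLAST implementations use faster lookup structures; `find_word_hits` is a hypothetical helper name.

```python
def find_word_hits(word_list: list[str], db_seq: str) -> list[tuple[str, int]]:
    """Return (word, 0-based start position) for every exact occurrence in db_seq."""
    hits = []
    for word in word_list:
        start = db_seq.find(word)
        while start != -1:
            hits.append((word, start))
            # Continue searching one position later so overlapping hits are found too.
            start = db_seq.find(word, start + 1)
    return hits


# Word list and database sequence taken from the slide.
hits = find_word_hits(["LEH", "CEH", "LKH", "QEH"], "DAPCQEHKRGWPNDC")
print(hits)
```

Each hit becomes a seed for the extension step described on the next slides.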
  • 14. BLAST Algorithm Example: For each exact word match, the alignment is extended in both directions to find high-score segments. Query = Y A N C L E H K M G S; database sequence = D A P C Q E H K R G W P N D C; matched word = Q E H; max drop-off score X = 2. Extended in the right direction:
      Pair   Score   Accumulated score
      Q-Q    5       5
      E-E    5       10
      H-H    8       18
      K-K    5       23
      M-R    -1      22   (score drop = 1 <= X)
      G-G    6       28
      S-W    -3      25   (score drop = 3 > X)
      The extension stops at S-W and is trimmed back to the maximum (28, at G-G). [Figure: accumulated score plotted against the aligned pairs Q-Q to S-W]
  • 15. BLAST Algorithm Example: The same seed is then extended in the left direction, again with max drop-off score X = 2:
      Pair   Score   Accumulated score
      H-H    8       8
      E-E    5       13
      Q-Q    5       18
      C-C    9       27
      N-P    -2      25   (score drop = 2 <= X)
      A-A    4       29
      Y-D    -3      26   (score drop = 3 > X)
      The extension stops at Y-D and is trimmed back to the maximum (29, at A-A). [Figure: accumulated score plotted against the aligned pairs H-H to Y-D]
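The drop-off rule on these two slides can be sketched as a small function. It takes the seed score and the per-pair scores in the direction of extension, and stops once the accumulated score falls more than X below the best score seen so far. The name `extend` and its return shape are assumptions for this illustration; the pair scores are the ones shown on the slides.

```python
def extend(seed_score: int, pair_scores: list[int], x_drop: int) -> tuple[int, int]:
    """Extend in one direction; return (best accumulated score, pairs kept)."""
    best = current = seed_score
    kept = 0
    for i, score in enumerate(pair_scores, start=1):
        current += score
        if current > best:
            # New maximum: the alignment is kept up to and including this pair.
            best, kept = current, i
        elif best - current > x_drop:
            # Dropped too far below the maximum: stop and trim back to the best point.
            break
    return best, kept


SEED_SCORE = 18  # Q-Q + E-E + H-H = 5 + 5 + 8

right = extend(SEED_SCORE, [5, -1, 6, -3], x_drop=2)  # K-K, M-R, G-G, S-W
left = extend(SEED_SCORE, [9, -2, 4, -3], x_drop=2)   # C-C, N-P, A-A, Y-D

# The seed is counted in both directions, so subtract it once.
msp_score = right[0] + left[0] - SEED_SCORE
print(right, left, msp_score)
```

Both directions keep three extra pairs, which gives the nine-residue segment pair scored on the next slide.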
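  • 16. BLAST Algorithm Example: Maximal Segment Pair (MSP), scored with the BLOSUM62 scoring matrix:
      Database segment:   A   P   C   Q   E   H   K   R   G
      Query segment:      A   N   C   Q   E   H   K   M   G
      Pair score:         4  -2   9   5   5   8   5  -1   6
      Pair score = 4 - 2 + 9 + 5 + 5 + 8 + 5 - 1 + 6 = 39
  • 17. BLAST Algorithm Example: The MSP above (A P C Q E H K R G aligned with A N C Q E H K M G, score 39) is collected together with the Maximal Segment Pairs (MSPs) from other seeds (scores 42, 45, 35, 37, 51, 55, 33), and all MSPs are sorted by alignment score. Each match has its own E-Value.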
  • 18. BLAST Algorithm: Expect Value (E-Value)
      • E-Value: the number of MSPs with a similar or higher score that one can EXPECT to see by chance alone when searching a database of a particular size.
  • 19. BLAST Algorithm: Expect Value (E-Value)
      • For example, if the E-Value is equal to 10 for a particular MSP with score S, about 10 MSPs with score >= S can be expected to occur by chance alone (for any query sequence).
      • So our MSP is most likely not a significant match at all.
  • 20. BLAST Algorithm: Expect Value (E-Value)
      • If the E-Value is very small, e.g. 10^-4 (very high score S), it is almost impossible that an MSP with score >= S would occur by chance alone.
      • Thus, our MSP is a pretty significant match (likely homologous).
  • 21. BLAST Algorithm: E-Value Calculation
      • First: calculate the bit score, $S' = \frac{\lambda S - \ln K}{\ln 2}$
      • $S$ = score of the alignment (raw score)
      • $\lambda$, $K$: values that depend on the scoring scheme and the sequence composition of the database. [The logarithm is the natural logarithm (log base e).]
  • 22. BLAST Algorithm: Expect Value (E-Value)
      • The lower the E-Value, the better.
      • The E-Value can be used to limit the number of hits on the result page.
  • 23. BLAST Algorithm: E-Value Calculation
      • Second: calculate the E-Value, $E = m \, n \, 2^{-S'}$
      • $S'$ = bit score
      • $m$ = query length
      • $n$ = length of the database
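The two calculation steps can be sketched in Python. The formulas on the slides did not survive extraction, so this assumes the standard forms, bit score S' = (λS - ln K) / ln 2 and E = m · n · 2^(-S'). The function names and the example numbers are made up for illustration and are not taken from the slides.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Step 1: normalise a raw alignment score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_len: int, db_len: int) -> float:
    """Step 2: expected number of chance MSPs with at least this bit score."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical inputs, chosen only to show the call sequence.
bits = bit_score(raw_score=50, lam=0.3, k=0.1)
expect = e_value(bits, query_len=100, db_len=1_000_000)
print(f"bit score = {bits:.2f}, E-value = {expect:.3g}")
```

Because the search space m · n appears as a plain factor, the same bit score gives a larger E-Value against a larger database.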
  • 24. BLAST Algorithm: E-Value Interpretation
      • E-values of 10^-4 and lower indicate significant homology.
      • E-values between 10^-4 and 10^-2 should be checked (similar domains, maybe non-homologous).
      • E-values between 10^-2 and 1 do not indicate good homology.
  • 25. Gapped BLAST
      • The Gapped BLAST algorithm allows gaps to be introduced into the alignments, so similar regions are not broken into several segments.
      • This method reflects biological relationships much better.
      • It results in different parameter values ($\lambda$, $K$) when calculating the E-Value.
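The interpretation ranges from slide 24 can be written as a small lookup. The slide does not say how to treat the exact boundary values or E-values above 1, so the choices below (boundaries assigned to the stricter category, a separate label above 1) are assumptions, as is the function name.

```python
def interpret_e_value(e: float) -> str:
    """Map an E-value to the rule-of-thumb categories from the slide."""
    if e < 0:
        raise ValueError("an E-value cannot be negative")
    if e <= 1e-4:
        return "significant homology"
    if e <= 1e-2:
        return "check manually (similar domains, maybe non-homologous)"
    if e <= 1:
        return "no good homology indicated"
    # Assumption: the slide gives no range above 1.
    return "outside the ranges given on the slide"


for value in (1e-10, 1e-3, 0.5, 10):
    print(value, "->", interpret_e_value(value))
```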
  • 26. BLAST programs (name: description)
      • Blastp: amino acid query sequence against a protein database
      • Blastn: nucleotide query sequence against a nucleotide sequence database
      • Blastx: nucleotide query sequence, translated in all reading frames, against a protein database
      • Tblastn: protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
      • Tblastx: six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
  • 27. BLAST programs (name: common word size)
      • Blastp: 3
      • Blastn: 11
      • Blastx: 3
      • Tblastn: 3
      • Tblastx: 3
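The two program tables can be combined into one lookup that suggests which programs fit a given query and database type. This is a sketch built only from the tables above; the `BlastProgram` tuple, the `PROGRAMS` dictionary, and `candidate_programs` are names invented for the example.

```python
from typing import NamedTuple


class BlastProgram(NamedTuple):
    query: str              # type of the query sequence
    database: str           # type of the database sequences
    translated_query: bool  # query translated in all reading frames
    translated_db: bool     # database translated in all reading frames
    word_size: int          # common word size from the slide


PROGRAMS = {
    "blastp": BlastProgram("protein", "protein", False, False, 3),
    "blastn": BlastProgram("nucleotide", "nucleotide", False, False, 11),
    "blastx": BlastProgram("nucleotide", "protein", True, False, 3),
    "tblastn": BlastProgram("protein", "nucleotide", False, True, 3),
    "tblastx": BlastProgram("nucleotide", "nucleotide", True, True, 3),
}


def candidate_programs(query_type: str, db_type: str) -> list[str]:
    """List the programs whose query and database types match the request."""
    return [
        name for name, prog in PROGRAMS.items()
        if prog.query == query_type and prog.database == db_type
    ]


print(candidate_programs("nucleotide", "nucleotide"))
print(candidate_programs("protein", "nucleotide"))
```

For a nucleotide query against a nucleotide database, two programs qualify; the choice between them is whether to compare the sequences directly or through their six-frame translations.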
  • 28. BLAST Suggestions
      • Where possible, use the translated (protein) sequence.
      • Split a large query sequence (longer than 1000 for DNA, longer than 200 for protein) into smaller ones.
      • If the query has low-complexity regions or repeated segments, remove them and repeat the search. Repeated segments, such as LRPV in IVLKVALRPVLRPVLRPVWQARNGS, can keep the program from finding the 'real' significant matches in a database.
  • 29. Running BLAST
      • Find the appropriate BLAST program
      • Enter the query sequence
      • Select the database to search
      • Run the BLAST search
      • Analyze the output
      • Interpret the E-values
  • 30. Documenting BLAST
      • Program (Blastp, Blastn, ...)
      • Name of database
      • Word size
      • E-Value threshold
      • Substitution matrix
      • Gap penalty
      • BLAST results: Sequence Name, Bit Score, Raw Score, E-Value, Identities, Positives, Gaps
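Two of these suggestions can be illustrated with short helpers. The repeat finder below is a crude regular-expression check for back-to-back repeats and is not the low-complexity filter BLAST itself applies; the splitter cuts at fixed lengths without overlap, which is an assumption since the slide does not say how to split. Both function names are hypothetical.

```python
import re


def find_tandem_repeat(seq: str, min_unit: int = 2):
    """Return (unit, start, end) of the first back-to-back repeat, or None."""
    match = re.search(r"(.{%d,}?)\1+" % min_unit, seq)
    if match is None:
        return None
    return match.group(1), match.start(), match.end()


def split_query(seq: str, max_len: int) -> list[str]:
    """Cut a long query into consecutive pieces of at most max_len residues."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [seq[i:i + max_len] for i in range(0, len(seq), max_len)]


# Example sequence from the slide.
repeat = find_tandem_repeat("IVLKVALRPVLRPVLRPVWQARNGS")
print(repeat)

# A made-up 450-residue protein query, split at the slide's limit of 200.
pieces = split_query("A" * 450, max_len=200)
print([len(p) for p in pieces])
```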
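One way to keep this documentation together is a small record type with one field per item on the slide. The class names, field types, and the placeholder values below are all assumptions for illustration; they do not come from a real search.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class BlastHit:
    sequence_name: str
    bit_score: float
    raw_score: int
    e_value: float
    identities: str
    positives: str
    gaps: str


@dataclass
class BlastRunRecord:
    program: str
    database: str
    word_size: int
    e_value_threshold: float
    substitution_matrix: str
    gap_penalty: str
    hits: list[BlastHit] = field(default_factory=list)


# Placeholder values only, to show how a run could be written down.
record = BlastRunRecord(
    program="blastp",
    database="placeholder database name",
    word_size=3,
    e_value_threshold=10.0,
    substitution_matrix="BLOSUM62",
    gap_penalty="placeholder gap penalty",
)
record.hits.append(
    BlastHit("placeholder hit", 0.0, 0, 0.0, "0/0", "0/0", "0/0")
)
print(asdict(record))
```

Converting the record with `asdict` gives a plain dictionary that can be saved alongside the results.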
  • 32. Homework 4A
      • Determine the common proteins in Domestic Cat [Felis catus], Tiger [Panthera tigris] and Snow Leopard [Uncia uncia] using this initiating sequence:
        >gi|145558804
        MSMVYINMFLAFIMSLMGLLMYRSHLMSSLLCLEGMMLSLFIMMTVAILNNHFTLASMTPII
        LLVFAACEAALGLSLLVMVSNTYGTDYVQNLNLLQC
      • Report for each protein match: protein name, accession number, bit score, raw score, E-Value, Identities, Positives and Gaps.
  • 33. Homework 4B
      • H5N1 is a subtype of the Influenza A virus and is a bird-adapted strain. This subtype can cause "avian influenza" or "bird flu", which can be fatal to humans.
      • Use the DNA sequence with GenBank accession number JX120150.1 as a seed sequence to search for TWO other matching sequences, each belonging to a different Influenza A virus subtype (HXNX). [Use Blastn]
      • Report for each subtype match: subtype name, organism origin, sequence name, accession number, bit score, raw score, E-Value, Identities, Positives and Gaps.
  • 34. Homework 4C
      • Suppose you have acquired an unknown protein sequence:
        FLWLWPYLSYIEAVPIRKVQDDTKTLIKTIVTRINDISHTQAVSSKQRVAGLDFIP
        GLHPVLSLSRMDQTLAIYQQILTSLHSRNVVQISNDLENLRDLLHLLASSKS
      • (1) Use a BLAST program to find out which species this sequence most likely belongs to.
      • (2) Report both the scientific and the common name of the species.
      • (3) This sequence matches a certain protein of that species. Report the E-Value, protein accession number [GenBank], protein name, length, full sequence and function.
  • 35. Homework 4D
      • Calculate the E-Value for an MSP with
      • Raw score: 83
      • Query length: 103
      • Length of database: 48,109,873
      • $\lambda$: 0.316
      • $K$: 0.135
  • 36. Send me an email with the subject “HW3_BINF_lastname_id” by 28 June, before 5 pm. Late submissions will NOT be accepted.