“Homology-enhanced probabilistic consistency”
multiple sequence alignment :
a case study on transmembrane protein
Jia-Ming Chang
2013-July-09
Chang J-M, Di Tommaso P, Taly J-F, Notredame C: Accurate multiple sequence alignment of
transmembrane proteins with PSI-Coffee. BMC Bioinformatics 2012, 13.
Transmembrane protein
Membrane proteins are likely to constitute 20-30% of all ORFs
contained in genomes.
Odorant receptors
Richard Benton, “Eppendorf winner. Evolution and revolution in odor detection,” Science (New York, N.Y.)
326, no. 5951 (October 16, 2009): 382-383.
Transmembrane protein multiple
sequence alignment
• 1994: alignment of transmembrane proteins first addressed
– Cserzo M, Bernassau JM, Simon I, Maigret B: New alignment strategy for
transmembrane proteins. J Mol Biol 1994, 243(3):388-396.
• Few multiple sequence alignment programs since then: only 3
– Shafrir Y, Guy HR: STAM: simple transmembrane alignment method.
Bioinformatics 2004, 20(5):758-769.
– Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and
sequence alignment methods applied to membrane proteins. Biophys J 2006,
91(2):508-517.
– Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved
multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
BAliBASE 2.0 reference 7
Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple
alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.
We need an accurate Transmembrane MSA!
Homology-extended
Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids
Res 2005, 33(3):816-824.
Pair-hidden Markov Model
Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-
based multiple sequence alignment. Genome Res 2005, 15(2):330-340.
Emission probabilities, which correspond to traditional substitution
scores, are based on the BLOSUM62 matrix.
Probabilistic consistency
transformation
Homology-extended probabilistic
consistency
New emission probabilities are defined as follows:
$$p'(x_i, y_j) = \sum_{m=1}^{20} \sum_{n=1}^{20} \alpha_m \, \beta_n \, p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$$
where $\alpha_m$ is the frequency with which residue m appears at
position i and $\beta_n$ is the frequency with which residue n appears
at position j; $p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$ is the original emission
probability in ProbCons.
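As a rough illustration of the profile-based emission described on this slide, the double sum over residue pairs can be written as a vector-matrix-vector product. This is a minimal sketch, not the PSI-Coffee implementation: the function name `profile_emission`, the toy 3-letter alphabet, and all numeric values are assumptions made for the example.

```python
import numpy as np

def profile_emission(alpha, beta, pair_prob):
    """Return sum_m sum_n alpha[m] * beta[n] * pair_prob[m, n].

    alpha, beta : residue frequencies at profile positions i and j
    pair_prob   : residue-pair emission probabilities (20x20 for amino acids)
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    pair_prob = np.asarray(pair_prob, dtype=float)
    return float(alpha @ pair_prob @ beta)

# Toy 3-letter alphabet instead of 20 amino acids (illustrative values only).
pair_prob = np.array([[0.20, 0.05, 0.05],
                      [0.05, 0.20, 0.05],
                      [0.05, 0.05, 0.30]])
alpha = np.array([1.0, 0.0, 0.0])   # position i holds only residue 0
beta = np.array([0.5, 0.5, 0.0])    # position j is split between residues 0 and 1

print(profile_emission(alpha, beta, pair_prob))
```

When both columns contain a single residue (one-hot frequency vectors), the sum collapses to the single-sequence emission probability for that residue pair, so the profile version reduces to the plain sequence-based one.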
Homology-extended probabilistic
consistency
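$$P(x_i \sim y_j \in a^* \mid x, y) \leftarrow \frac{1}{|S|} \sum_{z \in S} \sum_{z_k} \alpha_i \gamma_k \, P(x_i \sim z_k \in a^* \mid x, z) \cdot \beta_j \gamma_k \, P(z_k \sim y_j \in a^* \mid z, y)$$
where $\alpha_i$, $\beta_j$, and $\gamma_k$ are the profile frequencies.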
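The weighted consistency update on this slide can be sketched with matrix products. This is an illustration under assumptions, not the authors' code: the profile frequencies are treated here as one scalar weight per position, the posterior match probabilities are taken as given matrices, and the name `extended_consistency` is hypothetical.

```python
import numpy as np

def extended_consistency(alpha, beta, intermediates):
    """Weighted consistency transformation for one sequence pair (x, y).

    alpha         : per-position weights for x, length Lx (assumed scalars)
    beta          : per-position weights for y, length Ly (assumed scalars)
    intermediates : list of (gamma, p_xz, p_zy), one entry per sequence z in S,
                    where gamma has length Lz, p_xz is Lx x Lz, p_zy is Lz x Ly
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    total = np.zeros((alpha.size, beta.size))
    for gamma, p_xz, p_zy in intermediates:
        gamma = np.asarray(gamma, dtype=float)
        # alpha_i * gamma_k * P(x_i ~ z_k)
        left = alpha[:, None] * np.asarray(p_xz, dtype=float) * gamma[None, :]
        # gamma_k * beta_j * P(z_k ~ y_j)
        right = gamma[:, None] * np.asarray(p_zy, dtype=float) * beta[None, :]
        total += left @ right          # sums over positions z_k
    return total / len(intermediates)  # average over the sequences in S

# Toy posteriors for a single intermediate sequence (illustrative values only).
p_xz = np.array([[0.9, 0.1], [0.2, 0.8]])
p_zy = np.array([[0.7, 0.3], [0.4, 0.6]])
ones = np.ones(2)
print(extended_consistency(ones, ones, [(ones, p_xz, p_zy)]))
```

With every weight set to 1, the update falls back to the unweighted ProbCons-style product of the two posterior matrices, which is a convenient sanity check.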
Homology-extended
Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids
Res 2005, 33(3):816-824.
Que1: how to build a profile?
Que2: how to score profiles?
Que1: how to build a profile?
• Database size
• Search parameters
– E-value: the most commonly used; anything else?
1. Matrix file: -M
2. Filter the query sequence for low-complexity subsequences: -F
3. Neighborhood word threshold: -f
4. Truncate the report to a given number of alignments: -b
Word hit & Neighborhood
Searching parameters
• Fast, insensitive search
– High percent identity
– blastp -F "m S" -f 999 -M BLOSUM80 -G 9 -E 2 -e 1e-5
• Slow, sensitive search
– Increase sensitivity, decrease specificity
– blastp -F "m S" -f 9 -M BLOSUM45 -e 100 -b 10000 -v 10000
• Book “BLAST”, pages 146-147
Different database
• UniProt (release 15.15, 2010)
• NCBI non-redundant (NR)
• UniRef50, UniRef90, UniRef100
• Transmembrane subsets (UniRef50-TM, UniRef90-TM, UniRef100-TM, UniProt-TM): keyword:"Transmembrane [KW-0812]"
Database Size
Data set        Number of sequences
UniRef50-TM         87,989
UniRef90-TM        263,306
UniRef100-TM       613,015
UniProt-TM         818,635
UniRef50         3,077,464
UniRef90         6,544,144
UniRef100        9,865,668
UniProt         11,009,767
NCBI NR         10,565,004
Performance comparison of different
database sizes for the BAliBASE2-ref7.
UniRef50-TM contains about 100 times fewer sequences than the full UniProt.
Its accuracy is comparable, and even superior, to that achieved with the default PSI-Coffee,
while the CPU time requirement decreases by a factor of 10.
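The size reduction quoted here can be checked against the counts on the Database Size slide. The short script below is only a convenience for that arithmetic; the counts are copied from the slide, and the dictionary name `sizes` is an arbitrary choice.

```python
# Sequence counts as listed on the "Database Size" slide.
sizes = {
    "UniRef50-TM": 87_989,
    "UniRef90-TM": 263_306,
    "UniRef100-TM": 613_015,
    "UniProt-TM": 818_635,
    "UniRef50": 3_077_464,
    "UniRef90": 6_544_144,
    "UniRef100": 9_865_668,
    "UniProt": 11_009_767,
    "NCBI NR": 10_565_004,
}

full = sizes["UniProt"]
for name, count in sizes.items():
    # How many times smaller each data set is than the full UniProt.
    print(f"{name:13s} {count:>10,d}  {full / count:6.1f}x smaller than UniProt")
```

For UniRef50-TM the ratio comes out at roughly 125, which matches the "about 100 times fewer" order of magnitude stated on the slide.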
10% more columns are correctly aligned than with PRALINETM.
The rows Pairs and Cols denote the total numbers of correctly aligned pairs and columns, respectively. The reference
alignments contain 3,294,102 pairs and 1,781 columns.
BAliBASE 3.0
The performance figures for the other methods are from Rausch et al. The SP and TC scores of
full-length sequences are evaluated on the core blocks (as annotated in the XML files).
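For readers unfamiliar with the two scores, the sketch below computes a sum-of-pairs (SP) and a total-column (TC) score for a test alignment against a reference, using the usual definitions: the fraction of reference residue pairs, and of reference columns, reproduced by the test alignment. It is a simplified illustration with hypothetical function names. It scores every reference column that holds at least two residues and does not restrict scoring to core blocks, so it will not reproduce the benchmark numbers.

```python
from itertools import combinations

def columns(alignment):
    """Convert gapped rows into columns of residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        entry = []
        for seq_idx, ch in enumerate(col):
            if ch == "-":
                entry.append(None)
            else:
                entry.append(counters[seq_idx])
                counters[seq_idx] += 1
        cols.append(tuple(entry))
    return cols

def aligned_pairs(cols):
    """Set of aligned residue pairs, each residue given as (sequence, index)."""
    pairs = set()
    for col in cols:
        present = [(s, r) for s, r in enumerate(col) if r is not None]
        pairs.update(combinations(present, 2))
    return pairs

def sp_tc(reference, test):
    """Return (SP, TC) of `test` relative to `reference` (same sequence order)."""
    ref_cols, test_cols = columns(reference), columns(test)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    # Only reference columns with at least two residues are scored.
    scored = [c for c in ref_cols if sum(r is not None for r in c) >= 2]
    test_set = set(test_cols)
    tc = sum(c in test_set for c in scored) / len(scored)
    return sp, tc

reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(sp_tc(reference, test))
```

In the toy example the test alignment misplaces one residue of the first sequence, so three of the four reference pairs and three of the four scored columns are recovered.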
Que2: how to score profiles?
Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile
alignment. Bioinformatics 2004, 20(8):1301-1308.
• Prediction mode: -template_file PSITM
• Output: -output tm_html
This output was obtained for Or94b of D. melanogaster and its orthologs in other Drosophila species.
Notably, the predicted topology of the Or94b set is consistent with the conclusion of Benton et al.
Paolo Di Tommaso
http://tcoffee.crg.cat/tmcoffee
