Systematic evaluation of spliced alignment programs for RNA-seq data
Engström et al. (Nature Methods 2013)
Presented by Monica Drăgan
Functional Genomics, SS2014, Tuesday, 25 March 2014
© bioinformatics.ca
Mapping the reads to
● a reference genome
or
● a transcriptome database
Deep sequencing (with NGS)
Why RNA sequencing?
● Functional studies
● Gene prediction is difficult
Mapping strategies depend on read length
● Read length < 50 bp → short (unspliced) aligners: BWA, Bowtie
● Read length > 50 bp → spliced alignment programs: GSNAP, MapSplice, STAR, PALMapper, TopHat, ReadsMap, PASS, SMALT
● In mRNA sequences the introns have been removed, so a read can span an exon-exon junction
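Why an unspliced search fails for a read that crosses an exon-exon junction can be illustrated with a minimal Python sketch. The toy genome, the exon and intron sequences, and the helper `spliced_find` are all hypothetical and only illustrate the idea; real spliced aligners use indexed search and tolerate mismatches.

```python
# Toy forward-strand gene: two exons separated by one intron (all sequences are made up).
EXON1 = "ATGGCCAAGT"
INTRON = "GTAAGTCCCTTTTCAG"
EXON2 = "TCGATTGACC"
GENOME = "CCCC" + EXON1 + INTRON + EXON2 + "GGGG"
MRNA = EXON1 + EXON2  # the introns are removed in the mRNA


def spliced_find(read, genome, min_anchor=3):
    """Try every split point of the read.

    Returns (read_start, intron_start, intron_end) for the first split whose
    left part matches exactly and whose right part matches further downstream,
    or None if no such split exists.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        start = genome.find(left)
        if start == -1:
            continue
        intron_start = start + len(left)
        # Require at least one skipped base between the two parts.
        intron_end = genome.find(right, intron_start + 1)
        if intron_end != -1:
            return start, intron_start, intron_end
    return None


exonic_read = MRNA[2:10]     # lies entirely inside exon 1
junction_read = MRNA[5:15]   # crosses the exon 1 / exon 2 junction

exonic_hit = GENOME.find(exonic_read)      # contiguous match exists
junction_hit = GENOME.find(junction_read)  # -1: no contiguous match
spliced_hit = spliced_find(junction_read, GENOME)
```

In this toy case the contiguous search only places the exonic read, while the split search recovers the intron boundaries for the junction read.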
Outline
● Challenges in RNA sequence alignment
● The aim of this paper
● Existing spliced-alignment software
● Conclusions
Challenges in RNA-seq alignment
● Large number of reads → ~100M reads = computationally expensive
  - Compression with the Burrows-Wheeler Transform
● RNA splicing / alternative splicing
  - a single gene may code for multiple proteins
● Paired read separation issue
● Pseudogenes
  - pseudogenes often have sequences highly similar to functional, intron-containing genes → RNA reads can be incorrectly mapped to them
  - the human genome contains over 14,000 pseudogenes [Pei et al. Genome Biol 2012]
● Duplications
  - may correspond to biased PCR amplification of particular fragments
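The Burrows-Wheeler Transform named above as the answer to the large number of reads can be sketched in a few lines of Python. This is a naive illustration that sorts all rotations, assuming a `$` sentinel that does not occur in the text; production aligners build the transform from a suffix array and add auxiliary tables for searching, which are not shown here.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler Transform: last column of the sorted rotations."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("GATTACA")
restored = inverse_bwt(transformed)
```

The transform is reversible and tends to group identical characters together, which is what makes the compressed genome index practical.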
The aim of this paper
● Assess the performance of 26 RNA-seq alignment protocols, based on 11 programs, on real and simulated human and mouse transcriptomes
● Alignment protocols were evaluated on Illumina 76-nucleotide paired-end RNA-seq data from:
  - the human leukemia cell line K562 (1.3 × 10^9 reads)
  - mouse brain (1.1 × 10^8 reads)
  - two simulated data sets
Outline
● Challenges in RNA sequence alignment
● The aim of this paper
● Existing spliced-alignment software
  - TopHat
  - MapSplice
  - STAR
  - GSNAP
● Conclusions
TopHat
Trapnell, Pachter, and Salzberg (2009)
● Unspliced alignment; reads left unmapped at this step:
  - reads that map to more than 10 locations
  - reads that have more than a few mismatches
● Assemble the mapped reads into islands of sequences
● Such an approach will identify only known or predicted combinations of exons
● Spliced alignment
  - Known junction signals: GT-AG, GC-AG, and AT-AC
  - If an alignment extends into an intron region, realign the reads to the adjacent exons instead
MapSplice
Wang et al. (2010)
● Similar to TopHat
● Reads = tags
● A tag T has an 'exonic alignment' if it can be aligned in its entirety to a consecutive sequence of nucleotides in the reference genome G
● T has a 'spliced alignment' if its alignment to G requires one or more gaps
Step 1: exonic alignment
Step 2: spliced alignment
● the segment t_{j+1} is given a spliced alignment to the genomic interval between the anchors t_j and t_{j+2}
● consider all possible positions of the splice site and map according to the Hamming distance
Step 3: merge candidate segment alignments
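Step 2 can be sketched in Python as a search over every possible splice position inside a segment, scored by Hamming distance. The function `best_splice_split`, its coordinate convention (left part anchored at `left_start`, right part ending at `right_end`) and the toy genome are assumptions for illustration, not MapSplice's actual implementation.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_splice_split(segment, genome, left_start, right_end):
    """Try each split k of the segment.

    segment[:k] is compared with the genome starting at left_start,
    segment[k:] with the genome ending at right_end (exclusive).
    Returns (k, total_hamming_distance) for the best split.
    """
    m = len(segment)
    best = None
    for k in range(1, m):
        prefix_ref = genome[left_start:left_start + k]
        suffix_ref = genome[right_end - (m - k):right_end]
        dist = hamming(segment[:k], prefix_ref) + hamming(segment[k:], suffix_ref)
        if best is None or dist < best[1]:
            best = (k, dist)
    return best


# Made-up genome: exon (ends at index 14), intron [14, 30), exon (starts at 30).
genome = "CCCCATGGCCAAGTGTAAGTCCCTTTTCAGTCGATTGACCGGGG"
segment = "CAAGTTCGAT"  # last 5 bases of exon 1 + first 5 bases of exon 2
result = best_splice_split(segment, genome, left_start=9, right_end=35)
```

Here the split after five bases gives a distance of zero, so the splice site is placed between genome positions 14 and 30.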
STAR
Dobin et al. (2012)
Maximal Mappable Prefix (read R, location i) = the longest substring of R starting at position i that matches exactly one or more substrings of the reference genome
Detect:
(a) splice junctions
(b) mismatches
(c) tails (poor genomic alignment)
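The Maximal Mappable Prefix definition above can be sketched naively in Python. This version scans the genome directly, whereas STAR itself searches a suffix array; the function names, the rule of skipping one base when nothing maps, and the toy genome are assumptions for illustration.

```python
def maximal_mappable_prefix(read, genome, i=0):
    """Longest substring of read starting at i with an exact genome match.

    Returns (length, positions), where positions lists every genome start
    of that longest match. Length 0 means read[i] does not occur at all.
    """
    best_len, best_positions = 0, []
    for length in range(1, len(read) - i + 1):
        sub = read[i:i + length]
        hits = [p for p in range(len(genome) - length + 1)
                if genome.startswith(sub, p)]
        if not hits:
            break
        best_len, best_positions = length, hits
    return best_len, best_positions


def seed_read(read, genome):
    """Apply the MMP search repeatedly to the unmapped remainder of the read."""
    seeds, i = [], 0
    while i < len(read):
        length, positions = maximal_mappable_prefix(read, genome, i)
        if length == 0:
            i += 1  # assumption: skip a base that maps nowhere
            continue
        seeds.append((i, length, positions))
        i += length
    return seeds


# Made-up genome with an intron at [14, 30); the read crosses that junction.
genome = "CCCCATGGCCAAGTGTAAGTCCCTTTTCAGTCGATTGACCGGGG"
seeds = seed_read("CAAGTTCGAT", genome)
```

The first seed stops where the read leaves the exon, and the second seed starts on the far side of the intron, which is how a splice junction shows up in this scheme.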
GSNAP
Wu and Nacu (2010)
Efficient detection of indels and splice pairs:
● For large genomes, it is more efficient to preprocess the genome rather than the reads, creating genomic index files that provide the genomic positions for a given prefix/suffix
● Works with candidate regions in the reference genome (keeping track of the read locations of the 12-residue segments that support each candidate region)
For a more powerful use of the algorithms:
● use available gene annotations, which allow the aligner to avoid erroneously mapping reads to pseudogenes
● use the pairing information of paired-end reads
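The idea of preprocessing the genome and collecting supporting 12-residue segments per candidate region can be sketched in Python with a dictionary index. This is a simplified illustration, assuming exact 12-mer lookups and grouping hits by diagonal (genome position minus read offset); the function names and the toy genome are hypothetical, and GSNAP's real index format is not shown.

```python
from collections import defaultdict


def build_kmer_index(genome, k=12):
    """Preprocess the genome once: map every k-mer to its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def candidate_regions(read, index, k=12):
    """Group k-mer hits by implied read start in the genome.

    Returns {genome_start: [read offsets of supporting k-mers]}.
    """
    support = defaultdict(list)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            support[pos - offset].append(offset)
    return dict(support)


# Made-up genome; the read is copied from positions 20..39.
genome = "ACGTTGCAAGGCTTAACCGGTATCGATCGGATCCTTAGGCAATTCCGGAAGTCTAGCTAGG"
index = build_kmer_index(genome)
read = genome[20:40]
regions = candidate_regions(read, index)
```

A region supported by many read offsets is a strong candidate, while gaps in the supporting offsets hint at mismatches, indels or a splice.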
Conclusions
● Mismatches and basewise accuracy
  - MapSplice, PASS and TopHat display a low tolerance for mismatches. Consequently, a large proportion of reads with low base-call quality scores were not mapped by these methods
  - GSNAP, GSTRUCT, MapSplice, PASS, SMALT and STAR allow mismatches and can also output an incomplete alignment when they are unable to map an entire sequence
  - Reads from mouse were mapped (against the mouse reference assembly) at a greater rate and with fewer mismatches than those from K562 (the cancer cell line K562 has accumulated many mutations relative to the human reference assembly)
Conclusions
● Indel frequency and accuracy
  - GSTRUCT produced the most uniform distribution of indels (coefficient of variation (CV) = 0.32)
  - TopHat produced the most variable distribution (CV = 1.5 and 1.1)
  - GEM and PALMapper output included more indels than any other method
  - GEM and PALMapper report many false indels (precision)
  - GSNAP and GSTRUCT exhibit high sensitivity for deletions, independent of size (recall)
  - The TopHat2 protocol is the most sensitive method for long insertions (recall)
[Figures: size distribution of indels for the human K562 data set; precision and recall, stratified by indel size]
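The three measures used on these slides (coefficient of variation, precision and recall) can be written down in a short Python sketch. It assumes the population standard deviation for the CV and set-based matching of indel calls against a truth set; the paper's exact definitions may differ, and the example calls are invented.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """CV = standard deviation / mean (population standard deviation assumed)."""
    return pstdev(counts) / mean(counts)


def precision_recall(called, truth):
    """Precision = TP / calls made, recall = TP / true events."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Invented example: indels identified by (position, signed length).
called_indels = {(100, -2), (250, 1), (300, -5), (410, 3)}
true_indels = {(250, 1), (300, -5), (999, 2)}
precision, recall = precision_recall(called_indels, true_indels)

uniform_cv = coefficient_of_variation([10, 10, 10, 10])
```

A method that reports many false indels scores low on precision, while a method that misses true indels scores low on recall.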
Conclusions
● Spliced alignment
  - Accurate junction discovery by ReadsMap, GSNAP, GSTRUCT, MapSplice and TopHat
  - The number of false junction calls was greatly reduced when junctions were filtered by supporting alignment counts (plot c)
  - Protocols using annotation recovered nearly all of the known junctions in expressed transcripts (plot d)
  - For novel-junction discovery, GSTRUCT outperformed other methods
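Filtering junctions by the number of supporting alignments, as mentioned for plot c, can be sketched in Python as a simple count threshold. The junction representation `(chromosome, intron_start, intron_end)`, the threshold of 2 and the example calls are assumptions for illustration, not values from the paper.

```python
from collections import Counter


def filter_junctions(junction_calls, min_support=2):
    """Keep junctions reported by at least min_support alignments.

    junction_calls holds one entry per spliced alignment.
    Returns {junction: supporting_alignment_count}.
    """
    counts = Counter(junction_calls)
    return {junction: n for junction, n in counts.items() if n >= min_support}


# Invented calls: one junction seen three times, two seen only once.
calls = [
    ("chr1", 1000, 2000),
    ("chr1", 1000, 2000),
    ("chr1", 1000, 2000),
    ("chr1", 1500, 9000),
    ("chr2", 40, 400),
]
kept = filter_junctions(calls)
```

Raising `min_support` trades sensitivity for fewer false junctions, since weakly supported true junctions are discarded as well.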
Conclusions
● GSNAP, GSTRUCT, MapSplice and STAR compared favorably to the other methods
● MapSplice seems to be a conservative aligner with respect to mismatch frequency, indel and exon junction calls
● The most significant issue with GSNAP, GSTRUCT and STAR is the presence of many false exon junctions in the output
● Both GSNAP and GSTRUCT require considerable computing time when parameterized for sensitive spliced alignment
Thank you!
Remaining challenges:
Remaining challenges include exploiting gene annotation without introducing bias, correctly placing multimapped reads, achieving optimal yet fast alignment around gaps and mismatches, and reducing the number of false exon junctions reported. Ongoing developments in sequencing technology will demand efficient processing of longer reads with higher error rates and will require more extensive spliced alignment as reads span multiple
● Some RNA-seq aligners, including GSNAP [5], RUM [6], and STAR [7], map reads independently of the alignments of other reads, which may explain their lower sensitivity for these spliced reads
● GSNAP [5] and STAR [7] also make use of annotation, although they use it in a more limited fashion in order to detect splice sites
● … have shown how suffix arrays (Manber and Myers, 1990), compressed using a Burrows-Wheeler Transform (BWT) (Burrows and Wheeler, 1994), can rapidly map reads that are exact matches or have a few mismatches or short insertions or deletions (indels) relative to the reference.
● A third approach, provided by the QPALMA program (De Bona et al., 2008), can align individual reads across exon–exon junctions using Smith–Waterman-type alignments and a specifically trained splice site model.
Más contenido relacionado

Similar a Systematic evaluation of spliced alignment programs for RNA-seq data

CRISPR cas, a potential tool for targeted genome modification in crops.
CRISPR cas, a potential tool for targeted genome modification in crops.CRISPR cas, a potential tool for targeted genome modification in crops.
CRISPR cas, a potential tool for targeted genome modification in crops.UAS,GKVK<BANGALORE
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_versionDago Noel
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_versionDago Noel
 
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...Human Variome Project
 
The Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesThe Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesReece Hart
 
Guide Picker Poster V3
Guide Picker Poster V3Guide Picker Poster V3
Guide Picker Poster V3Soren Hough
 
NGS Presentation .pptx
NGS Presentation  .pptxNGS Presentation  .pptx
NGS Presentation .pptxMalihaTanveer1
 
Forensics: Human Identity Testing in the Applied Genetics Group
Forensics: Human Identity Testing in the Applied Genetics GroupForensics: Human Identity Testing in the Applied Genetics Group
Forensics: Human Identity Testing in the Applied Genetics Groupnist-spin
 
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...Jonathan Eisen
 
Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Gunnar Rätsch
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Golden Helix Inc
 
Goodwin2016 ngs 10 years
Goodwin2016 ngs 10 yearsGoodwin2016 ngs 10 years
Goodwin2016 ngs 10 yearsPrakash Koringa
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vstQiang Kou
 
How CRISPR–Cas9 Screening will revolutionise your drug development programs
How CRISPR–Cas9 Screening will revolutionise your drug development programsHow CRISPR–Cas9 Screening will revolutionise your drug development programs
How CRISPR–Cas9 Screening will revolutionise your drug development programsHorizonDiscovery
 
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGIHadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGIAllen Day, PhD
 
Making the cut with CRISPR
Making the cut with CRISPRMaking the cut with CRISPR
Making the cut with CRISPREdward Perello
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.keyYannick Wurm
 
Experimental Designs in Next Generation Sequencing
Experimental Designs in Next Generation Sequencing Experimental Designs in Next Generation Sequencing
Experimental Designs in Next Generation Sequencing GuttiPavan
 

Similar a Systematic evaluation of spliced alignment programs for RNA-seq data (20)

CRISPR cas, a potential tool for targeted genome modification in crops.
CRISPR cas, a potential tool for targeted genome modification in crops.CRISPR cas, a potential tool for targeted genome modification in crops.
CRISPR cas, a potential tool for targeted genome modification in crops.
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
 
2944_IJDR_final_version
2944_IJDR_final_version2944_IJDR_final_version
2944_IJDR_final_version
 
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
 
The Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesThe Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment Discrepancies
 
Guide Picker Poster V3
Guide Picker Poster V3Guide Picker Poster V3
Guide Picker Poster V3
 
NGS Presentation .pptx
NGS Presentation  .pptxNGS Presentation  .pptx
NGS Presentation .pptx
 
Forensics: Human Identity Testing in the Applied Genetics Group
Forensics: Human Identity Testing in the Applied Genetics GroupForensics: Human Identity Testing in the Applied Genetics Group
Forensics: Human Identity Testing in the Applied Genetics Group
 
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...
Phylogeny-driven approaches to microbial & microbiome studies: talk by Jonath...
 
Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)Talk ABRF 2015 (Gunnar Rätsch)
Talk ABRF 2015 (Gunnar Rätsch)
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
Big Data at Golden Helix: Scaling to Meet the Demand of Clinical and Research...
 
Goodwin2016 ngs 10 years
Goodwin2016 ngs 10 yearsGoodwin2016 ngs 10 years
Goodwin2016 ngs 10 years
 
DEseq, voom and vst
DEseq, voom and vstDEseq, voom and vst
DEseq, voom and vst
 
How CRISPR–Cas9 Screening will revolutionise your drug development programs
How CRISPR–Cas9 Screening will revolutionise your drug development programsHow CRISPR–Cas9 Screening will revolutionise your drug development programs
How CRISPR–Cas9 Screening will revolutionise your drug development programs
 
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGIHadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
Hadoop and Genomics - What you need to know - 2015.04.09 - Shenzhen - BGI
 
20140710 1 day1_nist_ercc2.0workshop
20140710 1 day1_nist_ercc2.0workshop20140710 1 day1_nist_ercc2.0workshop
20140710 1 day1_nist_ercc2.0workshop
 
Making the cut with CRISPR
Making the cut with CRISPRMaking the cut with CRISPR
Making the cut with CRISPR
 
2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key2015 09-29-sbc322-methods.key
2015 09-29-sbc322-methods.key
 
Experimental Designs in Next Generation Sequencing
Experimental Designs in Next Generation Sequencing Experimental Designs in Next Generation Sequencing
Experimental Designs in Next Generation Sequencing
 

Último

psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfSanaAli374401
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 

Último (20)

psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Systematic evaluation of spliced alignment programs for RNA-seq data

  • 1. Systematic evaluation of spliced alignment programs for RNA-seq data Engström et al. (Nature Methods 2013) Presented by Monica Drăgan
  • 2. 2Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data
  • 3. 3Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data
  • 4. 4Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data © bioinformatics.ca Mapping the reads to ● a reference genome or ● a transcriptome database Deep sequencing (with NGS)
  • 5. 5Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data © bioinformatics.ca Why RNA sequencing? ● Functional studies ● Gene prediction is difficult
  • 6. 6Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data
  • 7. 7Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data Mapping strategies depend on read length ● Read length < 50 bp ● Read length > 50 bp
  • 8. 8Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data Mapping strategies depend on read length ● Read length < 50 bp → Short (Unspliced) aligners ● Read length > 50 bp BWA BOWTIE
  • 9. 9Functional Genomics, SS2014Dienstag, 25. März 2014 Systematic evaluation of spliced alignment programs for RNA-seq data Mapping strategies depend on read length ● Read length < 50 bp → Short (Unspliced) aligners ● Read length > 50 bp → Spliced alignment programs ● In mRNA sequences the introns were removed BWA BOWTIE GSNAP MapSplice STAR PAL Mapper TopHat ReadsMapPASS SMALT
  • 10. 10Functional Genomics, SS2014Dienstag, 25. März 2014 Outline  Challenges in RNA sequence alignment  The aim of this paper  Existing spliced-alignment software  Conclusions
  • 11. 11Functional Genomics, SS2014Dienstag, 25. März 2014 Outline  Challenges in RNA sequence alignment  The aim of this paper  Existing spliced-alignment software  Conclusions
  • 12. Challenges in RNA-seq alignment ● Large number of reads
  • 13. Challenges in RNA-seq alignment ● Large number of reads → ~100M reads = computationally expensive
  • 14. Challenges in RNA-seq alignment ● Large number of reads → ~100M reads = computationally expensive ● Compression with the Burrows-Wheeler Transform
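
The Burrows-Wheeler Transform named on slide 14 can be illustrated with a naive sketch. The function names `bwt` and `inverse_bwt` and the `$` sentinel are assumptions made for this example; real aligners build the transform from suffix arrays and add an index on top of it rather than sorting rotations as done here.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler Transform: sort all rotations, keep the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the BWT column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(char + row for char, row in zip(transformed, table))
    # The original text is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)               # equal characters tend to cluster together
    print(inverse_bwt(encoded))  # recovers the input
```

The transform is reversible and groups identical characters, which is what makes the compressed genome index searchable.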
  • 15. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing
  • 16. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing
  • 17. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing
  • 18. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing: a single gene may code for multiple proteins
  • 19. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing ● Paired read separation issue
  • 20. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing ● Paired read separation issue
  • 21. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing ● Paired read separation issue ● Pseudogenes
  • 22. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing ● Paired read separation issue ● Pseudogenes: pseudogenes often have sequences highly similar to functional, intron-containing genes → RNA reads can be incorrectly mapped to them; the human genome contains over 14,000 pseudogenes [Pei et al., Genome Biol 2012]
  • 23. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing ● Paired read separation issue ● Pseudogenes ● Duplications
  • 24. Challenges in RNA-seq alignment ● Large number of reads ● RNA splicing / alternative splicing ● Paired read separation issue ● Pseudogenes ● Duplications: may correspond to biased PCR amplification of particular fragments
  • 25. Outline ● Challenges in RNA sequence alignment ● The aim of this paper ● Existing spliced-alignment software ● Conclusions
  • 26. The aim of this paper ● Assess the performance of 26 RNA-seq alignment protocols, based on 11 programs, on real and simulated human and mouse transcriptomes ● Alignment protocols were evaluated on Illumina 76-nucleotide paired-end RNA-seq data from: ● the human leukemia cell line K562 (1.3 × 10^9 reads) ● mouse brain (1.1 × 10^8 reads) ● and two simulated data sets
  • 27. Outline ● Challenges in RNA sequence alignment ● The aim of this paper ● Existing spliced-alignment software: TopHat, MapSplice, STAR, GSNAP ● Conclusions
  • 28. TopHat (Trapnell, Pachter, and Salzberg, 2009) Unspliced alignment
  • 29. TopHat (Trapnell, Pachter, and Salzberg, 2009) Unspliced alignment ● reads that map to more than 10 locations ● reads that have more than a few mismatches
  • 30. TopHat (Trapnell, Pachter, and Salzberg, 2009) Unspliced alignment; assemble islands of sequences ● reads that map to more than 10 locations ● reads that have more than a few mismatches
  • 31. TopHat (Trapnell, Pachter, and Salzberg, 2009) Unspliced alignment; assemble. Such an approach will identify only known or predicted combinations of exons
  • 32. TopHat (Trapnell, Pachter, and Salzberg, 2009) Unspliced alignment; spliced alignment
  • 33. TopHat (Trapnell, Pachter, and Salzberg, 2009)
  • 34. TopHat (Trapnell, Pachter, and Salzberg, 2009) Known junction signals: GT-AG, GC-AG, and AT-AC
  • 35. TopHat (Trapnell, Pachter, and Salzberg, 2009) If an alignment extends into an intron region, realign the reads to the adjacent exons instead. Known junction signals: GT-AG, GC-AG, and AT-AC
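
The junction signals listed on slides 34 and 35 can be checked with a few lines of code. This is a minimal sketch, not TopHat's implementation: the function names, the 0-based half-open intron coordinates, and the forward-strand-only check are all assumptions made for illustration.

```python
# Donor/acceptor dinucleotide pairs named on the slide.
KNOWN_SIGNALS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def intron_signal(genome: str, start: int, end: int) -> tuple:
    """Return the first and last two bases of the intron genome[start:end]."""
    if end - start < 4:
        raise ValueError("intron too short to carry both dinucleotides")
    intron = genome[start:end].upper()
    return intron[:2], intron[-2:]


def has_known_signal(genome: str, start: int, end: int) -> bool:
    """True if the candidate intron is flanked by one of the known signals."""
    return intron_signal(genome, start, end) in KNOWN_SIGNALS


if __name__ == "__main__":
    # Toy sequence: exon, intron (GT...AG), exon.
    toy_genome = "AAACCC" + "GTAAGTTTTCAG" + "GGGTTT"
    print(intron_signal(toy_genome, 6, 18))
    print(has_known_signal(toy_genome, 6, 18))
```

A reverse-strand gene would need the reverse complement of the intron checked instead, which this sketch leaves out.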
  • 36. Outline ● Challenges in sequence alignment ● What the paper is about ● Existing software: TopHat, MapSplice, STAR, GSNAP ● Conclusions ● Future work
  • 37. MapSplice (Wang et al., 2010) ● Similar to TopHat ● Reads = tags ● A tag T has an 'exonic alignment' if it can be aligned in its entirety to a consecutive sequence of nucleotides in the genome G ● T has a 'spliced alignment' if its alignment to G requires one or more gaps
  • 38. MapSplice (Wang et al., 2010) Step 1: exonic alignment
  • 39. MapSplice (Wang et al., 2010) Step 2: spliced alignment ● the spliced alignment of segment t_{j+1} to the genomic interval between anchors t_j and t_{j+2} ● consider all the possible positions of the splice site and map according to the Hamming distance
  • 40. MapSplice (Wang et al., 2010) Step 3: merge candidate segment alignments
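
Step 2 on slide 39 (try every splice-site position and score it by Hamming distance) can be sketched as follows. This is a simplified reading of the slide, not MapSplice's actual code: the function names, the coordinate convention (`left_pos` is where the segment starts right after the upstream anchor, `right_end` is where it ends right before the downstream anchor), and the tie-breaking rule are assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_splice_site(segment: str, genome: str, left_pos: int, right_end: int):
    """Try every split of the segment and return (intron_start, intron_end, mismatches).

    The first k bases are compared to genome[left_pos:left_pos + k], the rest
    to the bases ending at right_end. Ties go to the smallest k.
    """
    m = len(segment)
    if right_end - left_pos < m:
        raise ValueError("genomic interval shorter than the segment")
    best = None
    for k in range(m + 1):
        donor_side = genome[left_pos:left_pos + k]
        acceptor_side = genome[right_end - (m - k):right_end]
        mismatches = hamming(segment[:k], donor_side) + hamming(segment[k:], acceptor_side)
        if best is None or mismatches < best[2]:
            best = (left_pos + k, right_end - (m - k), mismatches)
    return best


if __name__ == "__main__":
    toy_genome = "TTGACCAGT" + "GTAAGATTTTTCAG" + "CCGATTACA"
    # Segment spanning the junction: last 4 exon-1 bases + first 4 exon-2 bases.
    print(best_splice_site("CAGTCCGA", toy_genome, left_pos=5, right_end=27))
```

In the toy genome the reported gap coincides with the GT...AG intron; when several split points give the same distance, a real aligner would need additional evidence, such as the splice signals, to choose among them.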
  • 41. Outline ● Challenges in sequence alignment ● What the paper is about ● Existing software: TopHat, MapSplice, STAR, GSNAP ● Conclusions ● Future work
  • 42. STAR (Dobin et al., 2012) Maximal Mappable Prefix (read location i) = the longest read substring starting at position i that exactly matches one or more substrings of the reference genome. Detect: (a) splice junctions (b) mismatches (c) tails (poor genomic alignment)
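
The Maximal Mappable Prefix definition on slide 42 can be made concrete with a brute-force sketch. The function name and the linear scan over the genome are assumptions made for readability; STAR itself performs this search with a suffix array, which is not reproduced here.

```python
def maximal_mappable_prefix(read: str, genome: str, i: int = 0):
    """Return (length, positions) of the longest prefix of read[i:] found exactly in genome.

    positions lists every 0-based genome start where that prefix occurs.
    """
    suffix = read[i:]
    best_len, best_positions = 0, []
    candidates = list(range(len(genome)))  # genome starts that still match
    for length in range(1, len(suffix) + 1):
        base = suffix[length - 1]
        candidates = [
            p for p in candidates
            if p + length <= len(genome) and genome[p + length - 1] == base
        ]
        if not candidates:
            break  # the prefix cannot be extended any further
        best_len, best_positions = length, candidates
    return best_len, best_positions


if __name__ == "__main__":
    toy_genome = "ACGTAC" + "GTTTTTAG" + "TTGCA"  # exon, intron, exon
    read = "GTAC" + "TTGC"                          # spans the junction
    first = maximal_mappable_prefix(read, toy_genome, 0)
    print(first)
    # Restart the search at the first unmapped base, as for a splice junction.
    print(maximal_mappable_prefix(read, toy_genome, first[0]))
```

Searching again from the first unmapped base is what lets a read be split across a junction: the two prefixes map to the two exons on either side of the intron.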
  • 43. Outline ● Challenges in sequence alignment ● What the paper is about ● Existing software: TopHat, MapSplice, STAR, GSNAP ● Conclusions ● Future work
  • 44. GSNAP (Wu and Nacu, 2010) Efficient detection of indels and splice pairs: ● For large genomes, it is more efficient to preprocess the genome rather than the reads to create genomic index files, which provide genomic positions for a given prefix/suffix. ● Works with candidate regions in the reference genome (keeps track of the read location of the 12 residues that support each candidate region)
  • 45. GSNAP (Wu and Nacu, 2010)
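
The idea on slide 44 of indexing the genome once and then collecting candidate regions supported by short read substrings can be sketched with a plain hash table. This illustrates the general k-mer index technique only; the function names, the dictionary layout, and the "diagonal" bookkeeping are assumptions and do not reproduce GSNAP's index files.

```python
from collections import Counter, defaultdict


def build_kmer_index(genome: str, k: int = 12) -> dict:
    """Map every k-mer of the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


def candidate_regions(read: str, index: dict, k: int = 12) -> Counter:
    """Count, per candidate genome start, how many read k-mers support it.

    A k-mer at read offset o found at genome position p supports an
    ungapped placement of the read starting at p - o.
    """
    support = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            support[pos - offset] += 1
    return support


if __name__ == "__main__":
    toy_genome = "ACGTACGTTTTTAGTTGCA"
    index = build_kmer_index(toy_genome, k=4)  # small k for the toy example
    print(candidate_regions("CGTTTTTA", index, k=4).most_common(1))
```

The genome index is built once and reused for every read, which is the efficiency argument made on the slide.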
  • 46. For a more powerful use of the algorithms: ● use available gene annotations, which help to avoid erroneously mapping reads to pseudogenes ● use the pairing information of paired reads
  • 47. Outline ● Challenges in RNA sequence alignment ● The aim of this paper ● Existing spliced-alignment software ● Conclusions
  • 48. Conclusions ● Mismatches and basewise accuracy: MapSplice, PASS and TopHat display a low tolerance for mismatches. Consequently, a large proportion of reads with low base-call quality scores were not mapped by these methods
  • 49. Conclusions ● Mismatches and basewise accuracy ● GSNAP, GSTRUCT, MapSplice, PASS, SMALT and STAR allow mismatches and can also output an incomplete alignment when they are unable to map an entire sequence
  • 50. Conclusions ● Mismatches and basewise accuracy: reads from mouse were mapped (against the mouse reference assembly) at a greater rate and with fewer mismatches than those from K562 (the cancer cell line K562 has accumulated many mutations with respect to the human reference assembly).
  • 51. Conclusions ● Indel frequency and accuracy ● GSTRUCT produced the most uniform distribution of indels (coefficient of variation (CV) = 0.32) ● TopHat produced the most variable distribution (CV = 1.5 and 1.1 splice junctions) ● GEM and PALMapper output included more indels than any other method. Figures: size distribution of indels for the human K562 data set; precision and recall, stratified by indel size
  • 52. Conclusions ● Indel frequency and accuracy ● GEM and PALMapper report many false indels (precision) ● GSNAP and GSTRUCT exhibit high sensitivity for deletions, independent of size (recall) ● The TopHat2 protocol is the most sensitive method for long insertions (recall). Figure: precision and recall, stratified by indel size
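
The three metrics used on slides 51 and 52 (coefficient of variation, precision, recall) can be written out as a short sketch. The function names and the representation of indels as hashable tuples are assumptions, and the slides do not say whether the population or sample standard deviation was used; the population form is used here.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values) -> float:
    """CV = standard deviation / mean (population standard deviation assumed)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return pstdev(values) / m


def precision_recall(reported: set, truth: set) -> tuple:
    """Precision = TP / reported calls, recall = TP / true events."""
    true_positives = len(reported & truth)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical indel counts per read position (not data from the paper).
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
    # Hypothetical indel calls as (chromosome, position, kind, length).
    reported = {("chr1", 100, "del", 3), ("chr1", 250, "ins", 1)}
    truth = {("chr1", 100, "del", 3), ("chr2", 40, "del", 12)}
    print(precision_recall(reported, truth))
```

A low CV corresponds to the "uniform distribution" wording on slide 51, while many false indels lower precision and missed indels lower recall.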
  • 53. Conclusions ● Spliced alignment ● High discovery accuracy for ReadsMap, GSNAP, GSTRUCT, MapSplice and TopHat ● The number of false junction calls was greatly reduced when junctions were filtered by supporting alignment counts (plot c) ● Protocols using annotation recovered nearly all of the known junctions in expressed transcripts (plot d) ● For novel-junction discovery, GSTRUCT outperformed other methods
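
Filtering junctions by supporting alignment counts, as mentioned on slide 53, can be sketched like this. The data layout (each alignment as a list of `(chromosome, intron_start, intron_end)` tuples), the function names, and the threshold of 2 are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter


def count_junction_support(alignments) -> Counter:
    """Count how many alignments contain each junction."""
    support = Counter()
    for junctions in alignments:
        # A junction is counted once per alignment, even if listed twice.
        support.update(set(junctions))
    return support


def filter_junctions(support: Counter, min_support: int = 2) -> set:
    """Keep only junctions backed by at least min_support alignments."""
    return {junction for junction, n in support.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical spliced alignments; unspliced reads contribute an empty list.
    alignments = [
        [("chr1", 100, 500)],
        [("chr1", 100, 500), ("chr1", 620, 900)],
        [("chr1", 100, 500)],
        [("chr2", 10, 9000)],  # seen only once, likely filtered out
        [],
    ]
    support = count_junction_support(alignments)
    print(filter_junctions(support, min_support=2))
```

Raising `min_support` trades recall for precision: rarely used true junctions are lost together with spurious ones.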
  • 54. Conclusions ● GSNAP, GSTRUCT, MapSplice and STAR compared favorably to the other methods ● MapSplice seems to be a conservative aligner with respect to mismatch frequency, indel and exon junction calls. ● The most significant issue with GSNAP, GSTRUCT and STAR is the presence of many false exon junctions in the output. ● Both GSNAP and GSTRUCT require considerable computing time when parameterized for sensitive spliced alignment
  • 55. Thank you!
  • 56. Remaining challenges: exploiting gene annotation without introducing bias, correctly placing multimapped reads, achieving optimal yet fast alignment around gaps and mismatches, and reducing the number of false exon junctions reported. Ongoing developments in sequencing technology will demand efficient processing of longer reads with higher error rates and will require more extensive spliced alignment as reads span multiple exon junctions.
  • 57. ● Some RNA-seq aligners, including GSNAP [5], RUM [6], and STAR [7], map reads independently of the alignments of other reads, which may explain their lower sensitivity for these spliced reads ● GSNAP [5] and STAR [7] also make use of annotation, although they use it in a more limited fashion in order to detect splice sites
  • 58. … have shown how suffix arrays (Manber and Myers, 1990), compressed using a Burrows-Wheeler Transform (BWT) (Burrows and Wheeler, 1994), can rapidly map reads that are exact matches or have a few mismatches or short insertions or deletions (indels) relative to the reference.
  • 59. A third approach, provided by the QPALMA program (Bona et al., 2008), can align individual reads across exon–exon junctions using Smith–Waterman-type alignments and a specifically trained splice site model.

Editor's notes

  2-11. RNA splicing, introns: mRNA transcripts do not include introns, so the alignment program must handle gapped (spliced) alignment with very large gaps