RNA-Seq
• Application of next-generation sequencing (NGS) technology
to RNA, for transcript identification
and quantification.
• Can be used for:
– Estimating the number of transcripts in the sample
(transcriptomics or expression profiling)
– Revealing sequence variation
– Detecting alternative splicing
– Comparing gene expression profiles of healthy versus diseased tissue
RNA-Seq vs Microarray
BMC Bioinformatics 2014, 15(Suppl 11):S3, DOI: 10.1186/1471-2105-15-S11-S3
Data Generation Steps
Nature Reviews Genetics 12, 671-682 (October 2011), DOI: 10.1038/nrg3068
RNA-Seq analysis Pipeline for Detecting
Differential Expression
Genome Biology 2010 11:220, DOI: 10.1186/gb-2010-11-12-220
Read-Mapping Challenges
• NGS Computational challenges
• Memory footprint
• Millions of short reads
• RNA-Seq Special Mapping Concerns
• New technology, old problems
• Exact vs inexact matches
From Wikipedia
Algorithms For Read Mapping
• Build an index
– Hash table
– Burrows-Wheeler transform (BWT); FM index
• Find the set of positions where reads are most likely to align
• Refine the alignment at the target locations
Seed and Extend
Hash Tables
• Use hash tables to store the positions of all k-mers
in a genome
Genome (Chr 9): AATCGCATAGTTATTAATGCTA
k-mer (k = 10)    Position
AATCGCATAG        Chr 9, location 0
ATCGCATAGT        Chr 9, location 1
TCGCATAGTT        Chr 9, location 2
CGCATAGTTA        Chr 9, location 3
GCATAGTTAT        Chr 9, location 4
CATAGTTATT        Chr 9, location 5
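The k-mer table on this slide can be sketched in a few lines of Python. This is a minimal illustration and not how any particular aligner stores its index; the function name `build_kmer_index` and the choice of a plain dictionary are assumptions made for the example. The genome string and k = 10 are taken from the slide.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts.

    Hypothetical helper for illustration; real aligners use far more
    compact index structures.
    """
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index


genome = "AATCGCATAGTTATTAATGCTA"  # genome string from the slide
index = build_kmer_index(genome, k=10)

# Exact lookup of a k-mer returns its candidate location(s).
print(index.get("GCATAGTTAT"))  # [4]
```

A lookup is a single dictionary access, which is why exact k-mer matching is fast; the cost is memory, since every position in the genome contributes an entry.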
Burrows-Wheeler Transformation (BWT)
Input String: GCTAGCTA
Rotations    Sorted rotations
GCTAGCTA     AGCTAGCT
CTAGCTAG     AGCTAGCT
TAGCTAGC     CTAGCTAG
AGCTAGCT     CTAGCTAG
GCTAGCTA     GCTAGCTA
CTAGCTAG     GCTAGCTA
TAGCTAGC     TAGCTAGC
AGCTAGCT     TAGCTAGC
Output String (last column of the sorted rotations): TTGGAACC
• Reversible transformation
• Repetitive nature of the
outcome makes it easier to
compress
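The rotate, sort, and take-the-last-column procedure from this slide can be sketched as below. The first function reproduces the slide's example exactly. The `bwt` and `inverse_bwt` functions append a `$` end marker, which is an assumption not shown on the slide: it is a common convention that lets the original string be recovered unambiguously. The inverse shown here is the naive textbook version, chosen for readability rather than speed.

```python
def last_column_of_sorted_rotations(s):
    """Sort all rotations of s and join their last characters."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def bwt(text, end_marker="$"):
    """BWT with an end marker (assumed convention, not from the slide)."""
    return last_column_of_sorted_rotations(text + end_marker)


def inverse_bwt(last_column, end_marker="$"):
    """Naive inversion: repeatedly prepend the last column and re-sort."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(end_marker))
    return original[:-1]


# The slide's example, without an end marker.
print(last_column_of_sorted_rotations("GCTAGCTA"))  # TTGGAACC

# Round trip with the end marker.
encoded = bwt("GCTAGCTA")
print(inverse_bwt(encoded) == "GCTAGCTA")  # True
```

Note how the output groups identical letters into runs (TT, GG, AA, CC), which is the property that makes the transformed string easier to compress.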
Seed and Extend
Read:   ATGCTAGT
Target: ATGCTGTT
(Figure: the read aligned to the target, with each position marked as a match or a mis-match.)
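A simplified sketch of seed and extend, using the read and target from this slide. The assumptions here are loud ones: the seed is simply the first `seed_len` bases of the read, the extension is ungapped (it only counts mismatches), and the function name and parameters are hypothetical. Real aligners look seeds up in an index and extend with a gapped local alignment.

```python
def seed_and_extend(read, target, seed_len=4, max_mismatches=2):
    """Find exact seed hits, then extend without gaps and count mismatches.

    Returns a list of (target_position, mismatch_count) for accepted hits.
    """
    seed = read[:seed_len]
    hits = []
    start = target.find(seed)
    while start != -1:
        window = target[start:start + len(read)]
        if len(window) == len(read):  # skip hits that run off the target
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if mismatches <= max_mismatches:
                hits.append((start, mismatches))
        start = target.find(seed, start + 1)
    return hits


print(seed_and_extend("ATGCTAGT", "ATGCTGTT"))  # [(0, 2)]
```

The exact seed match (ATGC) is cheap to find; the more expensive comparison is done only at the seeded location, where the read differs from the target at two positions.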
RNA-Seq: Special Mapping Concerns
www.ensembl.org
RNA-Seq: Special Mapping Concerns
genome.gov
Alternative Splicing
RNA-Seq: Special Mapping Concerns
• For RNA sequencing data, many reads will map to the reference
genome, but many will not because (coming from RNA) they
span exon–exon junctions.
• Methods to deal with junction reads:
– Align to the reference transcriptome (if well annotated).
– Align to the reference genome, build a junction library
from known adjacent exons, and then align the unmapped reads to
the junction library.
– Map reads to the genome and identify putative exons (indel-finding
algorithm); using these candidate exons, build all
possible exon–exon junctions.
– De novo assembly of RNA-Seq reads.
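The junction-library method listed above can be sketched as follows. This is a toy illustration under stated assumptions: exons are given as sorted, half-open `(start, end)` coordinates on a single forward-strand sequence, only consecutive exons are joined, and the function name `build_junction_library` and the toy genome are invented for the example.

```python
def build_junction_library(genome, exons, read_len):
    """Build one junction sequence per pair of consecutive exons.

    Each sequence holds up to read_len - 1 bases from either side of the
    junction, so any read placed on it must cross the junction.
    """
    flank = read_len - 1
    library = {}
    for (s1, e1), (s2, e2) in zip(exons, exons[1:]):
        left = genome[max(s1, e1 - flank):e1]     # end of upstream exon
        right = genome[s2:min(e2, s2 + flank)]    # start of downstream exon
        library[(e1, s2)] = left + right
    return library


# Toy genome: exon 1, an intron, exon 2 (all hypothetical).
genome = "AAACCC" + "GTTTTTAG" + "GGGTTT"
exons = [(0, 6), (14, 20)]
library = build_junction_library(genome, exons, read_len=4)

junction_read = "CCGG"  # spans the exon 1 / exon 2 boundary
print(junction_read in genome)            # False
print(junction_read in library[(6, 14)])  # True
```

The read fails to match the genome because the intron separates its two halves, but it matches the junction sequence directly.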
RNA-Seq: Special Mapping Concerns
Genome Biology 2013 14:R36, DOI: 10.1186/gb-2013-14-4-r36
Reference Based Mapping Methods
BMC Genomics. 2014; 15(1): 570, Doi: 10.1186/1471-2164-15-570
Tophat2
Genome Biology 2013 14:R36, DOI: 10.1186/gb-2013-14-4-r36
Transcript Assembly
IEEE/ACM Trans Comput Biol Bioinform. 2013 Sep-Oct; 10(5): 1234–1240.
RNA-Seq analysis Pipeline for Detecting
Differential Expression
Genome Biology 2010 11:220, DOI: 10.1186/gb-2010-11-12-220
Summarizing Reads
• Aggregate reads over biologically meaningful units such as transcripts or
genes
• Count the number of reads overlapping the exons of a gene (but a significant
proportion of the reads will also map outside annotated regions)
Genome Biology 2010 11:220, DOI: 10.1186/gb-2010-11-12-220
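Counting reads that overlap the exons of a gene can be sketched like this. The assumptions are that reads and exons are half-open `(start, end)` intervals on one chromosome with strand ignored, and that a read overlapping no gene or more than one gene is left unassigned. That last rule is one possible design choice, not something the slides specify, and all names and coordinates are hypothetical.

```python
def count_reads_per_gene(reads, gene_exons):
    """Count reads overlapping the exons of exactly one gene."""
    counts = {gene: 0 for gene in gene_exons}
    unassigned = 0
    for r_start, r_end in reads:
        hit_genes = [
            gene
            for gene, exons in gene_exons.items()
            if any(r_start < e_end and e_start < r_end for e_start, e_end in exons)
        ]
        if len(hit_genes) == 1:
            counts[hit_genes[0]] += 1
        else:
            unassigned += 1  # outside annotation, or ambiguous
    return counts, unassigned


gene_exons = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1100)],
}
reads = [(150, 200), (190, 310), (390, 440), (500, 550), (1050, 1100)]

counts, unassigned = count_reads_per_gene(reads, gene_exons)
print(counts)      # {'geneA': 3, 'geneB': 1}
print(unassigned)  # 1
```

The read at (500, 550) falls outside every annotated exon, which is the situation the second bullet warns about.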
Count Normalization
• Number of reads aligned to a gene gives a measure of
its level of expression
• Normalization of the count data
• Sequencing depth
• Length bias
(Figure: read count, from low to high, for a short transcript versus a long transcript; labels include "Isoform 1" and an exon-union model.)
Nature Methods 8, 469–477 (2011), Doi:10.1038/nmeth.1613
Count Normalization
• RPKM (Reads Per Kilobase of exon model per Million mapped reads)
• FPKM (Fragments Per Kilobase of exon model per Million mapped reads)
• TPM (Transcripts Per Million)
\[
\mathrm{RPKM} = \frac{\text{raw number of reads}}{\text{exon length (kb)}} \times \frac{1{,}000{,}000}{\text{number of mapped reads in the sample}}
\]
Count Normalization
Gene/Transcript Name     R1 counts    R2 counts
A (50 kb)                37000        70000
B (100 kb)               50000        110000
C (200 kb)               50000        88000
D (-- kb)                ----         ----
XDD (-- kb)              ----         ----
Total number of reads    2000000      4000000
RPKM Calculation
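A worked version of the calculation for genes A, B and C from the count table, as a minimal sketch. It assumes the "Total number of reads" row is the per-sample number of mapped reads used in the RPKM denominator; the function name `rpkm` and the dictionary layout are illustrative only. Genes D and XDD are omitted because their values are not given.

```python
def rpkm(read_count, length_kb, total_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    per_million = total_reads / 1_000_000
    return read_count / length_kb / per_million


# gene: (length in kb, R1 counts, R2 counts), copied from the table
genes = {
    "A": (50, 37000, 70000),
    "B": (100, 50000, 110000),
    "C": (200, 50000, 88000),
}
totals = {"R1": 2_000_000, "R2": 4_000_000}

rpkm_table = {}
for name, (length_kb, r1, r2) in genes.items():
    rpkm_table[name] = (
        rpkm(r1, length_kb, totals["R1"]),
        rpkm(r2, length_kb, totals["R2"]),
    )
    print(name, rpkm_table[name])
# A (370.0, 350.0)
# B (250.0, 275.0)
# C (125.0, 110.0)
```

Two things stand out. B and C have the same raw R1 count, but C is twice as long, so its RPKM is half of B's (length bias). R2 has about twice the raw counts of R1 for every gene, yet the RPKM values are close because R2 was sequenced twice as deeply (sequencing depth).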
RNA-Seq analysis Pipeline for Detecting
Differential Expression
Genome Biology 2010 11:220, DOI: 10.1186/gb-2010-11-12-220
Differential Expression
• Goal of the DE analysis is to identify the genes
for which abundance across different
experimental conditions has changed
significantly
• Biological replicates (to account for biological
variation)
• Ranked list of genes with associated p-values
and fold changes
• DE tools: edgeR, DESeq
Alignment Independent Quantification
• Sailfish
• Salmon
• Kallisto
Main Idea
• Quantify the abundance of known transcripts
• Read mapping is unnecessary
• Replace inexact pattern matching with exact sub-pattern counting
Sailfish
Nature Biotechnology 32, 462–464 (2014), Doi:10.1038/nbt.2862
Transcript: TACGTACTAGACCTAA…
Read: TGCGTACTAGCCCT
K-mers are Robust to Errors
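The robustness claim can be checked on the slide's own read and transcript with a short sketch. Two assumptions: only the visible prefix of the transcript is used (the rest is elided on the slide), and k = 4 is picked purely to suit these short strings; it is not a value taken from any tool.

```python
def kmers(seq, k):
    """Return all k-mers of seq in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


transcript_prefix = "TACGTACTAGACCTAA"  # visible part of the slide's transcript
read = "TGCGTACTAGCCCT"                 # differs from it at two positions
k = 4                                    # assumed value for illustration

transcript_kmers = set(kmers(transcript_prefix, k))
read_kmers = kmers(read, k)
shared = [km for km in read_kmers if km in transcript_kmers]

print(len(shared), "of", len(read_kmers), "read k-mers match exactly")
# 5 of 11 read k-mers match exactly
print(shared)  # ['CGTA', 'GTAC', 'TACT', 'ACTA', 'CTAG']
```

The read would fail an exact full-length match, but the k-mers that do not overlap either error are still found by exact lookup, which is what allows exact sub-pattern counting to replace inexact matching.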
Kallisto
arXiv:1505.02710v2 [q-bio.QM]

RNASeq - Analysis Pipeline for Differential Expression

Editor's notes

  1. Accurate maps of transcript start and end sites. Detects sequence rearrangements and abnormal transcript structures (common in tumours). It reflects the current state of the cell and can reveal pathological mechanisms.
  2. In the past, techniques such as microarrays were used to study gene expression. A microarray consists of an array of probes whose sequences represent particular regions of the genes to be monitored. But there were several limitations: high background levels due to cross-hybridization, and reliance on prior knowledge about the genome. On the other hand, the signal from RNA-Seq data is digital in nature because you get counts. It has base-pair-level resolution and a much higher dynamic range of expression levels. We can find novel transcripts and fusion products.
  3. Extraction of the RNA. Remove contaminant DNA. If the goal of the experiment is expression profiling, polyA selection is used to enrich mRNA in eukaryotes, but it will miss non-coding RNAs and RNAs that lack polyA tails. The other library preparation option is to deplete rRNA. Library preparation can introduce biases such as amplification of GC-rich regions and generation of duplicate sequences.
  4. Pattern searching and data compression are old computational problems. Exact matching is very quick, but inexact matching (the Smith-Waterman algorithm), which takes SNPs and indels into account, is very slow.
  5. First build an index and find the most probable sites where reads can match. Then do local alignment at these (narrowed-down) putative sites.
  6. Reads are coming from the mRNA and we are trying to match them to the genome.
  7. Splicing is a post-transcriptional modification in which non-coding regions are removed. Many transcripts will share exons.
  8. Transcriptomes are incomplete even for well-studied species.
  9. In the first step of the alignment you can start by aligning reads either to the reference genome or to the transcriptome. Aligning to the transcriptome is a new feature in TopHat2. It improves the overall accuracy and sensitivity of the mapping. It also speeds up the analysis because of the smaller size of the transcriptome. Some of the reads will not be mapped because they come from unknown transcripts not present in the annotation, and there will also be poorly aligned reads. So the next step is to take these unmapped reads and find novel splice sites. TopHat2 does this by splitting the unmapped reads into non-overlapping segments, 25 bp long by default, and aligning these segments against the genome. The maximum intron size is 100 kb by default, and that is the window in which we look for matches of the left and right segments. When that pattern is detected, TopHat2 tries to find the most likely location of the splice sites. After detecting the splice junction, TopHat2 puts the segments together based on known junction signals (GT-AG, GC-AG and AT-AC).
  10. Overview of RNA-seq analysis. Reads produced by an RNA-seq experiment are aligned to the genome, then clustered into a graph structure that is traversed to recover all possible isoforms at one locus. Lastly, a subset of transcripts is selected and their abundance quantified from the input reads.
  11. Number of reads aligned gives a measure of the level of expression
  12. Cell type specific exon
  13. Let A and B be two RNA-seq experiments under the same conditions, by which I mean there are no differentially expressed genes. If experiment A generates twice as many reads as B, it is likely that the counts from experiment A will be doubled. Length bias: the expected number of reads mapped to a gene is proportional to both the abundance and the length of the isoforms transcribed from that gene. Adjust for the sequencing depth (the “million” part). Adjust for the gene length (the “kilobase” part). Sequencing depth of a sample: the second experiment generates twice as many reads.
  14. A read with errors still has many ‘good’ k-mers. Only k-mers overlapping errors will be discarded or miscounted.