AGRF in conjunction with EMBL Australia recently organised a workshop at Monash University Clayton. This workshop was targeted at beginners and biologists who are new to analysing Next-Gen Sequencing data. The workshop also aimed to provide users with a snapshot of bioinformatics and data analysis tips on how to begin to analyse project data. An introduction to RNA-seq data analysis was presented by AGRF Senior Bioinformatician Dr. Sonika Tyagi.
Presented: 1st August 2012
3. RNA sequencing (mRNA-seq or RNA-seq)
“An experimental protocol that uses next-generation sequencing technologies to sequence the RNA molecules within a biological sample in an effort to determine the primary sequence and relative abundance of each RNA”
4. A typical RNA-seq experiment
Library preparation and sequencing
Bioinformatics analysis
Nature Reviews Genetics, November 2008; doi:10.1038/nrg2484
5. RNA-seq applications
• Allele-specific expression: prevalence of transcribed SNPs
• Fusion transcripts: e.g., in cancer
• Abundance estimation: alternative splicing, RNA editing, novel transcripts
• Gene expression profiling
6. Raw sequences (fastq files)
Quality control (QC)
Spliced read alignment
Transcript reconstruction
Differential expression analysis
Biology
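As an illustration of the first two steps (raw fastq files, then quality control), here is a minimal sketch in Python. It assumes four-line fastq records and Phred+33 quality encoding; the two reads and the mean-quality cutoff of 20 are made up for the example and do not come from the workshop.

```python
import io

def read_fastq(handle):
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record fastq stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'

def mean_phred(quality, offset=33):
    """Mean Phred score of one read; offset=33 is an assumption (Phred+33)."""
    return sum(ord(c) - offset for c in quality) / len(quality)

# Made-up example: read1 has high qualities ('I' = Q40), read2 very low ('!' = Q0).
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
records = list(read_fastq(example))

# Keep reads whose mean quality reaches an illustrative cutoff of 20.
passed = [rid for rid, seq, qual in records if mean_phred(qual) >= 20]
print(passed)
```

In practice a dedicated QC tool would be used on real data; the sketch only shows what a per-read quality summary involves.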
7. RNA-seq workflows
(Flowchart; box labels as recovered from the slide)
• Reference available?
• Annotated genome; assembled/predicted transcriptome; annotated de novo transcriptome assembly
• Reads mapping
• Transcript reconstruction (de novo assembly or reference assisted)
• Summarization (by CDS, exon, gene, splice junctions)
• Tables of counts (digital expression)
• DE analysis
• Biology (GO/Pathways)
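The summarization step, which turns mapped reads into a table of counts, can be sketched as follows. This is a simplified illustration: the gene models and read coordinates are hypothetical, and a read is counted for a gene whenever the two intervals overlap on the same chromosome. Real summarization tools handle strands, exons and ambiguous reads more carefully.

```python
# Hypothetical gene models: name -> (chromosome, start, end), 1-based inclusive.
genes = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}

# Hypothetical mapped reads: (chromosome, start, end).
reads = [
    ("chr1", 120, 170),
    ("chr1", 480, 530),
    ("chr1", 900, 950),
    ("chr2", 100, 150),
]

def count_reads(genes, reads):
    """Return a table of counts: number of reads overlapping each gene."""
    counts = {name: 0 for name in genes}
    for chrom, start, end in reads:
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two intervals overlap if each starts before the other ends.
            if chrom == g_chrom and start <= g_end and end >= g_start:
                counts[name] += 1
    return counts

counts = count_reads(genes, reads)
print(counts)
```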
8. Raw sequences (fastq files)
Quality control (QC)
Spliced read alignment
Transcript reconstruction
Differential expression analysis
Biology
18. Quantification and normalisation
1. Digital expression or raw count: number of reads mapping to a region (exon/transcript/novel region)
2. Normalised counts*: number of reads per kb of region per million mapped reads
3. Splice junction detection
4. Compare to existing gene models
Nat Meth 2008 ; DOI:10.1038/NMETH.1226
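The normalisation in point 2 can be written out as a small calculation: divide the raw count by the region length in kilobases and by the number of mapped reads in millions. The function name and the numbers below are illustrative assumptions, not values from the workshop.

```python
def reads_per_kb_per_million(raw_count, region_length_bp, total_mapped_reads):
    """Normalise a raw count by region length (kb) and library size (millions of reads)."""
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return raw_count * 1e9 / (region_length_bp * total_mapped_reads)

# Made-up example: 500 reads on a 2,000 bp region, 10 million mapped reads in total.
value = reads_per_kb_per_million(500, 2000, 10_000_000)
print(value)  # 25.0
```

Normalising by length matters because, at equal abundance, a longer transcript collects more reads than a shorter one.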
19. Differential expression
• Normalised gene expression value as RPKM:
– reads per kilobase of exon model per million mapped reads
• Or FPKM:
– fragments per kilobase of exon model per million mapped reads
• Compare RPKM/FPKM across conditions or tissues
Nat Meth DOI:10.1038/NMETH.1226
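A very simple way to compare RPKM/FPKM values across two conditions is a log2 fold change. The sketch below is only a descriptive comparison with made-up values; the pseudocount of 1 is an assumption added to avoid division by zero, and this is not a statistical test of differential expression.

```python
import math

def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 ratio of condition B over condition A, with an assumed pseudocount."""
    return math.log2((value_b + pseudocount) / (value_a + pseudocount))

# Made-up RPKM values: gene -> (condition A, condition B).
rpkm = {
    "geneA": (3.0, 15.0),
    "geneB": (7.0, 7.0),
    "geneC": (31.0, 7.0),
}

changes = {gene: log2_fold_change(a, b) for gene, (a, b) in rpkm.items()}
print(changes)  # {'geneA': 2.0, 'geneB': 0.0, 'geneC': -2.0}
```

Positive values indicate higher expression in condition B, negative values lower expression. Deciding whether a change is significant needs replicates and a dedicated differential expression method.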
20. Raw sequences (fastq files)
Quality control (QC)
Spliced read alignment
Transcript reconstruction
Differential expression analysis
Biology
21. Systems biology: beyond the list of DE genes
• Ontologies: GO enrichment, goseq (R package)
• DAVID (http://david.abcc.ncifcrf.gov)
• Pathway analysis
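To give a feel for what GO enrichment asks, here is a minimal sketch of a one-sided hypergeometric test: given how many genes carry a GO term overall, is the term over-represented among the DE genes? This is a plain hypergeometric test with made-up numbers, not the method implemented in goseq, which additionally accounts for transcript length bias. It needs Python 3.8 or later for `math.comb`.

```python
from math import comb

def enrichment_p_value(n_genes, n_in_term, n_de, n_de_in_term):
    """P(at least n_de_in_term DE genes carry the term) under random sampling."""
    total = comb(n_genes, n_de)  # all ways to choose the DE gene set
    upper = min(n_in_term, n_de)
    tail = sum(
        comb(n_in_term, k) * comb(n_genes - n_in_term, n_de - k)
        for k in range(n_de_in_term, upper + 1)
    )
    return tail / total

# Made-up example: 20 genes, 5 annotated with the term, 4 DE genes, 3 of them annotated.
p = enrichment_p_value(n_genes=20, n_in_term=5, n_de=4, n_de_in_term=3)
print(round(p, 4))  # 0.032
```

When many terms are tested, the resulting p-values would also need a multiple-testing correction.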
22. RNA-seq experiment design challenges
• NGS biases:
– Library prep (GC content, 5’ or 3’ depletion, random hexamer primers, RNA species, bias towards 3’ end …).
– Transcript length
• Sequencing depth
• Single or paired end
• Biological or technical replicates
• Validation
Briefings in Bioinformatics, Vol 12, No 3, 280-287
23. RNA-seq and other transcriptomics methods
Nature Reviews Genetics, November 2008; doi:10.1038/nrg2484
24. Summary
• RNA-seq: more versatile and comprehensive, with superior reproducibility and resolution.
• Not dependent on prior sequence information: suitable for non-model organisms.
• Potentially provides information for all RNA species in the cell and allows discovery of novel ones.
• Still an actively developing field, with research areas that need refinement.
• Gold standards for experimental design and validation are yet to be set.
25. TopHat Cufflinks pipeline reference
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3), 562-78.