Defining the goal of RNA-seq
analysis for differential
expression
Joachim Jacob
20 and 27 January 2014

This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to
http://www.bits.vib.be/ if you use this presentation or parts hereof.
Great power comes with great responsibility
RNA-seq enables one to
1) get an idea which are all active genes
2) quantify expression of each transcript
3) quantify alternative splicing
… (use your imagination)
Principles of transcriptome analysis and gene expression quantification: an RNA-seq
tutorial. http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12109/abstract
You can't do it all.
RNA-seq is powerful, but we have
to aim for a certain goal.
Our goal is to detect
differential expression
on the gene level.
Differential expression: useful?
What are we looking for?
Explanations of observed phenotypes
GDA
yeast

Yeast mutant

GDA + vit C

why?
The central dogma
What causes the phenotypic differences (GDA yeast, yeast mutant, GDA + vit C)? Step by step:
1. Difference in protein activity
2. Presence/concentration of proteins in a cell
3. Level of protein production
4. Level of templates for protein production
5. Level of mRNA copies
Does it hold?
Level of mRNA copies
Level of templates for protein production
Level of protein production
Presence/concentration of proteins in a cell
Difference in protein activity
Phenotype
Problem reduction
We can measure mRNA levels (much easier
than protein levels).
So we measure mRNA.
The level of mRNA is a proxy of the level of
protein activity causing the aberrant
phenotype.
How to measure mRNA
1. Q-PCR (real-time)
A lot of work to measure few
genes, in a relatively wide array
of tissues. Very accurate.

2. Microarray
Easier way to measure many
predefined genes in a relatively
wide array of tissues. Robust.

3. RNA-seq
RNA-seq protocol in a nutshell
● Get your sample (here: a yeast sample)
● Lyse the cells and extract RNA
● Convert the RNA to cDNA
● The cDNA pool gets sequenced
The result is sequence information from
scratch. No prior information is needed.
Comprehensive comparative analysis of strand-specific RNA sequencing methods
http://www.nature.com/nmeth/journal/v7/n9/full/nmeth.1491.html
Comparative analysis of RNA sequencing methods for degraded or low-input samples
http://www.nature.com/nmeth/journal/v10/n7/full/nmeth.2483.html
The predecessors of RNA-seq
● ESTs: expressed sequence tags, ideal for discovery of new genes.
● SAGE: serial analysis of gene expression, measurement of number of copies of mRNA.
Low throughput: long sequence information, but for only ~thousands of genes.
Figures: http://www.montana.edu/observatory/people/mcdermottlab.html and
http://www.sagenet.org/findings/index.html
Concept of measuring with RNA-seq
Each mRNA copy is one template of protein production (GeneA, GeneB, GeneC).
1. Extract mRNA and turn it into cDNA.
2. Fragment, ligate adaptor, amplify.
3. Put a fraction of the pool on the sequencer to read fragments.
Figure: All things must pass: contrasts and commonalities in eukaryotic and bacterial mRNA decay, Nature Reviews Molecular Cell Biology 11, 467–478
So many steps can break our assumption
Each level is assumed to reflect the level above it:
Phenotype
  ↑ proteins define the phenotype
Proteins
  ↑ mRNA levels are a proxy for protein activity
mRNA levels
  ↑ the cDNA pool represents the RNA pool we've extracted
cDNA pool
  ↑ the reads represent the cDNA pool we've created
RNA-seq reads

What can go wrong at each step:
Proteins to phenotype: protein activity is regulated (phosphorylation, ubiquitination, ...).
mRNA levels to proteins: mRNA templates have different speeds of protein production (availability of tRNAs, rate of mRNA degradation, alternative splicing events, ...).
mRNA levels to cDNA pool: loss on RNA extraction, 90% of RNA in the cell is rRNA, ligation of adapters and conversion to cDNA are not 100% efficient.
cDNA pool to RNA-seq reads: failure to map reads to the correct gene, lane-specific biases on reading cDNA fragments, ...
Consequence: focus on comparison
Figure: two parallel chains, one for Phenotype A and one for Phenotype B, each running Proteins, mRNA levels, cDNA pool, RNA-seq reads.
The phenotypes differ, possibly due to differences in expression.
DESIGN OF EXPERIMENT
Comparing number of reads to genes
Figure: a sample goes through RNA-seq, giving reads for GeneA, GeneB and GeneC.
Obviously, the number of reads is dependent on:
1. the expression level of the gene (OUR QUESTION)
2. the total number of reads generated
3. the length of the transcript
Normalisation is needed!
Experimental design
Our focus: which genes are differentially expressed
between different conditions?
Obviously, the number of reads is dependent on:
1. the expression level of the gene
2. the total number of reads generated
3. the length of the transcript
How many reads to sequence?
Which normalisation is needed?
Experimental design
Our focus: which genes are differentially expressed
between different conditions?
“How can we detect genes for which the counts of
reads change between conditions more
systematically than as expected by chance”
We must design an experiment in which we can test
this deviance from chance.

Oshlack et al. 2010. From RNA-seq reads to differential expression results. Genome Biology 2010,
11:220 http://genomebiology.com/2010/11/12/220
How many reads to sequence?
In other words: how deep to sequence? What is the
required 'depth of sequencing'?
Two samples are sequenced with RNA-seq. The final test will look at ratios:

         sample   sample   ratio
GeneA    6        5        1.2
GeneB    5        6        0.83
GeneC    3        4        0.75
How many reads to sequence?
The difference between the lowest gene count and
the highest gene count is typically a factor of 10^5 (five orders
of magnitude). This is called the dynamic range.

Linear scale is useless.

The logarithmic scale is better.
Wait! Something's not correct here!
Zero remains zero!
We are working with counts. A count is >=1. A gene
with zero counts can be not yet sequenced (not
deep enough) or is not expressed in that condition.

0
It is not a full logarithmic scale.
It starts at zero.
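A common way to get a log-like scale on which zero remains zero is to add 1 to each count before taking the logarithm. Whether the plot on this slide used exactly this transform is an assumption; the sketch below only illustrates the idea, and the function name is made up for the example.

```python
import math

def log2_plus_one(count: int) -> float:
    """Log2 transform with a pseudocount of 1, so a count of 0 maps to 0."""
    return math.log2(count + 1)

# Counts spanning five orders of magnitude, including zero.
for count in (0, 1, 10, 100, 1000, 10000, 100000):
    print(f"count={count:>6}  log2(count + 1)={log2_plus_one(count):.2f}")
```

On this scale, a gene with zero counts sits at zero instead of at minus infinity, while high counts are still compressed.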
So keep all counts above zero?
Assuming equal sequencing depth in the samples,
and these counts: do all these genes differ in
expression?

         sample   sample   RATIO
GeneA    5        10       2
GeneB    15       30       2
GeneC    40       80       2
GeneD    100      200      2
GeneE    1000     2000     2
GeneZ    1        2        2
So keep everything above zero?
Sequencing the result of the same steps again is
called a technical replicate.
Is there a trend in how these numbers change?

         sample   sample   RATIO
GeneA    11       10       0.91
GeneB    11       30       2.72
GeneC    60       80       1.33
GeneD    79       200      2.53
GeneE    1150     2000     1.74
GeneZ    5        1        0.20

2?
Technical replicates
We take the same cDNA pool and sequence it several
times: technical replicates.

         sample   sample   sample   sample
GeneA    11       5        4        4
GeneB    11       16       14       8
GeneC    60       45       32       38
GeneD    79       102      95       110
GeneE    1150     1023     987      1005
GeneZ    3        0        0        1
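To get a feeling for how much technical replicates scatter, the mean and the sample variance per gene can be computed from the table above. This is a minimal sketch using only the Python standard library; it assumes the four columns are equally deep sequencing runs, and with only four values per gene the variance estimates are themselves very noisy.

```python
from statistics import mean, variance

# Counts from the technical replicates table (gene -> counts in the four runs).
technical_replicates = {
    "GeneA": [11, 5, 4, 4],
    "GeneB": [11, 16, 14, 8],
    "GeneC": [60, 45, 32, 38],
    "GeneD": [79, 102, 95, 110],
    "GeneE": [1150, 1023, 987, 1005],
    "GeneZ": [3, 0, 0, 1],
}

def mean_and_variance(counts):
    """Return the mean and the sample variance (n - 1 denominator) of the counts."""
    return mean(counts), variance(counts)

for gene, counts in technical_replicates.items():
    m, v = mean_and_variance(counts)
    # Relative spread: standard deviation divided by the mean.
    cv = (v ** 0.5) / m
    print(f"{gene}: mean={m:.1f}  variance={v:.1f}  CV={cv:.2f}")
```

The relative spread (CV) is largest for the genes with the lowest counts.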
The Poisson distribution
The counts of technical replicates follow a Poisson
distribution (Marioni et al. 2008). The Poisson distribution
can be applied to systems with a large number of possible
events, each of which is rare.
Figure (from Wikipedia): three Poisson distributions. They can be
read as 3 different genes, each with their own Poisson
distribution. Lambda is the mean of the gene's distribution.
X-axis: number of reads. Y-axis: chance to pick
that number of reads.
The Poisson distribution
So when we have 4 technical replicates sequenced to
a big depth (say 10 M reads), we can get, by
chance, these numbers for 3 different genes.

GeneA 0, 0, 1, 3
GeneB 2, 3, 4, 7
GeneC 8, 9, 11, 14
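The chance of seeing a particular count follows directly from the Poisson formula P(k) = e^(-λ) λ^k / k!. The sketch below evaluates it for three genes; the λ values are hypothetical and chosen only to resemble the three example genes above, they are not taken from the slides.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k reads when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical expected counts (lambda) per gene, for illustration only.
expected_counts = {"GeneA": 1.0, "GeneB": 4.0, "GeneC": 10.0}

for gene, lam in expected_counts.items():
    p_zero = poisson_pmf(0, lam)
    # Probability of landing within 50% of the expected count.
    low, high = math.ceil(0.5 * lam), math.floor(1.5 * lam)
    p_close = sum(poisson_pmf(k, lam) for k in range(low, high + 1))
    print(f"{gene}: lambda={lam}  P(0 reads)={p_zero:.3f}  P(within 50% of lambda)={p_close:.3f}")
```

A gene with a low expected count has a real chance of showing up with zero reads in one replicate and a few reads in the next.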
Working the intuition
How many blue balls?
How many red balls?
Draw 10
Draw 10 more
Draw 10 more
Estimate how large the fraction is in the set?
The intuition with the balls
Table to fill in: the number of Blue, Red and No color balls counted after 10, 20, 30 and 40 draws.
Conclusion of the experiment
The bigger the fraction in the pool, the quicker (i.e.
with less sequencing depth) we are certain about the
estimate of that fraction.
estimate = count; variance = count

For lower counts, the variance is
relatively bigger than the
variance for higher counts.
CV (coefficient of variation) =
sqrt(count)/count
Genes with lower expression
need much deeper sequencing
than genes with higher
expression levels.
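The formula CV = sqrt(count)/count can be tabulated to see how quickly the relative uncertainty shrinks with the count. A minimal sketch (the function name is chosen for this example):

```python
import math

def poisson_cv(count: float) -> float:
    """Coefficient of variation under Poisson: sqrt(count) / count = 1 / sqrt(count)."""
    return math.sqrt(count) / count

for count in (1, 10, 100, 1000, 10000):
    print(f"count={count:>6}  CV={poisson_cv(count):.3f}")
```

Every 100-fold increase in count reduces the CV only 10-fold, which is why lowly expressed genes need so much more depth.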
Comparing counts
“Here we show the overlap of Poisson
distributions of single measurements at
different read counts. Because relative
Poisson uncertainty is high at low read
counts, a count of 1 versus 2 has very
little power to discriminate a true 2X fold
change, though at higher counts a 2X fold
change becomes significant.
In an actual experiment, the width of the
distribution would be greater due to
additional biological and technical
uncertainty, but the uncertainty to the
mean expression would narrow with
each additional replicate.”

Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression.
Bioinformatics (2013) doi: 10.1093/bioinformatics/btt015
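The point of the quote can be made with a rough calculation. Under Poisson, the difference between two independent counts has a variance equal to the sum of the counts, so the difference can be expressed in standard deviations. This normal approximation is an assumption of the sketch (it is crude at very low counts) and is not the method used by Scotty.

```python
import math

def poisson_difference_z(count_a: int, count_b: int) -> float:
    """Difference between two Poisson counts, in units of its standard deviation."""
    return (count_b - count_a) / math.sqrt(count_a + count_b)

# The same 2X fold change at increasing read counts.
for count_a in (1, 10, 100, 1000):
    count_b = 2 * count_a
    z = poisson_difference_z(count_a, count_b)
    print(f"{count_a} versus {count_b}: difference = {z:.2f} standard deviations")
```

The fold change is 2 in every row, but only at higher counts does the difference clearly stand out from sampling noise.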
Comparing technical replicates
Figure: mean against variance for technical replicates (axes: log2 of the counts). Shown are the relation between mean and variance according to Poisson and a lowess fit through the data.

Risso et al. “GC-Content Normalization for RNA-Seq Data”
BMC Bioinformatics 2011, 12:480
http://www.biomedcentral.com/1471-2105/12/480 - EDASeq package (R)
But Poisson does not seem to fit
Extending the samples to real biological samples, this
mean-variance relationship does not hold!
Something is going on!
Figure (plotted using the EDASeq package in R): mean against variance for biological replicates, with part of the data marked "Reasonable fit".
An extra source of variation
The counts are 'overdispersed' relative to the Poisson
distribution: between biological replicates, the variance is
bigger than Poisson predicts for higher counts.
For Poisson: CV = std dev / mean = √μ / μ => CV² = 1/μ
If an additional distribution is involved (also
dependent on π, the fraction of the gene in the cDNA
pool), we have a mixture of distributions:
CV² = 1/μ + φ
The term 1/μ is the Poisson part, which dominates at low counts. The term φ is the dispersion.
The generalization of Poisson with this extra parameter is
the Negative Binomial. This model fits better!
The negative binomial model
The NB model fits observed expression data of
RNA-seq better. It is a generalization of Poisson, and
2 parameters need to be estimated (μ and φ).
The count of gene g in sample j has:
Mean = μ_gj
Variance = μ_gj + φ_g μ_gj²
Biological CV² = φ_g => Biological CV = √φ_g
Methods differ in how they estimate this dispersion per gene.
It can only be measured with true biological replicates.
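The NB variance formula is easy to evaluate by hand. The sketch below computes it and also converts (μ, φ) into the (size, prob) parameters that many negative binomial samplers expect, using size = 1/φ and prob = size / (size + μ). The function names and the example values of μ and φ are hypothetical.

```python
def nb_variance(mu: float, phi: float) -> float:
    """Variance of a negative binomial count: mu + phi * mu^2."""
    return mu + phi * mu ** 2

def nb_size_prob(mu: float, phi: float):
    """Convert mean and dispersion to the (size, prob) parameterisation."""
    size = 1.0 / phi
    prob = size / (size + mu)
    return size, prob

# Hypothetical gene: mean count 100, dispersion 0.1 (biological CV of about 32%).
mu, phi = 100.0, 0.1
size, prob = nb_size_prob(mu, phi)
print(f"Poisson variance would be {mu:.0f}")
print(f"NB variance is {nb_variance(mu, phi):.0f}")
print(f"size={size:.1f}  prob={prob:.4f}")
```

With φ = 0 the formula reduces to the Poisson variance, which is why the NB model is called a generalization of Poisson.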
Variation summary, intuitively
Total CV² = Technical CV² + Biological CV²
For low counts, the Poisson (technical) variation or
the measurement error is dominant.
For higher counts, the Poisson variation gets smaller,
and another source of variation becomes dominant,
the dispersion or the biological variation. Biological
variation does not get smaller with higher counts.
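The decomposition Total CV² = Technical CV² + Biological CV² can be tabulated to see where each source dominates. In this sketch the technical part is taken as the Poisson term 1/μ and the biological part as the dispersion φ; the value φ = 0.04 (a biological CV of 20%) is a made-up example.

```python
def cv2_components(mu: float, phi: float):
    """Return (technical CV^2, biological CV^2, total CV^2) for mean count mu."""
    technical = 1.0 / mu
    biological = phi
    return technical, biological, technical + biological

PHI = 0.04  # hypothetical dispersion, i.e. a biological CV of 20%

for mu in (1, 5, 25, 100, 1000, 10000):
    technical, biological, total = cv2_components(mu, PHI)
    dominant = "technical" if technical > biological else "biological"
    print(f"mean={mu:>6}  technical={technical:.4f}  biological={biological:.4f}  "
          f"total CV={total ** 0.5:.3f}  dominant: {dominant}")
```

The two sources are equal at a mean count of 1/φ (here 25). Above that, sequencing deeper barely reduces the total CV.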
Beyond the NB model
It appears from analysis of many
biological replicates (#=69) that not
every gene can be modeled as NB:
the Poisson-Tweedie model
provides a further generalisation
and a better fit for many genes
(with an additional shape
parameter).
Left figure: raw data shows that about 26% of
the genes fit a NB model. Depending on the
estimated shape parameter, other
distributions fit better.

Esnaola et al. BMC Bioinformatics 2013, 14:254
http://www.biomedcentral.com/1471-2105/14/254
Consequence for our design
● For low counts: the uncertainty is big due to Poisson.
● For high counts: the uncertainty is big due to
biological variation (highly expressed genes differ
in their natural variation, regulated by cellular
processes, more than lowly expressed genes).
● If we focus on the ratios between the conditions:
is it reasonable to set a restriction on fold change?
Highly expressed genes can have a smaller fold change and be
significant. Lowly expressed genes can exceed a fold change of 2
and still not be significant.
Consequence on fold change
The fold change cut-off readily applied in microarray analysis
is of no use in RNA-seq.
Volcano plot: blue and red are known DE genes.
The cut-offs that are often applied can prevent
detecting DE genes.
Long story to say...
We need to estimate the model behind the count.
Never work without biological replicates.
Never work with 2 biological replicates.
Try avoiding working with 3 biological replicates.
Go for at least 4 biological replicates.
Break?
Overview
Figure: six samples (Sample 1 to Sample 6), spread over Condition X and Condition Y, each sequenced with RNA-seq to give read counts for GeneA, GeneB and GeneC.
Summary
Obviously, the number of reads is dependent on:
1. chance
→ Define the count model (NB) from replicates
2. the expression level of the gene
→ Compare the ratios with a test
3. the total number of reads generated
4. the length of the transcript
The total number of reads generated
Figure: the same sample sequenced with RNA-seq and with more RNA-seq, giving read counts for GeneA, GeneB and GeneC in both cases.
The number of reads is dependent on the total
number of reads generated. If one library is
sequenced to 20M reads, and another one to
40M, most genes will ~double their counts.
Normalization for library size
Naive approach: divide by total library size. This is not
applied anymore!
Why not? Composition matters!

2 things to remember:
- zero sum system (or “we cannot count what we can't sequence”)
- 5 orders of magnitude
In every sample, a lot of reads are spent on a few
extremely highly expressed genes. Which genes these are
differs between libraries, and including them
distorts the naive size normalization.
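The zero sum effect can be shown with a toy example. Below, two made-up libraries have the same total number of reads. In library B one extremely highly expressed gene drops, and the freed reads are spread over the other genes in unchanged proportions (1:2:2). Scaling by total library size (counts per million) then reports every other gene as changed. All gene names and counts are hypothetical.

```python
def counts_per_million(counts):
    """Naive library-size normalization: scale each count to a library of one million reads."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}

# Hypothetical libraries with the same total number of reads (1000 each).
library_a = {"HighGene": 500, "Gene1": 100, "Gene2": 200, "Gene3": 200}
library_b = {"HighGene": 100, "Gene1": 180, "Gene2": 360, "Gene3": 360}

cpm_a = counts_per_million(library_a)
cpm_b = counts_per_million(library_b)

for gene in library_a:
    ratio = cpm_b[gene] / cpm_a[gene]
    print(f"{gene}: A={cpm_a[gene]:.0f}  B={cpm_b[gene]:.0f}  ratio B/A={ratio:.2f}")
```

From the totals alone it cannot be told whether one gene went down or three genes went up, which is why the few genes with enormous counts should not drive the normalization.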
Normalization for library size
Schematically: when normalized on library size
(squares represent number of reads).
Figure: all counts for library A and all counts for library B, each split into a few genes with enormous counts and the rest of the genes. For the few genes with enormous counts there is NO SATURATION of these counts.
Normalization for library size
Better normalization would be as shown in the figure.
DESeq2 and edgeR apply such an approach (see later).
Figure: for each library, the rest of the genes is set to 100%.
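One way to normalize on "the rest of the genes" is a median-of-ratios size factor: compare each gene to its geometric mean across libraries and take the median ratio per library, so that a few extreme genes have no influence. The sketch below is a simplified illustration of that idea in plain Python, not the implementation of DESeq2 or edgeR (edgeR uses a different method, TMM). The libraries are made up for the example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping library name -> list of counts, same gene order in each.
    Genes with a zero count in any library are skipped.
    """
    libraries = list(counts)
    n_genes = len(counts[libraries[0]])
    # Log of the geometric mean per gene across libraries.
    log_geo_means = {}
    for g in range(n_genes):
        values = [counts[lib][g] for lib in libraries]
        if all(v > 0 for v in values):
            log_geo_means[g] = sum(math.log(v) for v in values) / len(values)
    factors = {}
    for lib in libraries:
        log_ratios = [math.log(counts[lib][g]) - lgm for g, lgm in log_geo_means.items()]
        factors[lib] = math.exp(median(log_ratios))
    return factors

# Hypothetical libraries: gene order is HighGene, Gene1, Gene2, Gene3.
counts = {"A": [500, 100, 200, 200], "B": [100, 180, 360, 360]}
factors = size_factors(counts)
for lib, values in counts.items():
    normalized = [round(v / factors[lib], 1) for v in values]
    print(f"library {lib}: size factor={factors[lib]:.3f}  normalized counts={normalized}")
```

After dividing by the size factors, the three ordinary genes have equal normalized counts in both libraries and only the extreme gene stands out.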
Gene length influences the count
“Longer transcripts generate more reads”
True! But the transcript length does not differ
between samples. Since we are concerned with
relative differences between samples, this needs
no normalization (this story changes in case of
absolute quantification).
Figure: Sample A and Sample B, each with Gene A and Gene B.
Between sample variation
Properties of libraries/samples can affect the
counts, and lead to variation. This is called
between-lane variation. Obvious ones: library
size (how many reads are sampled), library
composition.
Different libraries/samples can exhibit increased
variation by differing in how gene properties
relate to gene counts. This is called within-lane
variation.
GC-content of genes can influence counts
GC-content differs between genes. But it does
not change between samples, so there should
be no problem for relative expression
comparison.
We can visualize the
relationship between
counts and GC very
easily (see right). There is
some trend, and it is
equal for all samples.
EDAseq (R)
GC-content of genes can influence counts
Sometimes, samples show different relationships
between GC-content of the genes and the counts.
This within-lane (or intra-sample) variation
needs to be corrected for, so that the genes called
differentially expressed in one sample are not
simply the GC-rich ones.
Length can also have this effect.
What we need to know for our set-up
We want to detect differentially expressed genes
between 2 or more conditions.
For this, we need to apply the conditions in a
controlled environment (randomisation,...).
For good testing, we need to have some biological
replicates per condition.
For cost effectiveness, we determine how deep we
will sequence from each sample.
We analyse the reads, get raw counts and do the test!
Library preparation and lane loading

HiSeq2000: 24 single-index barcodes available. 1
lane gives 150-180 M reads. One lane of 50 bp SE
approx €1.500.
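These numbers allow a quick back-of-the-envelope calculation of depth and cost per sample when several barcoded samples share one lane. The sketch assumes that reads are spread evenly over the samples, which is an idealization, and uses the lane yield and price quoted above.

```python
LANE_READS_LOW = 150e6    # lower bound of reads per lane
LANE_READS_HIGH = 180e6   # upper bound of reads per lane
LANE_COST_EUR = 1500      # approximate price of one 50 bp SE lane
MAX_BARCODES = 24         # single-index barcodes available

def reads_per_sample(lane_reads: float, n_samples: int) -> float:
    """Reads per sample if the lane is shared evenly by n_samples."""
    return lane_reads / n_samples

for n_samples in (6, 12, MAX_BARCODES):
    low = reads_per_sample(LANE_READS_LOW, n_samples) / 1e6
    high = reads_per_sample(LANE_READS_HIGH, n_samples) / 1e6
    cost = LANE_COST_EUR / n_samples
    print(f"{n_samples} samples per lane: {low:.1f} to {high:.1f} M reads each, about {cost:.0f} euro per sample")
```

This makes the trade-off between more samples per lane and depth per sample explicit before the replicates are planned.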
Bioinformatics analysis will take most of your time
● Quality control (QC) of raw reads
● Preprocessing: filtering of reads and read parts, to help our goal of differential detection
● QC of preprocessing
● Mapping to a reference genome (alternative: to a transcriptome)
● QC of the mapping
● Count table extraction
● QC of the count table
● DE test
● Biological insight
Overview

http://www.nature.com/nprot/journal/v8/n9/full/nprot.2013.099.html
The numbers get reduced with every step
25M

20M

15M
Deeper, or more replicates?
Variance will be lower with more reads: but
sequencing another biological replicate is
preferred over sequencing deeper, or technical reps.

Doi: 10.1093/bioinformatics/btt015
There is a tool to help you set up
Scotty – power analysis
Power: the probability to reject the null hypothesis if the alternative is
true.
'How many samples and how deep in order to minimize false
negatives'.
(A null hypothesis is always a scenario in which there is no difference,
hence no differential expression.)
Alternative tools and help with design:
http://wiki.bits.vib.be/index.php/RNAseq_toolbox
How many samples to sequence?
→ Scotty exercise
Keywords
A read count of a gene is dependent on:
1. chance
2. expression level
3. transcript length
4. depth of sequencing
5. GC-content
Poisson distribution
Negative binomial distribution
Condition
Sample
Normalization
Reads
Write in your own words what the terms mean.

All my references available at:
https://www.zotero.org/groups/dernaseq/items

Más contenido relacionado

La actualidad más candente

Third Generation Sequencing
Third Generation Sequencing Third Generation Sequencing
Third Generation Sequencing priyanka raviraj
 
Comparitive genome mapping and model systems
Comparitive genome mapping and model systemsComparitive genome mapping and model systems
Comparitive genome mapping and model systemsHimanshi Chauhan
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGAayushi Pal
 
Whole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaWhole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaBhavya Sree
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingPALANIANANTH.S
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAGRF_Ltd
 
Roche Pyrosequencing 454 ; Next generation DNA Sequencing
Roche Pyrosequencing 454 ; Next generation DNA SequencingRoche Pyrosequencing 454 ; Next generation DNA Sequencing
Roche Pyrosequencing 454 ; Next generation DNA SequencingAbhay jha
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisDespoina Kalfakakou
 
Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.Robert Bruce
 
New generation sequencing equipments
New generation sequencing equipmentsNew generation sequencing equipments
New generation sequencing equipmentsKalaivani P
 
Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)Mrinal Vashisth
 
PacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUE
PacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUEPacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUE
PacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUEMuunda Mudenda
 
Whole Genome Sequencing Analysis
Whole Genome Sequencing AnalysisWhole Genome Sequencing Analysis
Whole Genome Sequencing AnalysisEfi Athieniti
 
Gene regulatory networks
Gene regulatory networksGene regulatory networks
Gene regulatory networksMadiheh
 

La actualidad más candente (20)

Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
 
Third Generation Sequencing
Third Generation Sequencing Third Generation Sequencing
Third Generation Sequencing
 
Comparitive genome mapping and model systems
Comparitive genome mapping and model systemsComparitive genome mapping and model systems
Comparitive genome mapping and model systems
 
NEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCINGNEXT GENERATION SEQUENCING
NEXT GENERATION SEQUENCING
 
Whole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaWhole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thaliana
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Illumina sequencing introduction
Illumina sequencing introductionIllumina sequencing introduction
Illumina sequencing introduction
 
SNP Genotyping Technologies
SNP Genotyping TechnologiesSNP Genotyping Technologies
SNP Genotyping Technologies
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
Sanger sequencing
Sanger sequencingSanger sequencing
Sanger sequencing
 
An introduction to RNA-seq data analysis
An introduction to RNA-seq data analysisAn introduction to RNA-seq data analysis
An introduction to RNA-seq data analysis
 
Roche Pyrosequencing 454 ; Next generation DNA Sequencing
Roche Pyrosequencing 454 ; Next generation DNA SequencingRoche Pyrosequencing 454 ; Next generation DNA Sequencing
Roche Pyrosequencing 454 ; Next generation DNA Sequencing
 
Bioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysisBioinformatics tools for NGS data analysis
Bioinformatics tools for NGS data analysis
 
Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.Pyrosequencing slide presentation rev3.
Pyrosequencing slide presentation rev3.
 
New generation sequencing equipments
New generation sequencing equipmentsNew generation sequencing equipments
New generation sequencing equipments
 
Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)
 
NGS - QC & Dataformat
NGS - QC & Dataformat NGS - QC & Dataformat
NGS - QC & Dataformat
 
PacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUE
PacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUEPacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUE
PacBio SMRT - THIRD GENERATION SEQUENCING TECHNIQUE
 
Whole Genome Sequencing Analysis
Whole Genome Sequencing AnalysisWhole Genome Sequencing Analysis
Whole Genome Sequencing Analysis
 
Gene regulatory networks
Gene regulatory networksGene regulatory networks
Gene regulatory networks
 

Destacado

Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...VHIR Vall d’Hebron Institut de Recerca
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2BITS
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3BITS
 
Part 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalPart 1 of RNA-seq for DE analysis: Defining the goal
Part 1 of RNA-seq for DE analysis: Defining the goalJoachim Jacob
 
RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5BITS
 
Introduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsIntroduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsBITS
 
RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4BITS
 
The structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsThe structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsBITS
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingmikaelhuss
 
RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6BITS
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in Rmikaelhuss
 
BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS
 
BITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome levelBITS - Comparative genomics on the genome level
BITS - Comparative genomics on the genome levelBITS
 
Productivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsProductivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsBITS
 
Part 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QCPart 4 of RNA-seq for DE analysis: Extracting count table and QC
Part 4 of RNA-seq for DE analysis: Extracting count table and QCJoachim Jacob
 
Part 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expressionPart 5 of RNA-seq for DE analysis: Detecting differential expression
Part 5 of RNA-seq for DE analysis: Detecting differential expressionJoachim Jacob
 
Part 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataPart 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataJoachim Jacob
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 

Destacado (20)

Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Cou...
 
RNA-seq Analysis
RNA-seq AnalysisRNA-seq Analysis
RNA-seq Analysis
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
RNA-seq: Mapping and quality control - part 3




RNA-seq: general concept, goal and experimental design - part 1

  • 1. Defining the goal of RNA-seq analysis for differential expression Joachim Jacob 20 and 27 January 2014 This presentation is available under the Creative Commons Attribution-ShareAlike 3.0 Unported License. Please refer to http://www.bits.vib.be/ if you use this presentation or parts hereof.
  • 2. Great power comes with great responsibility RNA-seq enables one to 1) get an idea of which genes are active 2) quantify expression of each transcript 3) quantify alternative splicing … (use your imagination) Principles of transcriptome analysis and gene expression quantification: an RNA-seq tutorial. http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12109/abstract
  • 3. Great power comes with great responsibility You can't do it all. RNA-seq is powerful, but we have to aim for a specific goal. Our goal is to detect differential expression at the gene level.
  • 4. Differential expression: useful? What are we looking for? Explanations of observed phenotypes GDA yeast Yeast mutant GDA + vit C why?
  • 5. The central dogma causes the phenotypic differences GDA yeast Yeast mutant GDA + vit C ?
  • 6. The central dogma Difference in protein activity causes the phenotypic differences GDA yeast Yeast mutant GDA + vit C ?
  • 7. The central dogma Presence/concentration of proteins in a cell causes the phenotypic differences GDA yeast Yeast mutant GDA + vit C ?
  • 8. The central dogma Level of protein production causes the phenotypic differences GDA yeast Yeast mutant GDA + vit C ?
  • 9. The central dogma Level of templates for protein production causes the phenotypic differences GDA yeast Yeast mutant GDA + vit C ?
  • 10. The central dogma Level of mRNA copies causes the phenotypic differences GDA yeast Yeast mutant GDA + vit C ?
  • 11. Does it hold? Level of mRNA copies Level of templates for protein production Level of protein production Presence/concentration of proteins in a cell Difference in protein activity Phenotype
  • 12. Problem reduction We can measure mRNA levels (much easier than protein levels). So we measure mRNA. The level of mRNA is a proxy for the level of protein activity causing the aberrant phenotype.
  • 13. How to measure mRNA 1. Q-PCR (real-time) A lot of work to measure few genes, in a relatively wide array of tissues. Very accurate. 2. Microarray Easier way to measure many predefined genes in a relatively wide array of tissues. Robust. 3. RNA-seq
  • 14. RNA-seq protocol in a nutshell ● Get your sample ● Lyse the cells and extract RNA ● Convert the RNA to cDNA ● The cDNA pool gets sequenced Yeast sample The result is sequence information from scratch. No prior information is needed. Comprehensive comparative analysis of strand-specific RNA sequencing methods http://www.nature.com/nmeth/journal/v7/n9/full/nmeth.1491.html Comparative analysis of RNA sequencing methods for degraded or low-input samples http://www.nature.com/nmeth/journal/v10/n7/full/nmeth.2483.html
  • 15. The predecessors of RNA-seq ● ESTs: expressed sequence tags, ideal for discovery of new genes. ● SAGE: serial analysis of gene expression, measurement of the number of copies of mRNA http://www.montana.edu/observatory/people/mcdermottlab.html
  • 16. The predecessors of RNA-seq ● ESTs: expressed sequence tags, ideal for discovery of new genes. ● SAGE: serial analysis of gene expression, measurement of the number of copies of mRNA http://www.sagenet.org/findings/index.html
  • 17. The predecessors of RNA-seq ● ESTs: expressed sequence tags ● SAGE: serial analysis of gene expression Low throughput: long sequence information, but for only thousands of genes.
  • 18. Concept of measuring with RNA-seq One template of protein production GeneA GeneB GeneC Extract mRNA and turn into cDNA Fragment, ligate adaptor, amplify. Put a fraction of the pool on sequencer to read fragments. Figure: All things must pass: contrasts and commonalities in eukaryotic and bacterial mRNA decay, Nature Reviews Molecular Cell Biology 11, 467–478
  • 19. RNA-seq protocol in a nutshell Yeast sample
  • 20. So many steps at which our assumption may fail. The assumed chain: proteins define the phenotype; mRNA levels are a proxy for protein activity; the cDNA pool represents the RNA pool we've extracted; the RNA-seq reads represent the cDNA pool we've created.
  • 21. So many steps at which our assumption may fail. Proteins to phenotype: protein activity is regulated (phosphorylation, ubiquitination, ...). mRNA levels to proteins: mRNA templates have different speeds of protein production (availability of tRNAs, rate of mRNA degradation, alternative splicing events, ...). mRNA levels to cDNA pool: loss on RNA extraction, 90% of the RNA in a cell is rRNA, ligation of adapters, conversion to cDNA is not 100% efficient. cDNA pool to RNA-seq reads: failure to map reads to the correct gene, lane-specific biases on reading cDNA fragments, ...
  • 22. Consequence: focus on comparison Phenotype A Proteins Phenotype B Possibly due to differences in expression Proteins mRNA levels mRNA levels cDNA pool cDNA pool RNA-seq reads RNA-seq reads
  • 23. Consequence: focus on comparison Phenotype A Phenotype B Proteins Proteins mRNA levels mRNA levels cDNA pool cDNA pool RNA-seq reads RNA-seq reads DESIGN OF EXPERIMENT
  • 24. Comparing number of reads to genes sample RNA-seq GeneA GeneB GeneC Obviously, the number of reads is dependent on: 1. the expression level of the gene (our question) 2. the total number of reads generated 3. the length of the transcript Normalisation is needed!
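The three dependencies listed on this slide can be made concrete with a small sketch. It scales a raw count by sequencing depth and by transcript length, in the style of reads per kilobase per million (RPKM). RPKM is not named in the slides and is used here only as one possible normalisation; the gene lengths, counts and total read number are invented for illustration.

```python
def rpkm(count, transcript_length_bp, total_reads):
    """Reads per kilobase of transcript per million sequenced reads."""
    if transcript_length_bp <= 0 or total_reads <= 0:
        raise ValueError("length and total reads must be positive")
    return count * 1e9 / (transcript_length_bp * total_reads)


# Hypothetical genes: (raw read count, transcript length in bp)
genes = {"GeneA": (600, 3000), "GeneB": (200, 1000), "GeneC": (400, 1000)}
total_reads = 10_000_000  # hypothetical sequencing depth

normalised = {
    name: rpkm(count, length, total_reads)
    for name, (count, length) in genes.items()
}

for name, value in normalised.items():
    print(f"{name}: raw={genes[name][0]:>4}  normalised={value:.1f}")
```

GeneA has the highest raw count, yet after correcting for its longer transcript it ends up at the same normalised level as GeneB, while GeneC is twice as high.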
  • 25. Experimental design Our focus: which genes are differentially expressed between different conditions? Obviously, the number of reads is dependent on: 1. the expression level of the gene 2. the total number of reads generated 3. the length of the transcript How many reads to sequence? Which normalisation is needed?
  • 26. Experimental design Our focus: which genes are differentially expressed between different conditions? “How can we detect genes for which the counts of reads change between conditions more systematically than as expected by chance” We must design an experiment in which we can test this deviance from chance. Oshlack et al. 2010. From RNA-seq reads to differential expression results. Genome Biology 2010, 11:220 http://genomebiology.com/2010/11/12/220
  • 27. How many reads to sequence? In other words: how deep to sequence? What is the required 'depth of sequencing'? Sample 1 (RNA-seq): GeneA 6, GeneB 5, GeneC 3. Sample 2 (RNA-seq): GeneA 5, GeneB 6, GeneC 4. The final test will look at ratios: GeneA 1.2, GeneB 0.83, GeneC 0.75.
  • 28. How many reads to sequence? The difference between the lowest gene count and the highest gene count is typically 10^5. This is called the dynamic range. A linear scale is useless. The logarithmic scale is better. Wait! Something's not correct here!
  • 29. Zero remains zero! We are working with counts. An observed count is >=1. A gene with zero counts has either not yet been sequenced (not deep enough) or is not expressed in that condition. So it is not a full logarithmic scale: it starts at zero.
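A minimal sketch of the problem with zeros on a log scale. The logarithm of zero is undefined, so the sketch adds a pseudocount of 1 before taking log2; this workaround is an assumption on our side and is not prescribed by the slides. The counts are invented and span the 10^5 dynamic range mentioned earlier.

```python
import math


def log2_count(count, pseudocount=1):
    """log2 of a read count, shifted by a pseudocount so that zero stays finite."""
    if count < 0:
        raise ValueError("counts cannot be negative")
    return math.log2(count + pseudocount)


counts = [0, 1, 10, 1000, 100000]  # hypothetical counts
logged = [log2_count(c) for c in counts]

for c, value in zip(counts, logged):
    print(f"count={c:>6}  log2(count + 1)={value:.2f}")
```

With this shift a count of zero maps to zero, so the scale "starts at zero" as the slide describes.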
  • 30. So keep all counts above zero? Assume equal sequencing depth in the samples and these counts: do all these genes differ in expression? (sample 1, sample 2, ratio) GeneA: 5, 10, 2. GeneB: 15, 30, 2. GeneC: 40, 80, 2. GeneD: 100, 200, 2. GeneE: 1000, 2000, 2. GeneZ: 1, 2, 2.
  • 31. So keep everything above zero? Sequencing the result of the same steps again is called a technical replicate. Is there a trend in how these numbers change? (sample 1, sample 2, ratio) GeneA: 11, 10, 0.91. GeneB: 11, 30, 2.73. GeneC: 60, 80, 1.33. GeneD: 79, 200, 2.53. GeneE: 1150, 2000, 1.74. GeneZ: 5, 1, 0.20. Is the ratio still 2?
  • 32. Technical replicates We take the same cDNA pool and sequence it several times: technical replicates. (counts in 4 technical replicates) GeneA: 11, 5, 4, 4. GeneB: 11, 16, 14, 8. GeneC: 60, 45, 32, 38. GeneD: 79, 102, 95, 110. GeneE: 1150, 1023, 987, 1005. GeneZ: 3, 0, 0, 1.
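The ratios on this slide can be recomputed with a short sketch. The counts are the ones given on the slide; the names `sample_1` and `sample_2` are only labels for its two columns.

```python
# Counts from the slide: first column re-sequenced, second column as before.
sample_1 = {"GeneA": 11, "GeneB": 11, "GeneC": 60,
            "GeneD": 79, "GeneE": 1150, "GeneZ": 5}
sample_2 = {"GeneA": 10, "GeneB": 30, "GeneC": 80,
            "GeneD": 200, "GeneE": 2000, "GeneZ": 1}

# Ratio of the second sample over the first, per gene.
ratios = {gene: sample_2[gene] / sample_1[gene] for gene in sample_1}

for gene, ratio in ratios.items():
    print(f"{gene}: {sample_1[gene]:>5} vs {sample_2[gene]:>5}  ratio={ratio:.2f}")
```

The ratios scatter widely around 2, and the scatter is largest for the genes with the smallest counts.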
  • 33. The Poisson distribution The counts of technical replicates follow a Poisson distribution (Marioni et al 2008). The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. From Wikipedia. The curves can be 3 different genes, each with its own Poisson distribution. Lambda is the mean of the gene's distribution, i.e. the expected number of reads. Y-axis: the chance of picking that number of reads.
  • 34. The Poisson distribution So when we have 4 technical replicates sequenced to a large depth (say 10 M reads), we can get, by chance, these numbers for 3 different genes. GeneA 0, 0, 1, 3 GeneB 2, 3, 4, 7 GeneC 8, 9, 11, 14
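The chance variation between technical replicates can be simulated with a minimal sketch. It draws Poisson counts with Knuth's multiplication method using only the standard library; the per-gene means (1, 4 and 10.5) are assumptions chosen to resemble the numbers on the slide, not values taken from it.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson-distributed count with mean lam (Knuth's method)."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical expected read counts per gene
expected = {"GeneA": 1.0, "GeneB": 4.0, "GeneC": 10.5}

# Four simulated technical replicates per gene
replicates = {
    gene: [poisson_draw(lam, rng) for _ in range(4)]
    for gene, lam in expected.items()
}

for gene, counts in replicates.items():
    print(gene, sorted(counts))
```

Knuth's method is adequate for small means like these; for large means a library sampler would be the more practical choice.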
  • 35. Working the intuition How many blue balls? How many red balls? Draw 10 Draw 10 more Draw 10 more Estimate how large the fraction is in the set?
  • 36. The intuition with the balls Color Blue Red No color 10 draws 20 draws 30 draws 40 draws
  • 37. Conclusion of the experiment The bigger the fraction in the pool, the quicker (i.e. with less sequencing depth) we are certain about the estimate of that fraction. estimate=count; variance=count For lower counts, the variance is relatively bigger than the variance for higher counts. CV (coefficient of variation) = sqrt(count)/count = 1/sqrt(count) Genes with lower expression need much deeper sequencing than genes with higher expression levels.
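A short sketch of the coefficient of variation from this slide, evaluated at a few counts. The counts themselves are arbitrary illustration values.

```python
import math


def poisson_cv(count):
    """Coefficient of variation under Poisson: sqrt(count) / count."""
    if count <= 0:
        raise ValueError("count must be positive")
    return math.sqrt(count) / count


example_counts = [1, 4, 100, 10000]  # arbitrary illustration values
cvs = {count: poisson_cv(count) for count in example_counts}

for count, cv in cvs.items():
    print(f"count={count:>6}  CV={cv:.2%}")
```

Going from a count of 1 to a count of 100 lowers the relative uncertainty from 100% to 10%, which is why lowly expressed genes need much deeper sequencing.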
  • 38. Comparing counts “Here we show the overlap of Poisson distributions of single measurements at different read counts. Because relative Poisson uncertainty is high at low read counts, a count of 1 versus 2 has very little power to discriminate a true 2X fold change, though at higher counts a 2X fold change becomes significant. In an actual experiment, the width of the distribution would be greater due to additional biological and technical uncertainty, but the uncertainty to the mean expression would narrow with each additional replicate.” Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression. Bioinformatics (2013) doi: 10.1093/bioinformatics/btt015
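One way to make the quoted statement concrete is to compute how much probability mass two Poisson distributions share. The sketch below is an illustration under that assumption (the "overlap" measure and the function names are not taken from the Scotty paper).

```python
import math

def poisson_pmf(k, lam):
    # Work in log space so that large counts do not overflow.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def overlap(lam_a, lam_b):
    """Shared probability mass of two Poisson distributions (1 = identical, 0 = disjoint)."""
    upper = int(max(lam_a, lam_b) * 3 + 50)  # generous cut-off for the sum
    return sum(min(poisson_pmf(k, lam_a), poisson_pmf(k, lam_b)) for k in range(upper))

# The same 2X fold change at increasing read counts.
for base in (1, 10, 100):
    print(f"{base} vs {2 * base} reads: overlap = {overlap(base, 2 * base):.3f}")
```

The overlap shrinks as the counts grow, so the same 2X fold change is far easier to distinguish at 100 versus 200 reads than at 1 versus 2.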
  • 39. (Log2 of the counts) Comparing technical replicates Correlation between mean and variance according to Poisson Lowess fit through the data (Log2 of the counts) Risso et al. “GC-Content Normalization for RNA-Seq Data” BMC Bioinformatics 2011, 12:480 http://www.biomedcentral.com/1471-2105/12/480 - EDASeq package (R)
  • 40. But Poisson does not seem to fit When we extend the samples to real biological samples, this mean-variance relationship does not hold... Plotted using the EDASeq package in R.
  • 41. But Poisson does not seem to fit When we extend the samples to real biological samples, this mean-variance relationship does not hold! Something is going on! Figure annotation: reasonable fit. Plotted using the EDASeq package in R.
  • 42. An extra source of variation The counts are 'overdispersed' relative to the Poisson distribution: between biological replicates, the variance is bigger than Poisson predicts for higher counts. Something is going on! Plotted using the EDASeq package in R.
  • 43. An extra source of variation For Poisson: CV = std dev / mean => CV² = 1/μ. If an additional distribution is involved (also dependent on π, the fraction of the gene in the cDNA pool), we have a mixture of distributions: CV² = 1/μ + φ, where 1/μ dominates at low counts and φ is the dispersion. The generalization of Poisson with this extra parameter, the negative binomial model, fits better!
  • 44. The negative binomial model The NB model fits observed RNA-seq expression data better. It is a generalization of Poisson, and 2 parameters need to be estimated (μ and φ). The count of gene g in sample j has mean = μ_gj and variance = μ_gj + φ_g·μ_gj². Biological CV² = φ_g => biological CV = √φ_g. Methods differ in how they estimate this dispersion per gene; it can only be measured with true biological replicates.
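To illustrate the relationship between mean, variance and dispersion, here is a naive method-of-moments sketch. It is an assumption made for illustration only: tools such as DESeq2 and edgeR estimate dispersion differently, sharing information across genes. The replicate counts are hypothetical.

```python
from statistics import mean, variance

def nb_variance(mu, phi):
    """Variance of a negative binomial count: mu + phi * mu^2."""
    return mu + phi * mu ** 2

def naive_dispersion(counts):
    """Method-of-moments estimate of phi from replicate counts of one gene.

    Solves variance = mu + phi * mu^2 for phi; negative estimates are set to 0
    (no evidence for variation beyond Poisson). Assumes the mean is above zero.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max(0.0, (var - mu) / mu ** 2)

# Hypothetical counts of one gene in 4 biological replicates (illustrative only).
replicates = [80, 120, 95, 150]
phi = naive_dispersion(replicates)
print(f"mean={mean(replicates):.1f}  phi={phi:.3f}  biological CV={phi ** 0.5:.2f}")
```

With only a handful of replicates, such a per-gene estimate is very noisy, which is one reason why more biological replicates help.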
  • 45. Variation summary, intuitively Total CV² = Technical CV² + Biological CV² For low counts, the Poisson (technical) variation or the measurement error is dominant. For higher counts, the Poisson variation gets smaller, and another source of variation becomes dominant, the dispersion or the biological variation. Biological variation does not get smaller with higher counts.
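The summary can be checked numerically, assuming technical CV² = 1/μ (Poisson) and biological CV² = φ. The dispersion value of 0.04 and the function name are hypothetical choices for illustration.

```python
def cv2_components(mu, phi):
    """Return (technical CV^2, biological CV^2, total CV^2) for mean count mu."""
    technical = 1.0 / mu   # Poisson part, shrinks with higher counts
    biological = phi       # dispersion, independent of the count
    return technical, biological, technical + biological

phi = 0.04  # hypothetical dispersion, i.e. a biological CV of 20%
for mu in (5, 10, 100, 1000):
    tech, bio, total = cv2_components(mu, phi)
    dominant = "technical" if tech > bio else "biological"
    print(f"mu={mu:>4}  technical={tech:.4f}  biological={bio:.4f}  "
          f"total CV={total ** 0.5:.2f}  dominant: {dominant}")
```

Under these assumptions the two sources are equal at μ = 1/φ (25 reads here); above that count, sequencing deeper barely lowers the total CV.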
  • 46. Beyond the NB model It appears from analysis of many biological replicates (#=69) that not every gene can be modeled as NB: the Poisson-Tweedie model provides a further generalisation and a better fit for many genes (with an additional shape parameter). Left figure: raw data shows that about 26% of the genes fit a NB model. Depending on the estimated shape parameter, other distributions fit better. Esnaola et al. BMC Bioinformatics 2013, 14:254 http://www.biomedcentral.com/1471-2105/14/254
  • 47. Consequence for our design ● For low counts: the uncertainty is big due to Poisson. ● For high counts: the uncertainty is big due to biological variation (highly expressed genes differ in their natural variation, regulated by cellular processes, more than lowly expressed genes). ● If we focus on the ratios between the conditions: is it reasonable to set a restriction on fold change? Highly expressed genes can have a smaller fold change and be significant. Lowly expressed genes can exceed a fold change of 2 without being significant.
  • 48. Consequence on fold change The fold change cut-off readily applied in microarray analysis is of no use in RNA-seq. Volcano plot: blue and red are known DE genes. These often-applied cut-offs can prevent detecting DE genes.
  • 49. Long story to say... We need to estimate the model behind the count. Never work without biological replicates. Never work with 2 biological replicates. Try avoiding working with 3 biological replicates. Go for at least 4 biological replicates.
  • 51. Overview Sample 1 RNA-seq Condition X GeneA GeneB GeneC GeneA GeneB GeneC GeneA GeneB GeneC GeneA GeneB GeneC GeneA GeneB GeneC GeneA GeneB GeneC Sample 2 RNA-seq Sample 3 RNA-seq Sample 4 RNA-seq Condition Y Sample 5 RNA-seq Sample 6 RNA-seq
  • 52. Summary Obviously, the number of reads is dependent on: 1. chance → Define the count model (NB) from replicates 2. the expression level of the gene → Compare the ratios with a test 3. the total number of reads generated 4. the length of the transcript
  • 53. The total number of reads generated sample RNA-seq GeneA GeneB GeneC sample More RNA-seq GeneA GeneB GeneC The number of reads is dependent on the total number of reads generated. If one library is sequenced to 20M reads, and another one to 40M, most genes will ~double their counts.
  • 54. Normalization for library size Naive approach: divide by total library size. This is no longer applied! Why not? Composition matters! 2 things to remember: - zero sum system (or “we cannot count what we can't sequence”) - 5 orders of magnitude
  • 55. Normalization for library size 2 things to remember: - zero sum system - 5 orders of magnitude In every sample, a lot of reads are spent on a few extremely highly expressed genes. Which genes these are differs between libraries, and including them negatively affects the naive size normalization.
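A small sketch of why composition matters for the naive approach. The two libraries and the gene named GeneTop are hypothetical: GeneA to GeneC have identical raw counts, and all extra reads of library B went to one extremely highly expressed gene.

```python
# Hypothetical counts (illustrative only).
library_a = {"GeneA": 100, "GeneB": 200, "GeneC": 300, "GeneTop": 400}
library_b = {"GeneA": 100, "GeneB": 200, "GeneC": 300, "GeneTop": 1400}

def scale_by_total(counts):
    """Naive normalization: express every count as a fraction of the library total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

norm_a = scale_by_total(library_a)
norm_b = scale_by_total(library_b)

for gene in library_a:
    ratio = norm_b[gene] / norm_a[gene]
    print(f"{gene}: B/A after naive normalization = {ratio:.2f}")
```

After dividing by the library total, the three unchanged genes appear to be halved in library B, purely because one gene inflated the total.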
  • 56. Normalization for library size Schematically: when normalized on library size (squares represent the number of reads). Few genes with enormous counts: there is NO SATURATION of these counts Rest of the genes All counts for library A Rest of the genes All counts for library B
  • 57. Normalization for library size A better normalization would be as shown in the figure. DESeq2 and edgeR apply such an approach (see later). Figure labels: 100%; Rest of the genes; Rest of the genes
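A simplified sketch of a median-of-ratios size factor, the idea behind the DESeq-style normalization. This is an illustration and not the DESeq2 or edgeR implementation; the libraries are hypothetical, with only one highly expressed gene (GeneTop) differing between them.

```python
import math
from statistics import median

def size_factors(samples):
    """Median-of-ratios size factors for a list of {gene: count} dictionaries.

    Only genes with a non-zero count in every sample are used.
    """
    genes = [g for g in samples[0] if all(s[g] > 0 for s in samples)]
    # The geometric mean of each gene across samples acts as a pseudo-reference.
    ref = {g: math.exp(sum(math.log(s[g]) for s in samples) / len(samples)) for g in genes}
    # Per sample: the median ratio to the reference is robust to a few extreme genes.
    return [median(s[g] / ref[g] for g in genes) for s in samples]

# Hypothetical counts (illustrative only).
library_a = {"GeneA": 100, "GeneB": 200, "GeneC": 300, "GeneTop": 400}
library_b = {"GeneA": 100, "GeneB": 200, "GeneC": 300, "GeneTop": 1400}

factors = size_factors([library_a, library_b])
print("size factors:", [round(f, 2) for f in factors])
```

Because the median ignores the single extreme gene, both libraries get a size factor of about 1, so the unchanged genes stay unchanged after normalization.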
  • 58. Gene length influences the count “Longer transcripts generate more reads” True! But the transcript length does not differ between samples. Since we are concerned with relative differences between samples, this needs no normalization (this story changes in case of absolute quantification). Sample A Sample B Gene A Gene A Gene B Gene B
  • 59. Between sample variation Properties of libraries/samples can affect the counts, and lead to variation. This is called between-lane variation. Obvious ones: library size (how many reads are sampled), library composition. Different libraries/samples can exhibit increased variation by differing in how gene properties relate to gene counts. This is called within-lane variation.
  • 60. GC-content of genes can influence counts GC-content differs between genes. But it does not change between samples, so there should be no problem for relative expression comparison. We can visualize the relationship between counts and GC very easily (see right). There is some trend, and it is equal for all samples. EDAseq (R)
  • 61. GC-content of genes can influence counts Sometimes, samples show different relationships between GC-content of the genes and the counts. This within-lane (or intra-sample) variation needs to be corrected for, so that in one sample not all differentially expressed genes are also the GC-rich ones. Length can also have this effect.
  • 62. What we need to know for our set-up We want to detect differentially expressed genes between 2 or more conditions. For this, we need to apply the conditions in a controlled environment (randomisation,...). For good testing, we need to have some biological replicates per condition. For cost effectiveness, we determine how deep we will sequence from each sample. We analyse the reads, get raw counts and do the test!
  • 63. Library preparation and lane loading HiSeq2000: 24 single-index barcodes available. 1 lane gives 150-180 M reads. One lane of 50 bp SE approx €1.500.
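The numbers on this slide allow a quick back-of-the-envelope calculation of depth and sequencing cost per sample. The sketch assumes that samples share a lane evenly and ignores library preparation costs; both are simplifying assumptions, and the function name is illustrative.

```python
LANE_READS_MIN = 150e6   # reads per lane, lower bound from the slide
LANE_READS_MAX = 180e6   # reads per lane, upper bound from the slide
LANE_COST_EUR = 1500     # approximate cost of one 50 bp SE lane from the slide
MAX_BARCODES = 24        # single-index barcodes available

def per_sample(n_samples):
    """Reads (min, max) and cost per sample when n_samples share one lane evenly."""
    if not 1 <= n_samples <= MAX_BARCODES:
        raise ValueError("number of samples per lane must be between 1 and 24")
    return (LANE_READS_MIN / n_samples,
            LANE_READS_MAX / n_samples,
            LANE_COST_EUR / n_samples)

for n in (6, 8, 12, 24):
    low, high, cost = per_sample(n)
    print(f"{n:>2} samples/lane: {low / 1e6:.1f}-{high / 1e6:.1f} M reads each, "
          f"about EUR {cost:.0f} per sample")
```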
  • 64. Bioinformatics analysis will take most of your time Biological insight DE test Quality control (QC) of raw reads QC of the count table Count table extraction Preprocessing: filtering of reads and read parts, to help our goal of differential detection. QC of preprocessing QC of the mapping Mapping to a reference genome (alternative: to a transcriptome)
  • 66. Bioinformatics analysis will take most of your time 1. Quality control (QC) of raw reads 2. Preprocessing: filtering of reads and read parts, to help our goal of differential detection (followed by QC of preprocessing) 3. Mapping to a reference genome (alternative: to a transcriptome) (followed by QC of the mapping) 4. Count table extraction 5. QC of the count table 6. DE test → Biological insight
  • 68. The numbers get reduced with every step 25M 20M 15M
  • 69. Deeper, or more replicates? Variance will be lower with more reads: but sequencing another biological replicate is preferred over sequencing deeper, or technical reps. Doi: 10.1093/bioinformatics/btt015
  • 70. There is a tool to help you set up
  • 71. Scotty – power analysis Power: the probability to reject the null hypothesis if the alternative is true. 'How many samples and how deep in order to minimize false negatives'. (a null hypothesis is always a scenario in which there is no difference, hence no differential expression). Alternative tools: http://wiki.bits.vib.be/index.php/RNAseq_toolbox
  • 73. How many samples to sequence? → Scotty exercise
  • 74. Keywords A read count of a gene is dependent on: 1. chance 2. expression level 3. transcript length 4. depth of sequencing 5. GC-content Poisson distribution Negative binomial distribution Condition Sample Normalization Write in your own words what the terms mean
  • 75. Reads All my references available at: https://www.zotero.org/groups/dernaseq/items