Cancer Systems Biology:
RNA-Seq
August 16, 2012
Anne Deslattes Mays
Wellstein/Riegel Laboratory
Mentor: Anton Wellstein, MD, PhD
9/13/2013 Wellstein/Riegel Laboratory 1
Talk Outline
• What is Systems Biology?
• What is RNA-Seq?
• RNA-Seq Differential Expression Analysis
Systems Biology is a systems approach to building
testable models of biology using observation and
measurement
Systems Biology brings together interdisciplinary
fields, tools, analysis and platforms
• Genomics
• Epigenomics/epigenetics
• Transcriptomics
• Proteomics
• Metabolomics
• Glycomics
• Lipidomics
• Interactomics
• NeuroElectroDynamics
• Fluxomics
• Biomics
What is the discipline of Systems Biology?
A Reverse Engineering Discipline
Input
Process
Output
Perhaps more equivalent to a deciphering project:
Alan Turing and the group of codebreakers during World War II
deciphered the codes created by the Enigma machine.
A biological system is communicating; we are trying to crack the code.
Genome
Transcriptome
Proteome
Metabolome
What is Systems Biology?
Systems Biology is a discipline that uses a
multitude of measurement technologies to
capture the entirety of a biological system’s
parts and then attempts to reverse engineer
that biological system’s ability to dynamically
remodel in response to stimuli
Sequencing technologies
Mass Spec technologies
Technology advances spur research advances
RNA-seq
Here is an example RNA-Seq Workflow
• Experimental Design
• Sample Collection
• Sequencing
• Quality Control
• Read Trimming
• Differential Analysis
• Transcript Identification
• Pathway Analysis
• Marker Discovery
Three steps to get to a fresh sequence with the Illumina
Genome Analyzer
• Library generation
• Cluster generation
• Sequencing
Before Library Construction
1. Poly-A selection (total RNA -> mRNA)
2. mRNA fragmentation
3. First strand synthesis (here we stop if we want to maintain strand specificity)
4. Second strand synthesis
Other techniques (rRNA depletion)
1. Ribo-Zero
2. RiboMinus
Library Construction: Messenger RNAs are poly-A selected
from total RNA and fragmented, and cDNA is synthesized
cDNA (single or double stranded)
1. cDNA is blunt end-repaired and
phosphorylated (B.)
2. A-base added to prepare for
indexed adapter ligation (C.)
Library Construction: End repair and adenylation results in
adapter ligation ready constructs
Index adapter ligation and product
ready for amplification on cBot or
the cluster station
1. Strand specific tags are added to
the A base – ligate index adapter
(D)
2. Denature and amplify for final
product (E)
Library Construction: Adapter ligation results in cluster-
generation-ready constructs
Single DNA molecules hybridize to
the lawn of oligos grafted to the
surface of the flow cell
1. Oligo lawn
2. Oligos hybridize to the adapters
that had been ligated to the
library fragments which flow
through the cell
Cluster Generation: In the Illumina cBot system, single molecules are
isothermally amplified in a flow cell to prepare them for sequencing
Bridge amplifications resulting in
100s of millions of unique clusters
1. Each fragment is clonally
amplified through a series of
extensions and isothermal bridge
amplifications
2. Reverse strands cleaved and
washed away
3. Ends are blocked
4. Sequencing primer hybridized to
the DNA template
5. Libraries are ready for
sequencing
Cluster generation: Bound fragments are extended to make
copies and reverse strands cleaved and washed away
4 fluorescently labeled reversibly
terminated nucleotides
1. Each base competes for addition
2. Natural competition ensures
highest accuracy
3. After each round of
synthesis, clusters are excited by
a laser and emit a color that
identifies the newly added base
4. Fluorescent label and blocking
group are removed allowing for
addition of next nucleotide
5. Proprietary (Illumina) chemistry
reads a base in each cycle
6. Allows for accurate sequencing
through difficult regions such as
homopolymers and repetitive
sequence
Sequencing: 100s of millions of clusters sequenced
simultaneously
What was good for DNA is now good for RNA
• Technology advances => higher throughput sequencing at
lower costs
• Whole genome sequencing has enabled whole transcriptome sequencing
• Workflow for DNA sequencing and RNA sequencing is similar
There are other ways to Inquire about the
Transcriptome
• Array Based Technologies
– Affymetrix
– Agilent
– Known genes and hybridization protocols
• Microarray
– 20,000+ array experiments on a single platform
– Edge effects
– False positives / false negatives
• Bead-based arrays
• Tiling arrays
• SAGE
What is unique about RNA-Seq?
• Allows you to discover and profile the entire transcriptome of
any organism
• No probes or primers to design
• Novel transcripts
• Novel isoforms
• Alternative splice sites
• Rare transcripts
• cSNPs – all of this in one experiment
After sequencing…
1. Quality control – trim your reads
2. Count Reads
• Align to genome
• Align to transcriptome
3. Interpret Data
• Statistical tests (differential
expression analysis)
• Visualization (mapped
reads)
• Pathway analysis
Not so simple – big data, big
compute requirements
After sequencing, we must then perform
RNA-Seq Data Analysis
How much RNA-sequencing data?
1. 20 million paired end reads ~ 2 GB of data
2. 100 million paired end reads ~ 10 GB of data
How much computation power?
1. More memory and more processors reduce the time it takes to compute
2. If you outsource the analysis, you will still need to store the results somewhere
Amazon Web Services
S3 storage
EC2 (Elastic Compute Cloud) on-demand computational facility
Georgetown University High Performance Computer Core
matrix.georgetown.edu
UPENN Galaxy services
How much RNA-sequencing data, how much computation
power and where do you go to compute?
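The two data points on this slide imply roughly linear scaling of about 0.1 GB per million paired-end reads. A minimal sketch of that rule of thumb follows. It assumes the scaling stays linear; the helper name is hypothetical, and real file sizes depend on read length and compression.

```python
# Rule of thumb from the slide: 20 million paired-end reads ~ 2 GB,
# 100 million paired-end reads ~ 10 GB (assumed to scale linearly).
GB_PER_MILLION_PAIRED_READS = 2.0 / 20.0


def estimate_storage_gb(million_paired_reads, samples=1):
    """Estimate raw data volume in GB (hypothetical helper, linear assumption)."""
    return million_paired_reads * GB_PER_MILLION_PAIRED_READS * samples


if __name__ == "__main__":
    for depth in (20, 100):
        print(f"{depth} million paired-end reads ~ {estimate_storage_gb(depth):.0f} GB")
```

Multiplying by the number of samples gives a first guess at how much storage to request before the analysis outputs are added on top.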
A growing number of tools enable RNA-Seq analysis
These RNA-Seq tools are used for mapping reads, aligning
reads and providing input for differential expression analysis
• Tuxedo suite
– Bowtie, TopHat, Cufflinks
• Trinity Suite
– Inchworm, Chrysalis, Butterfly
• RUM
– RNA Unified Mapper
What percentage of reads are covered? What
percentage of reads are mapped?
3’ Bias on transcript reads
1. 60-80% of reads are mapped
2. The highest percentage of mapped reads falls at the 3’ end
3. Reads need to be quality trimmed
Mapping tools bias reads toward exons of known
genes
Galaxy is a web-based tool committed to enabling
researchers (for more than just RNA-Seq)
How to visualize mapped results?
• UCSC Genome Browser (Gbrowse)
• Integrated Genome Browser (IGB)
• Integrative Genomics Viewer (IGV)
Many shared formats: these browsers read many of the outputs generated by
the programs and let you generate your own tracks
What do RNA-Seq reads look like for GAPDH?
Repeat-masked reads aligned with BLAT (allowing 1/2 mismatched bases),
viewed in IGB 6.7.2
RNA-Seq Differential
Expression analysis
What does GAPDH look like in terms of quantitation?
Gene | TOTAL BM: RPKM, 3SEQ Counts, BLAT Reads | HPP: RPKM, 3SEQ Counts, BLAT Reads
CD34 0.7 340 230 8 8 14
BST1 19.7 5374 31 31
CD133 0.2 173 176 16 16 33
THY1 0 7 4 4
A12 1 0
A5 0 0
ALK 0 9 24 0 0 3
B9 0 0
C1 0 0
C2 0 0
C7 0 0
E7 0 0
E9 2 0
F6 0 0
G12 0 0
GAPDH 3013.2 727831 356289 120.8 5559 2670
H3 0 0
GAPDH, TOTAL BM vs. HPP: BLAT raw read count ratio ~= 3SEQ count ratio ~= 130 to 1
RPKM ratio (3013.2 / 120.8) ~= 24.9 to 1
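The ratios quoted here can be recomputed from the GAPDH row of the table. The sketch below only repeats that arithmetic; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# GAPDH values copied from the table above: (TOTAL BM, HPP) per measure.
gapdh = {
    "RPKM": (3013.2, 120.8),
    "3SEQ counts": (727831, 5559),
    "BLAT reads": (356289, 2670),
}


def bm_to_hpp_ratio(measure):
    """Return the TOTAL BM / HPP ratio for one measure."""
    bm, hpp = gapdh[measure]
    return bm / hpp


if __name__ == "__main__":
    for measure in gapdh:
        print(f"{measure}: {bm_to_hpp_ratio(measure):.1f} to 1")
```

The raw-count ratios stay near 130 to 1 while the RPKM ratio is far smaller, because RPKM also divides by each library's total mapped reads.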
RNA-Seq Quantification Challenge: A problem that
exists with RNA-Seq data that doesn’t exist with array
data: Longer transcripts produce more reads than
shorter transcripts
One solution to account for this is RPKM (Cufflinks uses the analogous FPKM)
RPKM = 10^9 x C / (N x L), i.e. the read count C normalized by both library size N and exon length L
C(gene)= the number of mappable reads that fall onto a gene's exons
N= total number of mappable reads in the experiment
L(gene)= the sum of the exons in base pairs.
Wold (2008)
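A minimal sketch of the RPKM formula, using the symbols C, N and L as defined on this slide. The function name and the example numbers are hypothetical and chosen only to show the length effect.

```python
def rpkm(c, n, l):
    """Reads per kilobase of exon per million mapped reads.

    c: mappable reads that fall onto the gene's exons
    n: total mappable reads in the experiment
    l: summed exon length of the gene in base pairs
    """
    if n <= 0 or l <= 0:
        raise ValueError("n and l must be positive")
    return 1e9 * c / (n * l)


if __name__ == "__main__":
    # Hypothetical example: same read count, different transcript lengths.
    total_reads = 20_000_000
    print("1 kb gene:", rpkm(c=1000, n=total_reads, l=1000))
    print("4 kb gene:", rpkm(c=1000, n=total_reads, l=4000))
```

With equal read counts the longer gene gets the lower RPKM, which is the length correction the slide describes.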
Cufflinks: Transcript assembly, differential expression, and
differential regulation for RNA-seq
Cuffdiff produces many output files:
1. Transcript FPKM expression tracking.
2. Gene FPKM expression tracking; tracks the summed FPKM of transcripts sharing each gene_id
3. Primary transcript FPKM tracking; tracks the summed FPKM of transcripts sharing each tss_id
4. Coding sequence FPKM tracking; tracks the summed FPKM of transcripts sharing each p_id, independent
of tss_id
5. Transcript differential FPKM.
6. Gene differential FPKM. Tests differences in the summed FPKM of transcripts sharing each gene_id
7. Primary transcript differential FPKM. Tests differences in the summed FPKM of transcripts sharing each
tss_id
8. Coding sequence differential FPKM. Tests differences in the summed FPKM of transcripts sharing each p_id
independent of tss_id
9. Differential splicing tests: this tab delimited file lists, for each primary transcript, the amount of
overloading detected among its isoforms, i.e. how much differential splicing exists between isoforms
processed from a single primary transcript. Only primary transcripts from which two or more isoforms are
spliced are listed in this file.
10. Differential promoter tests: this tab delimited file lists, for each gene, the amount of overloading detected
among its primary transcripts, i.e. how much differential promoter use exists between samples. Only
genes producing two or more distinct primary transcripts (i.e. multi-promoter genes) are listed here.
11. Differential CDS tests: this tab delimited file lists, for each gene, the amount of overloading detected
among its coding sequences, i.e. how much differential CDS output exists between samples. Only genes
producing two or more distinct CDS (i.e. multi-protein genes) are listed here.
RNA-Seq Quantification Challenge: DESeq Method uses
the geometric mean of counts in all samples
DESeq Method:
Construct a "reference sample" by taking, for each gene, the geometric mean
of the counts in all samples.
To get the sequencing depth of a sample relative to the reference, calculate
for each gene the quotient of the counts in your sample divided by the counts
of the reference sample.
Now you have, for each gene, an estimate of the depth ratio.
Simply take the median of all the quotients to get the relative depth of the
library.
'estimateSizeFactors' function of DESeq package does this calculation.
DESeq: an R package that works with Raw Counts to
determine genes differentially expressed across samples
• Simon Anders
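The median-of-ratios steps described on this slide can be sketched in plain Python. This is an illustration of the description, not the DESeq source code. The function name and input layout are hypothetical, and skipping genes with a zero count in any sample is an assumption made here because their geometric mean is zero.

```python
import math
from statistics import median


def estimate_size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one entry per sample.
    Returns one relative sequencing depth (size factor) per sample.
    """
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        # Assumption: skip genes with a zero count in any sample.
        if any(c <= 0 for c in gene_counts):
            continue
        # Reference sample: geometric mean of the gene's counts across samples.
        reference = math.exp(sum(math.log(c) for c in gene_counts) / n_samples)
        for j, c in enumerate(gene_counts):
            ratios[j].append(c / reference)
    if not ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # Relative depth of each library: median of its per-gene quotients.
    return [median(r) for r in ratios]


if __name__ == "__main__":
    # Hypothetical counts: sample 2 was sequenced about twice as deeply.
    demo = {"gene_1": [10, 20], "gene_2": [100, 200], "gene_3": [5, 10]}
    print(estimate_size_factors(demo))
```

Dividing each sample's raw counts by its size factor puts the samples on a common scale before testing for differential expression.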
Resources
• http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq)
• http://galaxy.psu.edu/
• http://seqanswers.com/
• http://www.broadinstitute.org/igv/
• http://bioviz.org/igb/index.html
• http://www.illumina.com
• http://www.otogenetics.com
• http://www.dnanexus.com
• http://cufflinks.cbcb.umd.edu/
• http://brb.nci.nih.gov/BRB-ArrayTools.html
Acknowledgements
Dr. Anton Wellstein
Dr. Anna Riegel
Dr. Marcel Schmidt
Jean-Baptiste Masarati
Dr. Elena Tassi
The entire lab: Tibari, Ghada, Ivana, Eveline, the entire Wellstein/Riegel laboratory
My Committee
Dr. Yuri Gusev
Dr. Anatoly Dritschilo
Dr. Michael Johnson
Dr. Christopher Loffredo
Dr. Habtom Ressom
Dr. Terry Ryan (external committee member)
High Performance Core Group, Steve Moore, especially Woonki Chung
Amazon Cloud Services
Dr. Ann Loraine, UNC, IGB Developer
Brian Haas, Author Trinity Suite
Given a list of differentially expressed genes,
enrichment analysis should now be performed
• Enrichment analysis lets the researcher leverage
documented experiments that provide evidence for genes’
roles in pathways and functions, helping the researcher
interpret the results and significance of their experiments
• DAVID
– Gene ontology
– Functional ontology
• REVIGO
– Output of DAVID may be placed in REVIGO for further
interpretation and statistical exploration of the significance of
discovered sets of genes
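To make the idea of enrichment concrete, here is a generic over-representation test based on the hypergeometric distribution. It is a sketch of the general technique and not necessarily the statistic that DAVID or REVIGO computes. The function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, hits):
    """P(X >= hits) for a one-sided over-representation test.

    population: number of genes in the background
    annotated:  background genes that belong to the category
    selected:   number of differentially expressed genes
    hits:       differentially expressed genes that belong to the category
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 40 of 200 selected genes fall in a category
    # that covers 500 of 20,000 background genes.
    print(hypergeom_enrichment_p(20000, 500, 200, 40))
```

A small p-value suggests the category is over-represented among the differentially expressed genes; testing many categories calls for multiple-testing correction.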
Using differentially expressed genes, biological
pathways should be explored
• Differentially expressed genes are put into programs such as
Pathway Studio or Ingenuity
• Shortest path programs and canonical pathway analysis
• Enables a researcher to reverse engineer the pathways
expressed in the transition from a healthy response to a diseased
response
• Ideally a pathway explains the observed phenotype,
connecting genotype to gene expression program to
phenotype
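The shortest-path idea can be illustrated with a breadth-first search over an interaction graph. How Pathway Studio or Ingenuity compute their paths is not described in this talk, so this is only a generic sketch. The graph, the placeholder node names and the function name are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest path in an unweighted graph.

    graph: dict mapping a node to an iterable of neighboring nodes.
    Returns the list of nodes from start to goal, or None if unreachable.
    """
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back through the predecessors to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


if __name__ == "__main__":
    # Hypothetical interaction graph with placeholder gene names.
    interactions = {
        "GeneA": ["GeneB", "GeneC"],
        "GeneB": ["GeneD"],
        "GeneC": ["GeneD"],
        "GeneD": ["GeneE"],
    }
    print(shortest_path(interactions, "GeneA", "GeneE"))
```

The intermediate nodes on such a path are candidates for linking a differentially expressed gene to a process or phenotype.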
FGFBP1 pathways control after induction of a conditional transgene in a mouse model:
Information derived from mRNA expression pattern analysis
Anne Deslattes Mays, Elena Tassi, Anton Wellstein
Department of Oncology and Medicine, Lombardi Cancer Center, Washington DC 20057
Abstract
Fibroblast Growth Factors (FGFs) play a significant role in embryonic development,
maintenance of tissue homeostasis in the adult as well as in different diseases. FGF-binding
proteins (FGF-BP) are secreted proteins that chaperone FGFs stored in the extracellular matrix
to their cognate receptor, and can thus modulate FGF signaling. FGF-BP1 (BP1 a.k.a. HBp17)
expression is required for embryonic survival, can modulate FGF-dependent vascular
permeability in embryos and is an angiogenic switch in human cancers. To determine the
function of BP1 in vivo, we generated tetracycline-regulated conditional BP1 transgenic mice.
BP1 expressing mice are viable, fertile and phenotypically indistinguishable from their
littermates. Five cDNA Affymetrix arrays were run on the kidneys of the FGF-BP1 transgenic
mice. Two arrays were run for the animals under doxycycline diet with the transgene switched
off, one array was run with induction of the FGF-BP1 transgene for 24 hours, one array was run
with induction for 48 hours, and one array was run
with induction of the FGF-BP1 transgene for 336 hours representing a chronic induction of the
transgene. The results indicate that when properly normalized, time series analysis of a large
array can reveal the signal transduction pathways. Pattern analysis allows for a systems
biology review of the data and allows for the exploration and generation of testable hypotheses.
Figure 3 – Heatmap scaled by probe - After RMA normalization, selection of significant over and
under expressors relative to the average of the FGFBP1 transgene being off, analysis of the heatmap
reveals mutually exclusive clusters. These clusters indicate genes that are off from one state until the
other. Cluster A represents those genes that are off with the FGFBP1 transgene being off and switched
on when the FGFBP1 transgene is activated for 24 hours. Cluster B contains those genes that are off
at 24 hours but activated when the FGFBP1 transgene is on for 48 hours. Cluster C contains those
genes that are off at 48 hours but on when the FGFBP1 transgene is on for 336 hours – or chronically.
Studying these genes in this order, and with this pattern, allows the exploration of the signal transduction
and activation pathway in response to the activation of FGFBP1 transgene.
Figure 5 – Gene Details – The details for the genes found in the clusters of Figure 1 are described above
in tables A, B and C. The genes responding after activation of the FGFBP1 transgene for 24 hours
include immunoglobulin kappa chain variable 21, 3-phosphoglycerate dehydrogenase, a zinc finger
protein, neuronatin, and homeobox B8. The genes found in table B represent those genes activated
after 48 hours of the FGFBP1 transgene being on. Included in this set are hemopexin and major
urinary protein 3. Finally, after 336 hours, truly representing chronic activation of the FGFBP1
transgene, we have one gene, Reg3b, associated with inflammatory response (according to Gene
Ontology).
Figure 2 – Distinct Expression Patterns When
Filtering by Thresholds at Time Points. By
creating a filter to capture the distinctive patterns
that are expressed at each of the
separate time points, one can understand the
major message being communicated at each
time point. The patterns of expression are
distinctive. Panel A shows the expression patterns for
those genes above a threshold at 24 hours, Panel
B shows the expression patterns for those genes
above a threshold at 48 hours, and Panel C shows the
expression patterns for those genes above a
threshold at 336 hours, i.e. at a chronic transgene
expression level.
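The per-time-point threshold filter described in this caption might look like the following sketch. The data layout, the threshold value and the placeholder gene names are hypothetical; the poster does not state the actual thresholds used.

```python
def genes_above_threshold(expression, timepoint, threshold):
    """Return the genes whose expression exceeds the threshold at one time point.

    expression: dict mapping gene -> dict mapping time point -> expression value.
    """
    return sorted(
        gene
        for gene, profile in expression.items()
        if profile[timepoint] > threshold
    )


if __name__ == "__main__":
    # Hypothetical normalized expression values for placeholder genes.
    demo = {
        "gene_1": {"off": 0.1, "24h": 2.5, "48h": 0.3, "336h": 0.2},
        "gene_2": {"off": 0.2, "24h": 0.4, "48h": 3.1, "336h": 0.5},
        "gene_3": {"off": 0.0, "24h": 0.1, "48h": 0.2, "336h": 4.0},
    }
    for timepoint in ("24h", "48h", "336h"):
        print(timepoint, genes_above_threshold(demo, timepoint, threshold=1.0))
```

Running the filter once per time point yields one gene set per panel, which can then be plotted or passed to pathway analysis.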
Figure 4- FGFBP1 pathways – Using Pathway Studio, the shortest path through the set of genes that
were selected from filtering by a band pass filter at each of the time points, 24 hours, 48 hours and 336
hours was constructed. The resulting selection of diseases, cell processes, and functional classes
were the result of Pathway Studio constructing the shortest path to connect those genes in the set.
Conclusions
A systems biology approach to analyzing large data sets, such as this study, which involved five full mouse
cDNA arrays, allows the researcher to capture a snapshot of the unfolding remodeling events of an
organism's response to change, stress or disease. Analyzing data in this form involves filtering the
biological signal from the noise. Separating out the noise appropriately is essential to be able to
complete the biological story. Building on the existing knowledge base, we can complete the picture as long as
the proper context of the collection, normalization and analysis is maintained. High throughput technologies
such as microarrays and RNA sequencing, as enabled by next generation sequencing, present the
researcher with the challenge of extracting meaningful information from the measurements. Software tools
and analysis techniques are not a substitute for understanding the biological context from which the data
are collected. Engineering and digital signal processing have shown us how
to reconstruct a signal from a continual stream of noisy analog data. Adequate sampling frequency
and proper filtering are a must to be able to sort out a meaningful signal from the noise. These same
principles apply not only to communication theory but also when studying large data sets such as those that
may be collected from high throughput systems such as an Affymetrix mouse cDNA array.
Figure 1 – Panels 0, A, B, and
C illustrate ordering based
upon the expression values of
the control (FGFBP1 off), 24
hour expression (FGFBP1 on
24 hours), 48 hour expression
(FGFBP1 on 48 hours), and
336 hour expression (FGFBP1
on 336 hours). The insight
gained from this inspection
includes the ability to see the
relative changes of
expression at each of these
time points.
Figure 6 – Graphical Gaussian
Model. Using the expression profiles,
a quasi-Bayesian analysis is
performed, constructing the partial
correlation network among the top
expressing genes. Note that C9
(complement component 9) could not
be placed in the context of the data
in the Pathway Studio diagram;
however, using the partial correlations,
we are able to place it as strongly
positively correlated to Serpina3k,
Cyp3a11, MUG1, Tdo2, Mup3, Hpx,
weakly positively correlated to Hamp,
and strongly negatively correlated to
Tex10. Together these indicate the
placement of C9 in the endothelial
response.
Scientific knowledge is limited (and advanced) by the
limits (and advancements) of measurement
• Ilya Shmulevich Genomic Signal Processing “Validity of the
model involves observation and measurement, scientific
knowledge is limited by the limits of measurement”
• Erwin Schrödinger, Science Theory and Man: “It really is the
ultimate purpose of all schemes and models to serve as
scaffolding for any observations that are at all means
observable”
Before Library Construction
1. Most vendors and cores will assess
the quality of the RNA before
sequencing
2. Important to determine before
sequencing begins
Garbage in == garbage out
Before library construction, RNA quality must be assessed
Cluster Generation
• In the cBot cluster system, single molecules are isothermally amplified in
a flow cell to prepare them for high-throughput sequencing
• The 8-channel Genome Analyzer flow cell has a dense lawn of oligos
• Single DNA molecules hybridize to the lawn of oligos
• Bound fragments are extended to make copies
• Copies are covalently bound to the flow cell's surface
• Each fragment is clonally amplified through a series of extensions
and isothermal bridge amplifications resulting in 100s of millions of
unique clusters
• Reverse strands cleaved and washed away
• Ends are blocked
• Sequencing primer hybridized to the DNA template
• After cluster generation, libraries are ready for sequencing
Sequencing
• 100s of millions of clusters sequenced simultaneously
• Using 4 fluorescently labeled reversibly terminated
nucleotides
• Natural competition ensures highest accuracy
• After each round of synthesis, clusters are excited by a laser
and emit a color that identifies the newly added base
• Fluorescent label and blocking group are then removed
allowing for the addition of the next nucleotide
• Proprietary chemistry (Illumina) reads a base in each cycle
• Allows for accurate sequencing through difficult regions such
as homopolymers and repetitive sequence
Systems Biology History (Wikipedia)
• Systems biology roots found in
– Quantitative modeling of enzyme kinetics
– Mathematical modeling of population growth
– Simulations to study neurophysiology
– Control theory and cybernetics
• Theorists
– Ludwig von Bertalanffy – General Systems Theory
– Alan Lloyd Hodgkin and Andrew Fielding Huxley – constructed a
mathematical model that explained the action potential propagating along the
axon of a neuron
– Denis Noble – first computer model of the heart pacemaker
Institutes of Systems Biology
• 2000 – Institutes of Systems Biology established in Seattle and
Tokyo
• After completion of Human Genome projects
• NSF grand challenge for systems biology – build a
mathematical model of the whole cell
9/13/2013 Wellstein/Riegel Laboratory 53

More Related Content

What's hot

Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationIllumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationZachary Smith
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)James Hadfield
 
Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ...
 Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ... Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ...
Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ...Fabio Caligaris
 
Illumina TruSight HLA Sequencing Panel_Biomek FXP Automated Workstation
Illumina TruSight HLA Sequencing Panel_Biomek FXP Automated WorkstationIllumina TruSight HLA Sequencing Panel_Biomek FXP Automated Workstation
Illumina TruSight HLA Sequencing Panel_Biomek FXP Automated WorkstationZachary Smith
 
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...Baptiste Mayjonade
 
Anne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentationAnne_Vaittinen_advanced_seminar_presentation
Anne_Vaittinen_advanced_seminar_presentationAnne Vaittinen
 
Targeted RNAseq for Gene Expression Using Unique Molecular Indexes (UMIs): In...
Targeted RNAseq for Gene Expression Using Unique Molecular Indexes (UMIs): In...Targeted RNAseq for Gene Expression Using Unique Molecular Indexes (UMIs): In...
Targeted RNAseq for Gene Expression Using Unique Molecular Indexes (UMIs): In...QIAGEN
 
Genapsys DNA sequencing
Genapsys DNA sequencingGenapsys DNA sequencing
Genapsys DNA sequencingMelvin Alex
 
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015
Snippy - Rapid bacterial variant calling - UK - tue 5 may 2015Torsten Seemann
 
Moisture Analyzer Petrochemical by ACMAS Technologies Pvt Ltd.
Moisture Analyzer Petrochemical by ACMAS Technologies Pvt Ltd.Moisture Analyzer Petrochemical by ACMAS Technologies Pvt Ltd.
Moisture Analyzer Petrochemical by ACMAS Technologies Pvt Ltd.Acmas Technologies Pvt. Ltd.
 
NEBNext Small RNA Kit for Illumina NGS_Biomek 4000 Automated Workstation
NEBNext Small RNA Kit for Illumina NGS_Biomek 4000 Automated WorkstationNEBNext Small RNA Kit for Illumina NGS_Biomek 4000 Automated Workstation
NEBNext Small RNA Kit for Illumina NGS_Biomek 4000 Automated WorkstationZachary Smith
 
Back to basics: Fundamental Concepts and Special Considerations in RNA Isolation
Back to basics: Fundamental Concepts and Special Considerations in RNA IsolationBack to basics: Fundamental Concepts and Special Considerations in RNA Isolation
Back to basics: Fundamental Concepts and Special Considerations in RNA IsolationQIAGEN
 
NGS Pipeline Preparation - Tools Selection
NGS Pipeline Preparation - Tools SelectionNGS Pipeline Preparation - Tools Selection
NGS Pipeline Preparation - Tools SelectionMinesh A. Jethva
 
Illumina Nextera XT_Biomek FXP Dual-Arm Multi-96 and Span-8 Automated Worksta...
Illumina Nextera XT_Biomek FXP Dual-Arm Multi-96 and Span-8 Automated Worksta...Illumina Nextera XT_Biomek FXP Dual-Arm Multi-96 and Span-8 Automated Worksta...
Illumina Nextera XT_Biomek FXP Dual-Arm Multi-96 and Span-8 Automated Worksta...Zachary Smith
 
Discovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGSDiscovery and annotation of variants by exome analysis using NGS
Discovery and annotation of variants by exome analysis using NGScursoNGS
 

What's hot (20)

Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated WorkstationIllumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
Illumina TruSeq Stranded mRNA_Biomek FXP Automated Workstation
 
How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)How to cluster and sequence an ngs library (james hadfield160416)
How to cluster and sequence an ngs library (james hadfield160416)
 
Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ...
 Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ... Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ...
Odyssey Of The IWGSC Reference Genome Sequence: 12 Years 1 Month 28 Days 11 ...
 
Illumina TruSight HLA Sequencing Panel_Biomek FXP Automated Workstation
Illumina TruSight HLA Sequencing Panel_Biomek FXP Automated WorkstationIllumina TruSight HLA Sequencing Panel_Biomek FXP Automated Workstation
Illumina TruSight HLA Sequencing Panel_Biomek FXP Automated Workstation
 
BEST PRACTICE TO MAXIMIZE THROUGHPUT WITH NANOPORE TECHNOLOGY & DE NOVO SEQUE...
2012 august 16 systems biology rna seq v2

  • 1. Cancer Systems Biology: RNA-Seq August 16, 2012 Anne Deslattes Mays Wellstein/Riegel Laboratory Mentor: Anton Wellstein, MD, PhD 9/13/2013 Wellstein/Riegel Laboratory 1
  • 2. Talk Outline • What is Systems Biology? • What is RNA-Seq? • RNA-Seq Differential Expression Analysis
  • 3. Systems Biology is a systems approach to building testable models of biology using observation and measurement
  • 4. Systems Biology brings together interdisciplinary fields, tools, analysis and platforms • Genomics • Epigenomics/epigenetics • Transcriptomics • Proteomics • Metabolomics • Glycomics • Lipidomics • Interactomics • NeuroElectroDynamics • Fluxomics • Biomics
  • 5. What is the discipline of Systems Biology? A Reverse Engineering Discipline • Input • Process • Output. Perhaps more equivalent to a deciphering project: Alan Turing and the group of codebreakers during World War II deciphered the codes created by the Enigma machine. A biological system is communicating; we are trying to crack the code.
  • 6. What is Systems Biology? Systems Biology is a discipline that uses a multitude of measurement technologies to capture the entirety of a biological system's parts and then attempts to reverse engineer that system's ability to dynamically remodel in response to stimuli • Genome • Transcriptome • Proteome • Metabolome
  • 7. What is Systems Biology? Systems Biology is a discipline that uses a multitude of measurement technologies to capture the entirety of a biological system's parts and then attempts to reverse engineer that system's ability to dynamically remodel in response to stimuli • Sequencing technologies • Mass Spec technologies • Genome • Transcriptome • Proteome • Metabolome
  • 8. What is Systems Biology? Technology Advances Spur Research Advances. Systems Biology is a discipline that uses a multitude of measurement technologies to capture the entirety of a biological system's parts and then attempts to reverse engineer that system's ability to dynamically remodel in response to stimuli • Sequencing technologies • Mass Spec technologies • Genome • Transcriptome • Proteome • Metabolome
  • 10. Here is an example RNA-Seq Workflow • Experimental Design • Sample Collection • Quality Control • Read Trimming • Differential Analysis • Transcript Identification • Pathway Analysis • Marker Discovery • Sequencing
  • 11. Three steps to get to a fresh sequence with the Illumina Genome Sequence Analyzer • Library generation • Cluster generation • Sequencing
  • 12. Library Construction: Messenger RNAs are poly-A selected from total RNA and fragmented, and cDNA is synthesized. Before Library Construction 1. Poly-A selection (total RNA -> mRNA) 2. mRNA fragmentation 3. First strand synthesis (here we stop if we want to maintain strand specificity) 4. Second strand synthesis. Other techniques 1. Ribo-Zero 2. RiboMinus
  • 13. Library Construction: End repair and adenylation result in adapter-ligation-ready constructs. cDNA (single or double stranded) 1. cDNA is blunt end-repaired and phosphorylated (B.) 2. A-base is added to prepare for indexed adapter ligation (C.)
  • 14. Library Construction: Adapter ligation results in cluster-generation-ready constructs. Index adapter ligation and product ready for amplification on the cBot or the cluster station 1. Strand-specific tags are added to the A base: ligate index adapter (D) 2. Denature and amplify for final product (E)
  • 15. 9/13/2013 Wellstein/Riegel Laboratory 15 Single DNA molecules hybridize to the lawn of oligos grafted to the surface of the flow cell 1. Oligo lawn 2. Oligos hybridize to the adapters that had been ligated to the library fragments which flow through the cell Cluster Generation: In the illumina Cbot system, single molecules are isothermally amplified in a flow cell to prepare them for sequencing
  • 16. 9/13/2013 Wellstein/Riegel Laboratory 16 Bridge amplifications resulting in 100s of millions of unique clusters 1. Each fragment is clonally amplified through a series of extensions and isothermal bridge amplifications 2. Reverse strands cleaved and washed away 3. Ends are blocked 4. Sequencing primer hybridized to the DNA template 5. Libraries are ready for sequencing Cluster generation: Bound fragments are extended to make copies and reverse strands cleaved and washed away
  • 17. Sequencing: 100s of millions of clusters are sequenced simultaneously, using 4 fluorescently labeled, reversibly terminated nucleotides: 1. Each base competes for addition 2. Natural competition ensures highest accuracy 3. After each round of synthesis, clusters are excited by a laser and emit a color that identifies the newly added base 4. The fluorescent label and blocking group are removed, allowing addition of the next nucleotide 5. Proprietary (Illumina) chemistry reads a base in each cycle 6. Allows for accurate sequencing through difficult regions such as homopolymers and repetitive sequence
  • 18. What was good for DNA is now good for RNA • Technology advances => higher throughput sequencing at lower costs • Whole Genome Sequencing has enabled Whole Transcriptome Sequencing • Workflow for DNA sequencing and RNA sequencing is similar
  • 19. There are other ways to Inquire about the Transcriptome • Array Based Technologies – Affymetrix – Agilent – Known genes and hybridization protocols • Microarray – 20,000+ array experiments on a single platform – Edge effects – False positives / false negatives • Bead-based arrays • Tiling arrays • SAGE
  • 20. What is unique about RNA-Seq? • Allows you to discover and profile the entire transcriptome of any organism • No probes or primers to design • Novel transcripts • Novel isoforms • Alternative splice sites • Rare transcripts • cSNPs – all of this in one experiment
  • 21. After sequencing, we must then perform RNA-Seq data analysis. After sequencing: 1. Quality control – trim your reads 2. Count reads • Align to genome • Align to transcriptome 3. Interpret data • Statistical tests (differential expression analysis) • Visualization (mapped reads) • Pathway analysis. Not so simple – big data, big compute requirements
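The quality-control step on this slide ("trim your reads") can be illustrated with a small sketch. It assumes Phred+33 encoded FASTQ quality strings and a simple rule that drops trailing bases below a quality cutoff; the function names and the cutoff of 20 are illustrative choices, not something the slides specify, and dedicated trimming tools use more elaborate rules.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20):
    """Remove trailing bases whose Phred score is below min_quality.

    Illustrative rule only: trimming stops at the first base from the
    3' end that meets the cutoff.
    """
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


# Hypothetical read: 'I' encodes Q40, '#' encodes Q2, '!' encodes Q0.
read_seq = "ACGTACGTAC"
read_qual = "IIIIIIII#!"
trimmed_seq, trimmed_qual = trim_3prime(read_seq, read_qual)
print(trimmed_seq, trimmed_qual)
```

Trimming before alignment matters because low-quality 3' bases tend to introduce mismatches that lower the fraction of reads that map.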
  • 22. How much RNA-sequencing data, how much computation power and where do you go to compute? How much RNA-sequencing data? 1. 20 million paired-end reads ~ 2 GB of data 2. 100 million paired-end reads ~ 10 GB of data. How much computation power? 1. The more memory and processors, the less time it takes to compute 2. If you outsource the analysis, you will still need to store the results somewhere. Where to compute: Amazon Web Services (S3 storage; EC2, the Elastic Compute Cloud on-demand computational facility); Georgetown University High Performance Computer Core (matrix.georgetown.edu); UPENN Galaxy services
  • 23. A growing number of tools enable RNA-Seq analysis
  • 24. These RNA-Seq tools are used for mapping reads, aligning reads and providing input for differential expression analysis • Tuxedo suite – Bowtie, TopHat, Cufflinks • Trinity suite – Inchworm, Chrysalis, Butterfly • RUM – RNA-Seq Unified Mapper
  • 25. Mapping tools bias exons to known genes. What percentage of reads are covered? What percentage of reads are mapped? 3’ bias on transcript reads: 1. 60-80% of reads are mapped 2. Highest percentage of 3’ end of reads are mapped 3. Reads need to be quality trimmed
  • 26. Galaxy is a web-based platform that enables researchers to run their own analyses (for more than just RNA-Seq)
  • 28. How to visualize mapped results? • UCSC Genome Browser (Gbrowse) • Integrated Genome Browser (IGB) • Integrative Genomics Viewer (IGV) Many shared formats, reading many of the outputs generated by the programs, and the ability to generate one's own tracks
  • 31. What do RNA-Seq reads look like for GAPDH? Repeat-masked, BLAT-aligned reads allowing 1/2 mismatched bases, viewed in IGB 6.7.2
  • 32. RNA-Seq Differential Expression analysis
  • 33. What does GAPDH look like in terms of quantitation? Columns: Total BM (RPKM, 3SEQ counts, BLAT reads), then HPP (RPKM, 3SEQ counts, BLAT reads). Rows, values in slide order (some cells are empty): CD34: 0.7, 340, 230, 8, 8, 14; BST1: 19.7, 5374, 31, 31; CD133: 0.2, 173, 176, 16, 16, 33; THY1: 0, 7, 4, 4; A12: 1, 0; A5: 0, 0; ALK: 0, 9, 24, 0, 0, 3; B9: 0, 0; C1: 0, 0; C2: 0, 0; C7: 0, 0; E7: 0, 0; E9: 2, 0; F6: 0, 0; G12: 0, 0; GAPDH: 3013.2, 727831, 356289, 120.8, 5559, 2670; H3: 0, 0. For GAPDH, the BLAT raw read count ratio == 3SEQ counts ratio ~= 130 to 1; RPKM ratio ~= 24.9
  • 34. RNA-Seq Quantification Challenge: a problem that exists with RNA-Seq data but not with array data is that longer transcripts produce more reads than shorter transcripts. One solution to account for this is RPKM, reads per kilobase of exon per million mapped reads (Cufflinks uses the fragment-based analogue, FPKM): RPKM = 10^9 x C / (N x L), i.e. the read count C scaled by both the library size N and the transcript length L, where C(gene) = the number of mappable reads that fall onto a gene's exons, N = the total number of mappable reads in the experiment, and L(gene) = the sum of the exons in base pairs. Wold (2008)
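The RPKM formula above can be checked with a few lines of code. This is a minimal sketch with made-up read counts and gene lengths (hypothetical genes, not data from the talk): two genes expressed at the same level, one four times longer than the other, should end up with the same RPKM even though their raw counts differ fourfold.

```python
def rpkm(read_count, total_mapped_reads, exon_length_bp):
    """RPKM = 10^9 * C / (N * L), with L in base pairs."""
    return 10**9 * read_count / (total_mapped_reads * exon_length_bp)


# Hypothetical experiment with 10 million mappable reads.
total_reads = 10_000_000

# Hypothetical genes: same expression level, different lengths.
short_gene = rpkm(read_count=500, total_mapped_reads=total_reads, exon_length_bp=1000)
long_gene = rpkm(read_count=2000, total_mapped_reads=total_reads, exon_length_bp=4000)

print(short_gene, long_gene)
```

The factor 10^9 is the product of 10^3 (base pairs per kilobase) and 10^6 (reads per million), which is why L is given in base pairs and N as a raw read total.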
  • 35. Cufflinks: Transcript assembly, differential expression, and differential regulation for RNA-seq
  • 36. Cuffdiff produces many output files: 1. Transcript FPKM expression tracking. 2. Gene FPKM expression tracking; tracks the summed FPKM of transcripts sharing each gene_id 3. Primary transcript FPKM tracking; tracks the summed FPKM of transcripts sharing each tss_id 4. Coding sequence FPKM tracking; tracks the summed FPKM of transcripts sharing each p_id, independent of tss_id 5. Transcript differential FPKM. 6. Gene differential FPKM. Tests differences in the summed FPKM of transcripts sharing each gene_id 7. Primary transcript differential FPKM. Tests differences in the summed FPKM of transcripts sharing each tss_id 8. Coding sequence differential FPKM. Tests differences in the summed FPKM of transcripts sharing each p_id, independent of tss_id 9. Differential splicing tests: this tab-delimited file lists, for each primary transcript, the amount of overloading detected among its isoforms, i.e. how much differential splicing exists between isoforms processed from a single primary transcript. Only primary transcripts from which two or more isoforms are spliced are listed in this file. 10. Differential promoter tests: this tab-delimited file lists, for each gene, the amount of overloading detected among its primary transcripts, i.e. how much differential promoter use exists between samples. Only genes producing two or more distinct primary transcripts (i.e. multi-promoter genes) are listed here. 11. Differential CDS tests: this tab-delimited file lists, for each gene, the amount of overloading detected among its coding sequences, i.e. how much differential CDS output exists between samples. Only genes producing two or more distinct CDS (i.e. multi-protein genes) are listed here.
  • 37. RNA-Seq Quantification Challenge: the DESeq method uses the geometric mean of counts in all samples. DESeq method: Construct a "reference sample" by taking, for each gene, the geometric mean of the counts in all samples. To get the sequencing depth of a sample relative to the reference, calculate for each gene the quotient of the count in your sample divided by the count in the reference sample. Now you have, for each gene, an estimate of the depth ratio. Simply take the median of all the quotients to get the relative depth of the library. The 'estimateSizeFactors' function of the DESeq package does this calculation.
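The median-of-ratios procedure described on this slide can be sketched in plain Python. This is an illustration of the steps as stated, not the DESeq implementation itself; the function name, the toy counts, and the choice to skip genes with a zero count in any sample (whose geometric mean would be zero) are assumptions made for the example.

```python
import math
from statistics import median


def size_factors(counts):
    """Relative library depth per sample via the median-of-ratios idea.

    counts maps a sample name to a list of raw counts, one per gene,
    with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Reference sample: per-gene geometric mean across samples.
    # Assumption: genes with a zero in any sample are left out.
    log_reference = {}
    for gene in range(n_genes):
        values = [counts[s][gene] for s in samples]
        if all(v > 0 for v in values):
            log_reference[gene] = sum(math.log(v) for v in values) / len(values)

    # Size factor: median over genes of (sample count / reference count).
    factors = {}
    for s in samples:
        ratios = [counts[s][g] / math.exp(lr) for g, lr in log_reference.items()]
        factors[s] = median(ratios)
    return factors


# Toy data: sample "B" was sequenced at twice the depth of sample "A".
toy_counts = {"A": [10, 20, 30, 40], "B": [20, 40, 60, 80]}
factors = size_factors(toy_counts)
print(factors)
```

Dividing each sample's raw counts by its size factor puts the samples on a common scale; using the median makes the estimate robust to a handful of strongly differentially expressed genes.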
  • 38. DESeq: an R package that works with Raw Counts to determine genes differentially expressed across samples • Simon Anders
  • 41. What is Systems Biology? Technology Advances Spur Research Advances. Systems Biology is a discipline using a multitude of measurement technologies to capture the entirety of a biological system’s parts and then attempts to reverse engineer that biological system’s ability to dynamically remodel in its response to stimuli. Sequencing technologies: Genome, Transcriptome. Mass Spec technologies: Proteome, Metabolome
  • 42. Resources • http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq) • http://galaxy.psu.edu/ • http://seqanswers.com/ • http://www.broadinstitute.org/igv/ • http://bioviz.org/igb/index.html • http://www.illumina.com • http://www.otogenetics.com • http://www.dnanexus.com • http://cufflinks.cbcb.umd.edu/ • http://brb.nci.nih.gov/BRB-ArrayTools.html
  • 43. Acknowledgements Dr. Anton Wellstein Dr. Anna Riegel Dr. Marcel Schmidt Jean-Baptiste Masarati Dr. Elena Tassi The entire lab: Tibari, Ghada, Ivana, Eveline, the entire Wellstein/Riegel laboratory My Committee Dr. Yuri Gusev Dr. Anatoly Dritschilo Dr. Michael Johnson Dr. Christopher Loffredo Dr. Habtom Ressom Dr. Terry Ryan (external committee member) High Performance Core Group, Steve Moore, especially Woonki Chung Amazon Cloud Services Dr. Ann Loraine, UNC, IGB Developer Brian Haas, Author Trinity Suite
  • 44. Given a list of differentially expressed genes, enrichment analysis should now be performed • Enrichment analysis allows the researcher to leverage documented experiments, which provide evidence for genes' roles in pathways and functions, to interpret the results and significance of their own experiments • DAVID – Gene ontology – Functional ontology • REVIGO – Output of DAVID may be placed in REVIGO for further interpretation and statistical exploration of the significance of discovered sets of genes
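The idea behind enrichment analysis can be sketched as a one-sided hypergeometric test: given how many genes in the background carry an annotation, how surprising is the number of annotated genes in the differentially expressed list? This is a generic textbook sketch with made-up numbers; it is not necessarily the exact statistic that DAVID or REVIGO report, and the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: number of genes in the background
    annotated:  background genes carrying the term of interest
    selected:   size of the differentially expressed gene list
    overlap:    selected genes that carry the term
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, 5 annotated with a term,
# 4 differentially expressed genes, 3 of which carry the term.
p_value = enrichment_p_value(population=20, annotated=5, selected=4, overlap=3)
print(p_value)
```

In practice many terms are tested at once, so the resulting p-values need a multiple-testing correction before any term is called enriched.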
  • 45. Using differentially expressed genes, biological pathways should be explored • Differentially expressed genes are put into programs such as Pathway Studio or Ingenuity • Shortest path programs and canonical pathway analysis • Enables a researcher to reverse engineer the pathways expressed in the course from a healthy response to a diseased response • Ideally a pathway reveals the observed phenotype – connecting the gene expression program with the phenotype – genotype – gene expression program to phenotype
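The "shortest path" idea mentioned on this slide can be sketched as a breadth-first search over an interaction graph. This is a generic illustration, not how Pathway Studio or Ingenuity work internally; the graph and the gene names (GENE_A through GENE_E) are hypothetical placeholders.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest path in an unweighted graph.

    graph maps each node to a list of nodes it interacts with.
    Returns the path as a list of nodes, or None if no path exists.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical directed interaction graph.
toy_graph = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_D"],
    "GENE_C": ["GENE_D"],
    "GENE_D": ["GENE_E"],
    "GENE_E": [],
}
path = shortest_path(toy_graph, "GENE_A", "GENE_E")
print(path)
```

Connecting differentially expressed genes through the fewest intermediate interactions gives a candidate mechanism that can then be checked against known canonical pathways.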
  • 46. FGFBP1 pathways control after induction of a conditional transgene in a mouse model: Information derived from mRNA expression pattern analysis. Anne Deslattes Mays, Elena Tassi, Anton Wellstein. Department of Oncology and Medicine, Lombardi Cancer Center, Washington DC 20057
Abstract: Fibroblast Growth Factors (FGFs) play a significant role in embryonic development, maintenance of tissue homeostasis in the adult as well as in different diseases. FGF-binding proteins (FGF-BP) are secreted proteins that chaperone FGFs stored in the extracellular matrix to their cognate receptor, and can thus modulate FGF signaling. FGF-BP1 (BP1 a.k.a. HBp17) expression is required for embryonic survival, can modulate FGF-dependent vascular permeability in embryos and is an angiogenic switch in human cancers. To determine the function of BP1 in vivo, we generated tetracycline-regulated conditional BP1 transgenic mice. BP1 expressing mice are viable, fertile and phenotypically indistinguishable from their littermates. Five cDNA Affymetrix arrays were run on the kidneys of the FGF-BP1 transgenic mice. Two arrays were run for the animals under doxycycline diet with the transgene switched off, one array was run with induction of the FGF-BP1 transgene for 24 hours, one with induction for 48 hours, and one with induction for 336 hours, representing a chronic induction of the transgene. The results indicate that when properly normalized, time series analysis of a large array can reveal the signal transduction pathways. Pattern analysis allows for a systems biology review of the data and allows for the exploration and generation of testable hypotheses.
Figure 1 – Panels 0, A, B, and C illustrate ordering based upon the expression values of the control (FGFBP OFF), 24 hour expression (FGFBP1 On 24 hours), 48 hour expression (FGFBP1 On 48 hours), and 336 hour expression (FGFBP On 336 hours). The insight gained from this inspection includes the ability to see the relative changes of expression at each of these time points.
Figure 2 – Distinct Expression Patterns When Filtering by Thresholds at Timepoints. By creating a filter to capture the distinctive patterns that are expressing themselves at each of the separate timepoints, one can understand the major message being communicated at each timepoint. The patterns of expression are distinctive. Panel A shows the expression patterns for those genes above a threshold at 24 hours, Panel B those above a threshold at 48 hours, and Panel C those above a threshold at 336 hours, i.e. at a chronic transgene expression level.
Figure 3 – Heatmap scaled by probe. After RMA normalization and selection of significant over- and under-expressors relative to the average of the FGFBP1 transgene being off, analysis of the heatmap reveals mutually exclusive clusters. These clusters indicate genes that are off in one state and on in the next. Cluster A represents those genes that are off with the FGFBP1 transgene being off and switched on when the FGFBP1 transgene is activated for 24 hours. Cluster B contains those genes that are off at 24 hours but activated when the FGFBP1 transgene is on for 48 hours. Cluster C contains those genes that are off at 48 hours but on when the FGFBP1 transgene is on for 336 hours, or chronically. Studying these genes in this order, and with this pattern, allows the exploration of the signal transduction and activation pathway in response to the activation of the FGFBP1 transgene.
Figure 4 – FGFBP1 pathways. Using Pathway Studio, the shortest path through the set of genes that were selected by a band-pass filter at each of the time points (24 hours, 48 hours and 336 hours) was constructed. The resulting selection of diseases, cell processes, and functional classes is the result of Pathway Studio constructing the shortest path to connect those genes in the set.
Figure 5 – Gene Details. The details for the genes found in the clusters of Figure 1 are described in tables A, B and C. The genes responding after activation of the FGFBP1 transgene for 24 hours include immunoglobulin kappa chain variable 21, 3-phosphoglycerate dehydrogenase, a zinc finger protein, neuronatin, and homeobox B8. The genes found in table B represent those genes activated after 48 hours of the FGFBP1 transgene being on. Included in this set are hemopexin and major urinary protein 3. Finally, after 336 hours, truly representing chronic activation of the FGFBP1 transgene, we have one gene, Reg3b, associated with inflammatory response (according to GO ontology).
Figure 6 – Graphical Gaussian Model. Using the expression profiles, a quasi-Bayesian analysis is performed, constructing the partial correlation network among the top expressing genes. Note that C9 (complement component 9) could not be placed in the context of the data in the Pathway Studio diagram; however, using the partial correlations, we are able to place it as strongly positively correlated to Serpina3k, Cyp3a11, MUG1, Tdo2, Mup3 and Hpx, weakly positively correlated to Hamp, and strongly negatively correlated to Tex10. Together, these indicate the placement of C9 in the endothelial response.
Conclusions: A systems biology approach to analyzing large data sets, such as this study, which involved five full mouse cDNA arrays, allows the researcher to capture a snapshot of the unfolding remodeling events of an organism's response to change, stress or disease. Analyzing data in this form involves filtering the biological signal from the noise. Sorting out the noise in an appropriate manner is essential to be able to complete the biological story. Building on the existing knowledge base, we can complete the picture as long as the proper context of the collection, normalization and analysis is maintained. High-throughput technologies such as microarrays and RNA sequencing, as enabled by next-generation sequencing, present the researcher with the challenge of extracting meaningful information from the measurements. Software tools and analysis techniques are not a substitute for understanding the biological context from which the data are collected. Engineering and digital signal processing have allowed us to understand how to reconstruct a signal from a continual stream of noisy analog data. Sampling frequency and proper filtering are a must to be able to sort out a meaningful signal from the noise. These same principles apply not only to communication theory but also when studying large data such as those that may be collected from high-throughput systems such as an Affymetrix mouse cDNA array.
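The partial correlations behind the graphical Gaussian model in Figure 6 can be illustrated with the first-order case: the correlation between two variables after removing the effect of a third. This is a simplified sketch with made-up expression values, not the analysis used for the poster, which estimates partial correlations across many genes at once.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical profiles: genes x and y both follow a shared driver z,
# each with its own independent deviation.
z = [1, 2, 3, 4, 5, 6]
x = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
y = [1.5, 2.5, 2.5, 3.5, 5.5, 6.5]

raw = pearson(x, y)
partial = partial_correlation(x, y, z)
print(raw, partial)
```

In this toy data x and y look strongly correlated, but the partial correlation is close to zero because their relationship is explained entirely by z. Edges in a partial correlation network are meant to capture direct associations of this kind rather than ones induced by a shared driver.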
  • 47. Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement • Ilya Shmulevich, Genomic Signal Processing: “Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement” • Erwin Schrödinger, Science Theory and Man: “It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable”
  • 48. Before library construction, RNA quality must be assessed 1. Most vendors and cores will assess the quality of the RNA before sequencing 2. Important to determine before sequencing begins. Garbage in == garbage out
  • 49. Cluster Generation • In the cBot cluster system, single molecules are isothermally amplified in a flow cell to prepare them for high-throughput sequencing • 8-channel Genome Analyzer flow cell has a dense lawn of oligos • Single DNA molecules hybridize to the lawn of oligos • Bound fragments are extended to make copies • Copies are covalently bound to the flow cell's surface • Each fragment is clonally amplified through a series of extensions and isothermal bridge amplifications, resulting in 100s of millions of unique clusters • Reverse strands are cleaved and washed away • Ends are blocked • Sequencing primer is hybridized to the DNA template • After cluster generation, libraries are ready for sequencing
  • 50. Sequencing • 100s of millions of clusters sequenced simultaneously • Using 4 fluorescently labeled, reversibly terminated nucleotides • Natural competition ensures highest accuracy • After each round of synthesis, clusters are excited by a laser and emit a color that identifies the newly added base • Fluorescent label and blocking group are then removed, allowing for the addition of the next nucleotide • Proprietary chemistry (Illumina) reads a base in each cycle • Allows for accurate sequencing through difficult regions such as homopolymers and repetitive sequence
  • 52. Systems Biology History (Wikipedia) • Systems biology roots found in – Quantitative modeling of enzyme kinetics – Mathematical modeling of population growth – Simulations to study neurophysiology – Control theory and cybernetics • Theorists – Ludwig von Bertalanffy – General Systems Theory – Alan Lloyd Hodgkin and Andrew Fielding Huxley – constructed a mathematical model that explained the action potential propagating along the axon of a neuron – Denis Noble – first computer model of the heart pacemaker
  • 53. Institutes of Systems Biology • 2000 – Institutes of Systems Biology established in Seattle and Tokyo • After completion of the Human Genome Project • NSF grand challenge for systems biology – build a mathematical model of the whole cell

Editor's Notes

  1. “Nothing scientific can be said about a system for which no measurements are possible at the scale of the theory.” Erwin Schrödinger, Science Theory and Man: “It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable.” “It makes no sense to apply a mathematical method that either depends on or utilizes unobservable measurements.”
  2. Capillary Gel Electrophoresis enabled the sequencing of the human genome faster, cheaper – spurred the completion of the human genome project
  3. Sample starts with total RNA. Messenger RNA is purified by poly-A selection, then chemically fragmented and converted into single-stranded cDNA using random hexamer priming. The second strand is generated to create double-stranded cDNA, which is then ready for TruSeq library construction. Blunt-ended DNA fragments are generated using a combination of fill-in reactions and exonuclease activity. An “A” base is added to the blunt ends of each strand, preparing them for ligation to the sequencing adapters.
  4. TruSeq workflow: Blunt-end fragments are created. An A base is added. Prepare for indexed adapter ligation. The final product is created, ready for amplification on either the cBot or the Cluster Station. A pooling strategy is applied to allow multiplexing on the HiSeq 2000 by using these adapters… In this way paired-end sequencing can be performed; the tags are assigned to each strand, so strandedness is preserved.
  5. RNA-Seq allows you to discover and profile the entire transcriptome. No probes. No primers. RNA-Seq delivers unbiased, unparalleled information about the transcriptome. Simple sequencing workflow. Illumina's optimized TruSeq RNA Sample Prep Kits.
  6. Tools
  7. Once you get started, there are a number of tools that allow you to visualize and understand your data.
  8. I will talk more about these tools on Thursday, when I give a talk on RNA-Seq for the Systems Biology series. But let's go back to our particular problem