PAGODA is a computational method that uses pathways and gene sets to identify transcriptionally distinct subpopulations in single cell RNA-seq data in a robust manner. The presentation discusses applying PAGODA to human cortical cell data to identify known cell types. It also describes integrating PAGODA with other methods to study alternative splicing patterns and connect transcriptional heterogeneity to epigenetic heterogeneity. The overall aim is to provide insights into cell-type specific regulation and implications for neuropsychiatric diseases.
Society for Neuroscience November 2017 - snDropseq scTHSseq talk
1. Jean Fan / Kharchenko Lab / HMS DBMI - SfN 2017 1
Classifying and
characterizing
single cells using
transcriptional
and epigenetic
analysis
Jean Fan
Kharchenko Lab
Bioinformatics and Integrative Genomics PhD
Department of Biomedical Informatics
Harvard Medical School / Harvard University
2. Disclosure of financial conflicts of interest
None
3. Motivation: Characterize heterogeneity and
identify cell subpopulations
Greig LC, Woodworth MB, Galazo MJ, Padmanabhan H, Macklis JD. Molecular logic of neocortical projection neuron specification, development and diversity.
Nat Rev Neurosci. 2013;14(11):755-69.
NPCs
4. Technological advancements in single cell
sequencing enable scRNA-seq
Microfluidic Chips Droplet Microfluidics
1000s of genes in 100s to 100,000s of cells -> need computational methods
5. Talk Outline
◦ How can we identify transcriptional subpopulations in a way that is
robust and takes into consideration technical artefacts from single cell
RNA-seq?
◦ Beyond expression heterogeneity, how can we use single-cell RNA-seq
data to identify patterns of alternative splicing important to neuronal
development?
◦ How can we connect transcriptional heterogeneity to epigenetic
heterogeneity (accessibility)?
◦ What insights can such integrative analysis provide about cell-type specific regulation and
neuro-psychiatric disease?
7. PAGODA (Pathway And Geneset
OverDispersion Analysis) uses pathways to
identify transcriptional subpopulations
Nature Methods 13, 241–244 (2016)
doi:10.1038/nmeth.3734
8. PAGODA intuition: Improve statistical
sensitivity by taking advantage of pathways
and gene sets
◦ Rather than relying on a few genes, look for broader patterns of variability
◦ Coordinated patterns of variability of genes linked to function/phenotype
== stronger signal -> increases statistical power
11. PAGODA overview: assess expression within
annotated pathways and de novo gene sets
13. PAGODA overview: Identify pathways and
gene sets exhibiting coordinated
overdispersion
14. PAGODA overview: Remove redundant
pathways and gene sets, and visualize
PAGODA leverages pathway annotations and de novo gene sets
to identify robust transcriptionally distinct subpopulations
16. Increasing throughput of single cell
sequencing requires lighter computational
solutions -> PAGODA2
github.com/hms-dbmi/pagoda2
17. Talk Outline
◦ How can we identify transcriptional subpopulations in a way that is
robust and takes into consideration technical artefacts from single cell
RNA-seq?
◦ Beyond expression heterogeneity, how can we use single-cell RNA-seq
data to identify patterns of alternative splicing important to neuronal
development?
◦ How can we connect transcriptional heterogeneity to epigenetic
heterogeneity (accessibility)?
◦ What insights can such integrative analysis provide about cell-type specific regulation and
neuro-psychiatric disease?
18. PAGODA applied to human cortical cells
identifies and characterizes subpopulations
Xiaochang Zhang
Chris Walsh
19. PAGODA identifies known cell types in fetal
cortices confirmed by marker genes
21. PAGODA integrated with MISO identifies
alternative splicing in pure pooled single cells
Needs bulk -> pool single cells
24. PAGODA identifies known cell types in fetal
cortices confirmed by marker genes
25. Pure pooled RGs vs neurons lend credence to
potential purity concerns with bulk CP vs. VZ
PAGODA enables generation of
pure in-silico mini-bulks
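The pooling idea can be sketched in a few lines of Python. This is a minimal illustration and not the PAGODA or MISO interface: the function name `pool_mini_bulks`, the dictionary layout, and the toy counts are all assumptions made for the example. For splicing analysis one would in practice pool the aligned reads of the cells in each subpopulation rather than gene-level counts, but the grouping logic is the same.

```python
from collections import defaultdict


def pool_mini_bulks(counts, labels):
    """Sum per-cell counts within each subpopulation.

    counts: {cell_id: {gene: count}}    (hypothetical layout)
    labels: {cell_id: subpopulation}    (e.g. cluster assignments)
    Returns {subpopulation: {gene: pooled_count}}.
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        group = labels[cell]
        for gene, n in gene_counts.items():
            pooled[group][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection
    return {group: dict(genes) for group, genes in pooled.items()}


# Toy data: two cells labelled as radial glia (RG), one as neuron
toy_counts = {
    "cell1": {"DCX": 0, "SOX2": 5},
    "cell2": {"SOX2": 3},
    "cell3": {"DCX": 7},
}
toy_labels = {"cell1": "RG", "cell2": "RG", "cell3": "neuron"}
mini_bulks = pool_mini_bulks(toy_counts, toy_labels)
```

Because cells are grouped by their assigned subpopulation rather than by dissected region, each pool contains only cells of one inferred type.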
28. Talk Outline
◦ How can we identify transcriptional subpopulations in a way that is
robust and takes into consideration technical artefacts from single cell
RNA-seq?
◦ Beyond expression heterogeneity, how can we use single-cell RNA-seq
data to identify patterns of alternative splicing important to neuronal
development?
◦ How can we connect transcriptional heterogeneity to epigenetic
heterogeneity (accessibility)?
◦ What insights can such integrative analysis provide about cell-type specific regulation and
neuro-psychiatric disease?
29. Integrative Single-Cell Analysis By
Transcriptional And Epigenetic States In
Human Adult Brain
Blue Lake
Brandon Sos
Song Chen
Kun Zhang
Just accepted into Nature Biotech!
30. Study overview: droplet based transcriptomics
and DNA accessibility assays from same tissues
32. snDrop-seq identified many neuronal subtypes
across cortical tissues based on gene
expression
Clustering with tSNE in PAGODA2
33. Study overview: droplet based transcriptomics
and DNA accessibility assays from same tissues
34. scTHS-seq identified many neuronal subtypes
across cortical tissues based on DNA
accessibility
35. snDrop-seq and scTHS-seq identified many
neuronal subtypes within the visual cortex
Visual Cortex
snDrop-seq
(expression)
scTHS-seq
(accessibility)
36. Integrative approach overview: predict
differential accessibility using differential
expression to refine scTHS-seq populations
38. GBM model trained on Oli vs. Ast to learn
general feature importance
40. Cell-types confirmed using marker genes
(promoter accessibility, gene expression, tissue
staining)
Panels: Promoter Accessibility, Gene Expression, Spatial Localization
Figure labels: RORB, ExL4
42. Study overview: pool within discovered
subpopulations to discover cell-type specific
properties
43. Integrative analysis enables identification of
cell-type specific TFs
44. Integrating GWAS implicates cell types in
neuro-related diseases
45. Summary
◦ PAGODA allows us to leverage pathway-level information to identify
transcriptional subpopulations from single cell RNA-seq
◦ Beyond expression heterogeneity, we can pool single-cell RNA-seq
data to create in-silico mini-bulks to identify patterns of alternative
splicing
◦ Integrative analysis of snDrop-seq and scTHS-seq data allows us to
connect transcriptional heterogeneity to epigenetic heterogeneity
(accessibility) and identify potentially important TFs and implicate cell
subtypes in disease using GWAS
46. Thanks and happy to take questions!
Kharchenko Lab
Peter Kharchenko
Joseph Herman
Nikolas Barkas
Ruslan Soldatov
Zhang Lab
Kun Zhang
Blue Lake
Brandon Sos
Song Chen
Chun Lab
Jerold Chun
Gwen Kaeser
Funding
Wu Lab
Catherine Wu
Lili Wang
Ken Livak
Shuqiang Li
Park Lab
Peter Park
Soo Lee
Semin Lee
SGI
Woong-yang Park
Hae-Ock Lee
Walsh Lab
Chris Walsh
Xiaochang Zhang
Find me online!
Web: http://JEF.works
Github: JEFworks
Twitter: @JEFworks
jeanfan@fas.harvard.edu
Many others
CZ Zhang
Angela Brooks
DAC
Nir Hacohen
Soumya Raychaudhuri
Rafael Irizarry
Editor's notes
Actually identify subpopulations
DCX = neuronal maturation marker
Previous FACS-based approaches rely on just one marker
PAGODA builds on these error models
Rather than variability of genes,
coordinated variability of genes within a pathway or gene set
The general intuition…
you can imagine if I have
many cells one gene
red is high blue is low
After error modeling…
Explain green and orange
Red and green
split de novo and top section
Given annotations from MsigDB, GO, or other ontologies
we integrate the error models previously mentioned and use weighted PCA to capture the variability of a gene set in principal components
where weights are derived from our error modeling
because annotations are limited, we also derive ‘de novo’ gene sets based on correlated expression patterns we observe directly from the data
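As a rough illustration of the weighted PCA step, the sketch below computes the variance captured by the first principal component (the λ1 value) of one gene set. It is a simplified stand-in and not PAGODA's implementation: the function name `first_pc_variance`, the use of power iteration, and the toy matrices are assumptions, and the per-observation weights are simply passed in rather than derived from an error model.

```python
import math


def first_pc_variance(expr, weights, n_iter=200):
    """Variance along the first weighted principal component of a gene set.

    expr[g][c]:    expression of gene g in cell c
    weights[g][c]: observation weight in [0, 1]; assumed to come from an
                   upstream error model, with a positive sum for every gene
    """
    n_genes, n_cells = len(expr), len(expr[0])
    # Weighted centering per gene, then scale each value by sqrt(weight)
    z = []
    for g in range(n_genes):
        wsum = sum(weights[g])
        mean = sum(w * x for w, x in zip(weights[g], expr[g])) / wsum
        z.append([math.sqrt(w) * (x - mean)
                  for w, x in zip(weights[g], expr[g])])
    # Gene-by-gene weighted covariance matrix
    cov = [[sum(a * b for a, b in zip(z[i], z[j])) / n_cells
            for j in range(n_genes)] for i in range(n_genes)]
    # Power iteration from a non-uniform start vector
    v = [1.0 / (i + 1) for i in range(n_genes)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # no variance left to capture
        v = [x / norm for x in w]
    # Rayleigh quotient gives the leading eigenvalue estimate
    cv = [sum(c * x for c, x in zip(row, v)) for row in cov]
    return sum(a * b for a, b in zip(v, cv))
```

In this toy setting, a set whose genes rise and fall together across cells yields a larger first-component variance than a set of equally variable but uncoordinated genes, which is the signal the pathway-level approach relies on.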
Capturing the patterns of variability
We focus on the pathways and gene sets that exhibit significantly coordinated variability
Statistical significance of the λ1 eigenvalues obtained for each gene set was evaluated based on the Tracy-Widom F1 distribution F1(m, n_e), where m is the number of
genes in a given set s, and n_e is the effective number of cells, determined to fit the distribution of the randomly sampled gene sets (containing the same number of genes as the actual gene sets).
But many pathways and gene sets share genes or show similar patterns of variability across cells
we further collapse these redundancies into pathway clusters
Ultimately providing a cell clustering
along with an interactive browser to explore these results
Label middle heatmap
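One simple way to picture the redundancy-collapsing step is to group gene sets by how many genes they share. The sketch below does this greedily with a Jaccard overlap threshold. It is only an assumption-laden illustration: it covers the shared-gene case and ignores similarity of variability patterns across cells, and the function names, the threshold of 0.5, and the toy gene sets are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of gene names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collapse_gene_sets(gene_sets, threshold=0.5):
    """Greedily group gene sets that overlap a cluster's first member.

    gene_sets: {set_name: iterable of gene names}
    Returns a list of clusters, each a list of set names.
    """
    clusters = []
    for name in gene_sets:  # dicts preserve insertion order
        for cluster in clusters:
            representative = gene_sets[cluster[0]]
            if jaccard(gene_sets[name], representative) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing cluster is similar enough: start a new one
            clusters.append([name])
    return clusters


# Toy gene sets, for illustration only
toy_sets = {
    "neurogenesis": {"DCX", "NEUROD1", "STMN2"},
    "neuron differentiation": {"DCX", "NEUROD1", "STMN2", "TBR1"},
    "cell cycle": {"MKI67", "TOP2A"},
}
pathway_clusters = collapse_gene_sets(toy_sets)
```

Here the two neuronal sets share three of four genes and fall into one cluster, while the cell cycle set stays separate.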
We applied PAGODA to identify subpopulations
CLS
Look at known marker genes for interpretation.
Indeed, we've identified radial glia or mature neurons...
Instead of looking at gene expression, let's look at alternative splicing
Chris Burge’s lab at MIT
Create in silico mini-bulks
Sashimi plots
Bulk microdissection
See same trends
Reviewers were initially concerned about purity of bulk