A short introduction to single cell
RNA-seq analyses
Nathalie Vialaneix
January 17th, 2019 - Biopuces
Unité MIAT, INRA Toulouse
Sources
These slides have been made using previous presentations
from:
• Delphine Labourdette (LISBP) - slides
• Cathy Maugis (IMT) - slides
• Franck Picard (LBBE, Lyon) - slides
Simple description of single cell
datasets
10x Genomics Chromium
A few remarks
• barcoding is used to index each cell
• UMIs (unique molecular identifiers) are used to index each
transcript and to correct for amplification bias during
library preparation
• droplet technology does not allow for spike-ins (which
would be useful for normalization)
• droplets sometimes contain two or three cells (doublets or
triplets): this is more frequent in cancer cells and is
estimated at ~0-10% of the droplets, a rate that increases
with the number of cells
• many other single-cell technologies exist (check Delphine’s slides)
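To make the role of barcodes and UMIs concrete, here is a minimal sketch of UMI counting: reads sharing the same cell barcode, gene and UMI are treated as amplification copies of one molecule and counted once. The read tuples, barcode strings and gene labels (`geneA`, `geneB`) are made up for illustration, and real pipelines additionally handle sequencing errors in barcodes and UMIs.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs for each (cell barcode, gene) pair.

    `reads` is an iterable of (barcode, umi, gene) tuples. Reads with the
    same barcode, gene and UMI are collapsed into a single molecule.
    """
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical reads: the first two are copies of the same molecule.
reads = [
    ("AAAC", "TTGCA", "geneA"),
    ("AAAC", "TTGCA", "geneA"),
    ("AAAC", "GGATC", "geneA"),
    ("AAAC", "CCTAG", "geneB"),
    ("CGTA", "TTGCA", "geneA"),
]
counts = umi_counts(reads)
for (barcode, gene), n in sorted(counts.items()):
    print(barcode, gene, n)
```

The resulting dictionary is one way to hold the cell-by-gene count matrix that the rest of the analysis starts from.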
Single-cell technologies
[Regev et al., 2017]
Why single cell?
From a statistical perspective…
From 10x Genomics
Standard analyses and tools
• normalization and dimension reduction
• clustering
• differential expression
can be performed using:
• the Bioconductor workflow “single cell” (which uses
the packages scater and scran)
• the all-in-one pipeline “Seurat”
Description of datasets and
requests from project TregDiab
Datasets
• Count dataset (as produced by Claire) with n = 8,273
cells and p = 27,998 genes (Unique Molecular Identifier counts)
• Metadata:
  • on cells: barcode (identifies the cell), group (IL15 or
  IL2) and genotype (WT or KO)
  • on genes: Ensembl gene identifier and gene name
• Frequency distribution of conditions over cells:
          WT     KO
  IL15  2452   1609
  IL2   2175   2037
The rest of the analysis will focus on cells coming from WT
samples.
Questions
1. On the whole population of cells (not taking into account
groups and genotypes), perform a typology of cells
(unsupervised clustering).
2. Identify markers (genes) that are specific to each cell type.
Data cleaning and normalization
Different steps of the normalization
1. Quality control of the cells: distribution of library sizes, of the
number of expressed genes and of the mitochondrial proportion.
⇒ Atypical cells are removed from the analysis.
2. Cell cycle classification.
⇒ Only cells in G1 phase are used for the analysis.
3. Quality control of genes: average count distribution, number of
cells in which the gene is expressed.
⇒ Atypical genes (lowly expressed) are removed from the
analysis.
4. Normalization of cell-specific biases: size factors that correct for
library size are computed after a first (crude) clustering.
What has not been done: doublet detection.
Quality control of cells
Filtering low quality cells
• remove cells with low library size
• remove cells with a low number of expressed genes
• remove cells with too high a proportion of mitochondrial
counts
⇒ 4,282 remaining cells (out of 4,627 original cells)
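The three filters above can be sketched as follows. This is only an illustration under stated assumptions: the thresholds are hypothetical (the slides do not say how the cut-offs were chosen), mitochondrial genes are recognized by an assumed `mt-` name prefix, and the toy cells are invented.

```python
def qc_metrics(counts, mito_prefix="mt-"):
    """Return (library size, number of expressed genes, mitochondrial proportion).

    `counts` maps gene names to UMI counts for one cell.
    """
    lib_size = sum(counts.values())
    n_genes = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    mito_prop = mito / lib_size if lib_size > 0 else 0.0
    return lib_size, n_genes, mito_prop


def keep_cell(counts, min_lib_size, min_genes, max_mito_prop):
    """Apply the three cell filters; all thresholds are user-supplied."""
    lib_size, n_genes, mito_prop = qc_metrics(counts)
    return (lib_size >= min_lib_size
            and n_genes >= min_genes
            and mito_prop <= max_mito_prop)


# Toy cells (hypothetical counts).
cells = {
    "cell1": {"geneA": 30, "geneB": 15, "geneC": 5, "mt-Co1": 2},
    "cell2": {"geneA": 2, "geneB": 0, "geneC": 0, "mt-Co1": 1},    # small library
    "cell3": {"geneA": 10, "geneB": 8, "geneC": 2, "mt-Co1": 30},  # mostly mitochondrial
}
kept = [name for name, c in cells.items()
        if keep_cell(c, min_lib_size=20, min_genes=3, max_mito_prop=0.2)]
print(kept)
```

In practice the cut-offs are usually derived from the observed distributions of the three metrics rather than fixed in advance.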
Cell cycle classification
Cell cycle classification is performed using cyclone (R
package scran), which relies on a model trained on
specific cell-cycle markers (for mouse and human) ⇒ only
cells in G1 phase are used in the analyses (to remove mitosis
effects)
      IL15    IL2
G1    2027   1871
G2     148     99
S       84     53
Gene quality
[Figures: distribution of highly expressed genes; average log-expression distribution]
Filtering atypical genes
Non-variable genes are removed: 13,629 genes with a variance equal to
0 (48.7%).
[Figures: lowly expressed genes; genes expressed in few cells]
⇒ 10,418 remaining genes (out of 27,998 initial genes)
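A minimal sketch of this kind of gene filtering is given below. The thresholds `min_mean` and `min_cells` are hypothetical placeholders (the slides do not give the values used), and the toy count vectors are invented.

```python
from statistics import mean, pvariance


def keep_gene(counts, min_mean=0.1, min_cells=2):
    """Decide whether a gene is kept, given its counts across cells.

    A gene is dropped if its variance is zero, if its average count is
    below `min_mean`, or if it is expressed in fewer than `min_cells` cells.
    """
    if pvariance(counts) == 0:
        return False
    if mean(counts) < min_mean:
        return False
    n_expressing = sum(1 for c in counts if c > 0)
    return n_expressing >= min_cells


# Toy data: one list of counts per gene, one entry per cell.
genes = {
    "geneA": [0, 0, 0, 0, 0],  # never detected, zero variance
    "geneB": [3, 5, 0, 2, 4],
    "geneC": [0, 0, 7, 0, 0],  # expressed in a single cell
    "geneD": [1, 1, 1, 1, 1],  # constant, zero variance
}
kept_genes = [g for g, counts in genes.items() if keep_gene(counts)]
print(kept_genes)
```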
Normalization
Normalization is performed after similar cells have been
clustered together (based on the most expressed genes; R
package scater).
⇒ Scaling factors for library size are obtained (as in bulk
RNA-seq; one could even normalize the library sizes as in edgeR).
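As an illustration of what a library-size scaling factor does, here is a deliberately simplified sketch: each cell gets a size factor proportional to its total count, and counts are divided by it before a log transform. This is not the clustering-based estimation described above, only the basic idea of correcting for sequencing depth; the toy counts are invented.

```python
import math


def size_factors(lib_sizes):
    """Library-size factors, scaled so that their mean is 1."""
    avg = sum(lib_sizes) / len(lib_sizes)
    return [s / avg for s in lib_sizes]


def log_normalize(cell_counts, sf):
    """log2(count / size factor + 1) for every gene of one cell."""
    return [math.log2(c / sf + 1) for c in cell_counts]


# Toy matrix: one row per cell, one column per gene.
cells = [
    [10, 0, 30],
    [20, 0, 60],  # same profile as the first cell, sequenced twice as deep
    [5, 5, 20],
]
sf = size_factors([sum(row) for row in cells])
normalized = [log_normalize(row, f) for row, f in zip(cells, sf)]
for row in normalized:
    print([round(v, 3) for v in row])
```

After scaling, the first two cells have identical normalized profiles, which is the intended effect of correcting for library size.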
Dimension reduction and
clustering
Standard approach for exploratory analysis
• dimension reduction (PCA, nearest neighbors graphs…)
• visualization (PCA, or t-SNE based on PCA or on any
other dimension reduction)
• clustering
PCA (all genes)
t-SNE (perplexity: 50, R package scater)
What does t-SNE do?
If cell expressions are denoted $x_1, \dots, x_n$ ($n$ cells, $x_i \in \mathbb{R}^p$), then
• compute a similarity between samples with:
\[
p_{i|j} = \frac{\exp\left(-\gamma^2 \|x_i - x_j\|^2\right)}{\sum_{k \neq j} \exp\left(-\gamma^2 \|x_k - x_j\|^2\right)}
\]
• search for a representation in $\mathbb{R}^2$, $y_1, \dots, y_n$, with a similarity
between points in the new representation based on:
\[
q_{i|j} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq j} \exp\left(-\|y_k - y_j\|^2\right)}
\]
• the representation is obtained by minimizing the KL divergence between p and q
But: the objective function is not convex and the results are
very sensitive to γ (perplexity) and to the initialization
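The objective can be made concrete with a small sketch that evaluates the similarities and the KL divergence for two candidate 2-D layouts. It follows the Gaussian form written on this slide for both p and q; as far as I know, the published t-SNE algorithm replaces the low-dimensional kernel with a Student t kernel and tunes a per-point bandwidth from the perplexity, so this is a simplified illustration rather than the algorithm itself. The points and the value γ = 1 are invented.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def conditional_similarities(points, gamma=1.0):
    """s[i][j] = exp(-gamma^2 d_ij^2) / sum_{k != j} exp(-gamma^2 d_kj^2)."""
    n = len(points)
    sims = [[0.0] * n for _ in range(n)]
    for j in range(n):
        weights = [
            math.exp(-gamma ** 2 * sq_dist(points[k], points[j])) if k != j else 0.0
            for k in range(n)
        ]
        total = sum(weights)
        for i in range(n):
            sims[i][j] = weights[i] / total
    return sims


def kl_divergence(p, q):
    """Sum over i != j of p[i][j] * log(p[i][j] / q[i][j])."""
    n = len(p)
    return sum(
        p[i][j] * math.log(p[i][j] / q[i][j])
        for i in range(n) for j in range(n)
        if i != j and p[i][j] > 0
    )


# Three toy "cells" in R^3: the first two are close, the third is far away.
x = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (3.0, 3.0, 3.0)]
p = conditional_similarities(x, gamma=1.0)

# Two candidate layouts in R^2: one respects the neighborhoods, one does not.
y_good = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
y_bad = [(0.0, 0.0), (3.0, 3.0), (0.1, 0.0)]
kl_good = kl_divergence(p, conditional_similarities(y_good))
kl_bad = kl_divergence(p, conditional_similarities(y_bad))
print(kl_good, kl_bad)
```

The layout that keeps the two neighboring cells together has the smaller divergence; an optimizer would move the y coordinates to decrease this quantity, which is where the non-convexity and the dependence on initialization come in.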
t-SNE: remarks
• t-SNE is good at representing local distances but not
global ones (non-linear dimension reduction)
• the perplexity can change the representation a lot (no
good value found for this dataset)
• the population of cells seems very homogeneous and
not related to the genotype
(the same is observed on the PCA projection)
How could we improve that? Use log / raw expression, base
the algorithm on PCA results, try a wider range of perplexity
values…?
Clustering
• extract t-SNE coordinates
• use HAC (hierarchical agglomerative clustering) on those
Other approaches for clustering
• use a nearest-neighbor (NN) graph + graph clustering (Louvain
algorithm, which optimizes the modularity)
• use other dimension reduction methods and perform any
clustering algorithm
⇒ results are different (visualization can even be extremely
different)
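The first approach (HAC on 2-D coordinates) can be sketched in a few lines. The linkage criterion is an assumption, since the slides do not state which one was used: average linkage is taken here, and the coordinates are invented stand-ins for t-SNE output.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def hac(points, n_clusters):
    """Naive agglomerative clustering: merge the two closest clusters until
    `n_clusters` remain. Returns a list of lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


# Hypothetical 2-D coordinates forming two well-separated groups.
coords = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
clusters = hac(coords, n_clusters=2)
print(clusters)
```

This naive version recomputes all linkages at every step, which is fine for a toy example but far too slow for thousands of cells, where an optimized implementation would be used instead.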
Clustering results
Conditions in clusters
Exploratory analysis of markers
[Figures: automatic detection; prior knowledge]
Not too bad for some known markers…
General overview of sc models in
statistics
How bad is the situation in single cell data?
Overdispersion is mainly biological because diversity is high
between cells
Expression is a bursty process: zeros are biological
sc Differential Expression Analysis with ZINB
[Risso et al., 2018] - package zinbwave
For cell i, gene j in condition r, gene expression is modeled by:
\[
X_{ijr} \sim \pi_{ijr}\,\delta_0 + (1 - \pi_{ijr})\,\mathrm{NB}(\mu_{ijr})
\]
Remaining problems:
• We are not really able to discriminate low expression from
no expression
• Estimation is hard (use of a Bayesian framework to
address this issue)
• a similar method exists for PCA [Durif et al., 2018]
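To see why low expression and no expression are hard to tell apart, the zero-inflated mixture can be written out numerically. In this sketch the negative binomial is parameterized by its mean µ and a dispersion parameter θ (variance µ + µ²/θ); θ is an assumption added for the example, since the slide writes NB(µ) only, and the parameter values are invented. This is not the zinbwave implementation.

```python
import math


def nb_pmf(x, mu, theta):
    """Negative binomial probability of count x, with mean mu > 0 and
    dispersion theta > 0 (variance mu + mu^2 / theta)."""
    log_p = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(x, pi, mu, theta):
    """Mixture of a point mass at 0 (weight pi) and NB(mu, theta) (weight 1 - pi)."""
    point_mass = pi if x == 0 else 0.0
    return point_mass + (1 - pi) * nb_pmf(x, mu, theta)


# Hypothetical parameters for one gene in one cell.
pi, mu, theta = 0.3, 2.0, 1.0
p_zero = zinb_pmf(0, pi, mu, theta)
# Share of the observed zeros that comes from the point mass rather than the NB part.
share_structural = pi / p_zero
print(p_zero, share_structural)
```

An observed zero can come from either component of the mixture, and the data alone do not say which one, which is the first of the remaining problems listed above.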
References
Durif, G., Modolo, L., Mold, J., Lambert-Lacroix, S., and
Picard, F. (2018).
Probabilistic count matrix factorization for single
cell expression data analysis.
In Raphael, B. J., editor, Proceedings of Research in
Computational Molecular Biology (RECOMB 2018), volume 10812
of Lecture Notes in Computer Science, pages 254–255,
Paris, France. Springer.
Regev, A., Teichmann, S. A., Lander, E. S., Amit, I.,
Benoist, C., Birney, E., Bodenmiller, B., Campbell, P.,
Carninci, P., Clatworthy, M., Clevers, H., Deplancke, B.,
Dunham, I., Eberwine, J., Eils, R., Enard, W., Farmer, A.,
Fugger, L., Göttgens, B., Hacohen, N., Haniffa, M.,

ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
preservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptxpreservation, maintanence and improvement of industrial organism.pptx
preservation, maintanence and improvement of industrial organism.pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...Radiation physics in Dental Radiology...
Radiation physics in Dental Radiology...
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 

A short introduction to single-cell RNA-seq analyses

  • 1. A short introduction to single cell RNA-seq analyses
    Nathalie Vialaneix, January 17th, 2019 - Biopuces, Unité MIAT, INRA Toulouse
  • 2. Sources
    These slides have been made using previous presentations from:
    • Delphine Labourdette (LISBP) - slides
    • Cathy Maugis (IMT) - slides
    • Franck Picard (LBBE, Lyon) - slides
  • 3. Simple description of single cell datasets
  • 5. A few remarks
    • barcoding is used to index each cell
    • UMIs are used to index each transcript and to correct the amplification bias introduced during library preparation
    • droplet technology does not allow for spike-ins (which would be useful for normalization)
    • droplets sometimes contain two or three cells (doublets or triplets; more frequent in cancer cells); this is estimated to affect ~0-10% of the droplets, and the rate increases with the number of cells
    • many other single cell technologies exist (check Delphine's slides)
  • 7. Why single cell?
    From a statistical perspective… (figure from 10x Genomics)
  • 8. Standard analyses and tools
    • normalization and dimension reduction
    • clustering
    • differential expression
    can be performed using:
    • the Bioconductor workflow "single cell" (which uses the packages scater and scran)
    • the all-in-one pipeline "Seurat"
  • 9. Description of datasets and requests from project TregDiab
  • 10. Datasets
    • Count dataset (as produced by Claire) with n = 8,273 cells and p = 27,998 genes (Unique Molecular Identifier counts)
    • Metadata:
      • on cells: barcode (identifies the cell), group (IL15 or IL2) and genotype (WT or KO)
      • on genes: Ensembl gene name and gene name
    • Frequency distribution of conditions over cells:

             WT     KO
      IL15   2452   1609
      IL2    2175   2037

    The rest of the analysis will focus on cells coming from WT samples.
  • 11. Questions
    1. On the whole population of cells (not taking into account groups and genotypes), perform a typology of cells (unsupervised clustering).
    2. Identify markers (genes) that are specific to each cell type.
  • 12. Data cleaning and normalization
  • 13. Different steps of the normalization
    1. Quality control of the cells: library size distribution, number of expressed genes distribution, mitochondrial proportion distribution.
       ⇒ Atypical cells are removed from the analysis.
    2. Cell cycle classification.
       ⇒ Only cells in G1 phase are used for the analysis.
    3. Quality control of genes: average count distribution, number of cells in which the gene is expressed.
       ⇒ Atypical genes (lowly expressed) are removed from the analysis.
    4. Normalization of cell specific biases: size factors that correct for library sizes are computed after a first (crude) clustering.
    What has not been done: doublet detection
  • 14. Quality control of cells
  • 15. Filtering low quality cells
    • remove cells with a low library size
    • remove cells with a low number of expressed genes
    • remove cells with too large a mitochondrial proportion
    ⇒ 4,282 remaining cells (out of 4,627 original cells)
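The three filters of slide 15 can be sketched as follows. This is only an illustration in plain Python, not the R code used for the slides: the outlier rule (more than 3 median absolute deviations from the median, on a log scale for the first two metrics) is an assumption, since the slides do not state the thresholds, and the function names and toy counts are hypothetical.

```python
import math
import statistics


def mad_outlier_flags(values, nmads=3.0, side="lower"):
    """Flag values lying more than `nmads` MADs below (or above) the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if side == "lower":
        return [v < med - nmads * mad for v in values]
    return [v > med + nmads * mad for v in values]


def filter_cells(counts, is_mito, nmads=3.0):
    """counts: one list of gene counts per cell; is_mito: one boolean per gene.

    Returns the indices of the cells that pass the three filters.
    """
    lib_size = [sum(cell) for cell in counts]
    n_expressed = [sum(1 for c in cell if c > 0) for cell in counts]
    mito_prop = [
        sum(c for c, m in zip(cell, is_mito) if m) / max(sum(cell), 1)
        for cell in counts
    ]
    # Assumed rule: low outliers for library size and number of expressed
    # genes (log scale), high outliers for the mitochondrial proportion.
    low_lib = mad_outlier_flags([math.log10(s + 1) for s in lib_size], nmads, "lower")
    low_genes = mad_outlier_flags([math.log10(g + 1) for g in n_expressed], nmads, "lower")
    high_mito = mad_outlier_flags(mito_prop, nmads, "upper")
    return [
        i for i in range(len(counts))
        if not (low_lib[i] or low_genes[i] or high_mito[i])
    ]


# Toy example: 6 cells x 4 genes, the last gene being mitochondrial.
is_mito = [False, False, False, True]
counts = [
    [10, 12, 8, 1],
    [11, 9, 10, 1],
    [9, 11, 12, 1],
    [12, 10, 9, 2],
    [1, 0, 0, 0],     # almost empty droplet
    [10, 9, 11, 30],  # mostly mitochondrial counts
]
kept = filter_cells(counts, is_mito)
print(kept)
```

On this toy matrix, the nearly empty cell and the cell dominated by mitochondrial counts are the two that are discarded.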
  • 16. Cell cycle classification
    Cell cycle classification is performed using cyclone (R package scran): it is based on a model that has been trained on specific markers of cell cycles (for mouse and human)
    ⇒ only cells in G1 phase are used in the analyses (to remove mitosis effects)

           IL15   IL2
      G1   2027   1871
      G2    148     99
      S      84     53
  • 17. Gene quality
    [figure panels: distribution of average log expression; distribution of highly expressed genes]
  • 18. Filtering atypical genes
    Removed:
    • non-variable genes: 13,629 genes with a variance equal to 0 (48.7%)
    • lowly expressed genes
    • genes expressed in few cells
    ⇒ 10,418 remaining genes (out of 27,998 initial genes)
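A minimal sketch of the three gene filters of slide 18, in plain Python. The thresholds `min_mean` and `min_cells` are hypothetical placeholders (the slides do not give the cut-offs that were used), and `filter_genes` is an illustrative name, not a function from scater or scran.

```python
import statistics


def filter_genes(counts, min_mean=0.5, min_cells=2):
    """counts: one list of gene counts per cell. Returns indices of kept genes."""
    n_cells = len(counts)
    n_genes = len(counts[0])
    kept = []
    for j in range(n_genes):
        column = [counts[i][j] for i in range(n_cells)]
        if statistics.pvariance(column) == 0:
            continue  # non-variable gene (same count in every cell)
        if sum(column) / n_cells < min_mean:
            continue  # lowly expressed gene (assumed threshold)
        if sum(1 for c in column if c > 0) < min_cells:
            continue  # gene expressed in too few cells (assumed threshold)
        kept.append(j)
    return kept


# Toy example: 5 cells x 5 genes.
counts = [
    [0, 3, 0, 1, 0],
    [0, 5, 0, 0, 1],
    [0, 2, 0, 1, 2],
    [0, 4, 0, 0, 1],
    [0, 6, 9, 0, 3],
]
kept_genes = filter_genes(counts)
print(kept_genes)
```

Here gene 0 is never expressed (zero variance), gene 2 is expressed in a single cell, and gene 3 has an average count below the assumed threshold, so only genes 1 and 4 remain.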
  • 19. Normalization
    Normalization is performed after similar cells have been clustered together (based on the most expressed genes; R package scater).
    ⇒ Scaling factors for the library sizes are obtained (similarly to bulk RNA-seq; one can even normalize the library sizes as in edgeR).
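To make the idea of a library size scaling factor concrete, here is a minimal sketch in plain Python. It uses the simplest possible definition (library size divided by the average library size) and is not the cluster-based procedure described on the slide; the function names and the toy counts are hypothetical, and cells are assumed to have a non-zero library size (empty cells having been filtered out beforehand).

```python
import math


def library_size_factors(counts):
    """One scaling factor per cell: library size / mean library size."""
    lib = [sum(cell) for cell in counts]
    mean_lib = sum(lib) / len(lib)
    return [size / mean_lib for size in lib]


def log_normalize(counts, size_factors):
    """Divide each cell by its scaling factor, then apply log2(x + 1)."""
    return [
        [math.log2(c / s + 1) for c in cell]
        for cell, s in zip(counts, size_factors)
    ]


# Toy example: cells 0 and 1 have the same composition but cell 1 was
# sequenced twice as deeply.
counts = [
    [10, 0, 30],
    [20, 0, 60],
    [5, 5, 10],
]
size_factors = library_size_factors(counts)
normalized = log_normalize(counts, size_factors)
print(size_factors)
print(normalized[0], normalized[1])
```

After scaling, the first two cells get the same normalized profile, which is the purpose of correcting for library size.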
  • 21. Standard approach for exploratory analysis
    • dimension reduction (PCA, nearest neighbors graphs…)
    • visualization (PCA, or t-SNE based on PCA or on any other dimension reduction)
    • clustering
  • 23. t-SNE (perplexity: 50, R package scater)
  • 24. What does t-SNE do?
    If cell expressions are denoted by $x_1, \ldots, x_n$ ($n$ cells, $x_i \in \mathbb{R}^p$), then:
    • compute a similarity between samples with:
      $p_{i|j} = \frac{\exp(-\gamma^2 \|x_i - x_j\|^2)}{\sum_{k \neq j} \exp(-\gamma^2 \|x_k - x_j\|^2)}$
    • search for a representation $y_1, \ldots, y_n$ in $\mathbb{R}^2$, with a similarity between points in the new representation based on:
      $q_{i|j} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq j} \exp(-\|y_k - y_j\|^2)}$
    • the representation is obtained by minimizing the KL divergence between $p$ and $q$
    But: the objective function is not convex and the results are very sensitive to $\gamma$ (perplexity) and to the initialization
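The quantities of slide 24 can be computed on a toy example as follows. This sketch only evaluates the similarities and the KL objective exactly as they are written on the slide (Gaussian similarities, a single global $\gamma$); it does not perform the minimization, and the function names and toy points are hypothetical.

```python
import math


def conditional_similarities(points, gamma=1.0):
    """Return p with p[i][j] = p_{i|j}; each column j sums to 1."""
    n = len(points)

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    p = [[0.0] * n for _ in range(n)]
    for j in range(n):
        weights = [
            0.0 if k == j else math.exp(-gamma ** 2 * sqdist(points[k], points[j]))
            for k in range(n)
        ]
        total = sum(weights)
        for i in range(n):
            p[i][j] = weights[i] / total
    return p


def kl_divergence(p, q):
    """Sum over all pairs of p * log(p / q); assumes q > 0 wherever p > 0."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


# Toy data: two tight pairs of cells in R^3.
x = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [2.0, 2.0, 2.0], [2.1, 2.0, 2.0]]
# A 2D layout that keeps the pairs together, and one that splits them.
y_good = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
y_bad = [[0.0, 0.0], [3.0, 3.0], [0.1, 0.0], [3.1, 3.0]]

p = conditional_similarities(x, gamma=1.0)
kl_good = kl_divergence(p, conditional_similarities(y_good))
kl_bad = kl_divergence(p, conditional_similarities(y_bad))
print(kl_good, kl_bad)
```

The layout that preserves the two neighborhoods gets a much smaller objective value than the one that breaks them, which is what the minimization exploits.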
  • 25. t-SNE: remarks
    • t-SNE is good at representing local distances but not global ones (non linear dimension reduction)
    • the perplexity can change the representation a lot (no good value was found for this dataset)
    • the population of cells seems very homogeneous and not related to the genotype (the same is observed on the PCA projection)
    How could we improve that? Use log / raw expression, base the algorithm on PCA results, try a wider range of perplexity values…?
  • 26. Clustering
    • extract t-SNE coordinates
    • use HAC (hierarchical ascending clustering) on those
    Other approaches for clustering
    • use a nearest neighbor (NN) graph + graph clustering (Louvain algorithm, which optimizes the modularity)
    • use other dimension reduction methods and perform any clustering algorithm
    ⇒ results are different (visualization can even be extremely different)
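The first approach (HAC on two-dimensional coordinates) can be sketched as follows in plain Python. The slides do not state which linkage was used, so average linkage is an assumption here; the implementation is deliberately naive (cubic cost, suitable for a toy example only), and the function name and coordinates are hypothetical.

```python
import math


def hac_average_linkage(points, n_clusters):
    """Naive average-linkage HAC; returns one cluster label per point."""

    def euclid(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(euclid(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy 2D coordinates standing in for t-SNE output: two well separated groups.
coords = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels = hac_average_linkage(coords, n_clusters=2)
print(labels)
```

Because the clustering is run on the low-dimensional coordinates, its result inherits the sensitivity of t-SNE to perplexity and initialization, which is consistent with the remark that different approaches give different results.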
  • 29. Exploratory analysis of markers
    [figure panels: automatic detection; prior knowledge]
  • 30. Not too bad for some known markers…
  • 31. General overview of sc models in statistics
  • 32. How bad is the situation in single cell data?
    Overdispersion is mainly biological because diversity is high between cells
  • 33. Expression is a bursty process: zeros are biological
  • 34. sc Differential Expression Analysis with ZINB
    [Risso et al., 2018] - package zinbwave
    For cell $i$, gene $j$ in condition $r$, gene expression is modeled by a zero-inflated negative binomial (ZINB) distribution:
    $X_{ijr} \sim \pi_{ijr} \delta_0 + (1 - \pi_{ijr}) \mathrm{NB}(\mu_{ijr})$
    Remaining problems:
    • We are not really able to discriminate low expression from no expression
    • Estimation is hard (a Bayesian framework is used to address this issue)
    • a similar method exists for PCA [Durif et al., 2018]
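The mixture written on slide 34 can be evaluated numerically with a short sketch. The slide only writes NB(µ), so the dispersion parameter `theta` and the mean/dispersion parameterization used below are assumptions; the function names and parameter values are hypothetical, and this is not the zinbwave implementation.

```python
import math


def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu > 0 and dispersion (size) theta > 0."""
    log_p = (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, pi, mu, theta):
    """pi * (point mass at 0) + (1 - pi) * NB(mu, theta), evaluated at count k."""
    point_mass = pi if k == 0 else 0.0
    return point_mass + (1.0 - pi) * nb_pmf(k, mu, theta)


# Hypothetical parameters for one (cell, gene, condition) triple.
pi, mu, theta = 0.3, 3.0, 2.0
prob_zero_nb = nb_pmf(0, mu, theta)
prob_zero_zinb = zinb_pmf(0, pi, mu, theta)
total_mass = sum(zinb_pmf(k, pi, mu, theta) for k in range(500))
mean_count = sum(k * zinb_pmf(k, pi, mu, theta) for k in range(500))
print(prob_zero_nb, prob_zero_zinb, total_mass, mean_count)
```

The probability of a zero has two sources, the point mass and the negative binomial itself, which illustrates why low expression is hard to tell apart from no expression.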
  • 35. References
    Durif, G., Modolo, L., Mold, J., Lambert-Lacroix, S., and Picard, F. (2018). Probabilistic count matrix factorization for single cell expression data analysis. In Raphael, B. J., editor, Proceedings of Research in Computational Biology (RECOMB 2018), volume 10812 of Lecture Notes in Computer Science, pages 254–255, Paris, France. Springer.
    Regev, A., Teichmann, S. A., Lander, E. S., Amit, I., Benoist, C., Birney, E., Bodenmiller, B., Campbell, P., Carninci, P., Clatworthy, M., Clevers, H., Deplancke, B., Dunham, I., Eberwine, J., Eils, R., Enard, W., Farmer, A., Fugger, L., Göttgens, B., Hacohen, N., Haniffa, M.,