Studying the 3D structure of the genome
The case study of P. falciparum
Nelle Varoquaux
0 / 51
The 3D structure of the genome is thought to play an
important role in many biological processes
The genome of S. cerevisiae is highly organized [Zimmer and Fabre, 2011]
1 / 51
P. falciparum’s complex life cycle.
[Figure: the life cycle of P. falciparum. 1. Mosquito injects sporozoites. 2. Sporozoites infect liver cells. 3. Merozoites are released into the bloodstream. 4. Erythrocytic cycle. 5. Sexual cycle. 6. Transmission to mosquito. 7. Oocysts are formed. 8. Sporozoites develop. Stages labeled: sporozoite, merozoite, ring, trophozoite, schizont, gametocyte, oocyst.]
The life cycle of P. falciparum
2 / 51
The Hi-C protocol identifies physical contacts
between pairs of loci genome-wide
Hi-C paves the way for a systematic and genome-wide analysis of genome architecture [Rao et al., 2014]
3 / 51
The contact count matrix recapitulates the hallmarks
of genome architecture
[Figure: contact count heatmap for chromosomes I to V, color scale from low to high, with centromeres and chromosome boundaries annotated.]
Contact counts for the first 5 chromosomes of S. cerevisiae
4 / 51
HiC-Pro: from sequences to contact maps
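[Figure: the HiC-Pro workflow.
• Paired-end (PE) sequencing of Hi-C fragments produces reads; each mate is aligned separately.
• Reads pairing: singletons, multi-hits, low-MAPQ, and unmapped reads are set aside.
• Assignment to restriction fragments: pairs are classified as valid pairs, dangling ends, self circles, or dumped pairs.
• Multiple samples are combined into raw contact maps, which are then normalized.
• With phasing data: assignment to parental genome and selection of allele-specific pairs.
• Steps are run with parallel computing.]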
5 / 51
A wide range of applications for Hi-C data
Studying the 3D structure of the genome: 3D structure inference,
data integration with gene expression profiles, replication timing, . . .
Studying the 1D structure of the genome: genome assembly,
metagenomic deconvolution, haplotype resolution, . . .
Genome annotation: identifying rDNA clusters, centromeres,
specific gene families, . . .
[Figure: contact count heatmap for chromosomes I to V, color scale from low to high, with centromeres and chromosome boundaries annotated.]
6 / 51
How to create 3D models from contact maps and how
to use them
• Data-driven methods
Consensus methods
Ensemble methods
Single-cell Hi-C based methods
• Model-driven methods
• Downstream analysis using 3D models: a highlight of the study of
P. falciparum’s 3D structure
7 / 51
Data-driven methods to build 3D models from
contact maps.
8 / 51
Inferring 3D models of genome architecture from
Hi-C data
Motivation
• Little is known about the 3D structure of the genome.
• Hi-C now makes it possible to assess physical interactions genome-wide.
Goal: propose a robust and accurate method for inferring 3D models of
genome architecture from contact counts.
Idea: model contact counts as Poisson random variables, with the 3D
model as a latent variable, and cast the inference as likelihood
maximization.
9 / 51
Reconstructing the 3D structure from contact count maps
Consensus approach: contact counts → inference → a single structure
[Duan et al., 2010, Tanizawa et al., 2010, Bau et al., 2011, Zhang et al., 2013, Ben-Elazar et al., 2013, Lesne et al., 2014]
Ensemble approach: contact counts → inference → a population of structures
[Rousseau et al., 2011, Hu et al., 2013, Kalhor et al., 2011, Tjong et al., 2012]
10 / 51
Chromosomes as a series of beads
• Let $X \in \mathbb{R}^{n \times 3}$ be the coordinates of each bead.
• Let $C \in \mathbb{R}^{n \times n}$ be the raw contact count matrix.
• Let $\Theta$ be the count-to-distance transfer function.
Optimization problem
$$\min_{x_1, \dots, x_n} \; \sigma(X, C)$$
11 / 51
Data-driven methods to build 3D models from
contact maps.
Consensus methods
12 / 51
Metric MDS-based methods
Formulation
$$\min_{x_1, \dots, x_n} \; \sigma(X, C) = \sum_{i,j \,\mid\, c_{ij} > 0} w_{ij} \left( \|x_i - x_j\|_2 - \Theta(c_{ij}) \right)^2 \quad \text{s.t. some constraints}$$
• $X$: 3D coordinates.
• $C$: normalized contact counts.
• $w_{ij}$ are weights (set to $1 / \Theta(c^N_{ij})^2$ in pastis-MDS2).
• $\Theta(c) = \beta c^{\alpha}$: count-to-distance function.
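The formulation above can be sketched numerically. The following is a minimal illustration, not the pastis implementation: it assumes $\alpha = -1/3$, $\beta = 1$, no additional constraints, weights $w_{ij} = 1/\Theta(c_{ij})^2$, and a plain gradient descent with backtracking. The function names are hypothetical.

```python
import numpy as np

def counts_to_wish_distances(counts, alpha=-1.0 / 3.0, beta=1.0):
    """Apply Theta(c) = beta * c**alpha to the non-zero counts."""
    wish = np.zeros_like(counts, dtype=float)
    mask = counts > 0
    wish[mask] = beta * counts[mask] ** alpha
    return wish, mask

def stress_and_gradient(X, wish, mask):
    """Weighted stress over pairs i < j with c_ij > 0, and its gradient in X."""
    diff = X[:, None, :] - X[None, :, :]        # (n, n, 3): x_i - x_j
    dist = np.sqrt((diff ** 2).sum(axis=-1))    # (n, n) Euclidean distances
    pairs = np.triu(mask, k=1)                  # count each pair once
    weights = np.zeros_like(wish)
    weights[pairs] = 1.0 / wish[pairs] ** 2     # assumed choice: w_ij = 1 / Theta(c_ij)^2
    resid = np.where(pairs, dist - wish, 0.0)
    stress = float((weights * resid ** 2).sum())
    # d(stress)/d(x_i) = sum_j 2 w_ij (d_ij - Theta(c_ij)) (x_i - x_j) / d_ij
    safe = np.where(dist > 0, dist, 1.0)
    coef = 2.0 * weights * resid / safe
    coef = coef + coef.T                        # each pair acts on both of its beads
    grad = (coef[:, :, None] * diff).sum(axis=1)
    return stress, grad

def fit_mds(counts, n_iter=200, seed=0):
    """Gradient descent with backtracking, started from a random structure."""
    rng = np.random.default_rng(seed)
    wish, mask = counts_to_wish_distances(counts)
    X = rng.normal(size=(counts.shape[0], 3))
    stress, grad = stress_and_gradient(X, wish, mask)
    step = 1.0
    for _ in range(n_iter):
        # Shrink the step until the stress does not increase.
        while step > 1e-12:
            X_new = X - step * grad
            new_stress, new_grad = stress_and_gradient(X_new, wish, mask)
            if new_stress <= stress:
                break
            step *= 0.5
        else:
            break                               # no acceptable step left
        X, stress, grad = X_new, new_stress, new_grad
        step *= 2.0
    return X, stress
```

The recovered structure is only defined up to rotation, translation, and reflection, since the stress depends on pairwise distances alone.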
13 / 51
Relationships between contact counts c, genomic
distances s and Euclidean distances d
Fractal globule
• $c \sim s^{-1}$
• $d \sim s^{1/3}$
Equilibrium
• $c \sim s^{-3/2}$
• $d \sim s^{1/2}$
for $s < s_{\max}^{2/3}$
Relationship between contact counts and Euclidean distances
$$d_{ij} = \gamma \, c_{ij}^{-1/3}$$
What have other people done?
Paper                   | Organism      | Full or partial | Counts-to-distance mapping | $w_{ij}$
Dekker et al. [2002]    | S. cerevisiae | Chr III         | Worm-like chain behavior   |
Duan et al. [2010]      | S. cerevisiae | Whole-genome    | Linear                     |
Tanizawa et al. [2010]  | S. pombe      | Whole-genome    | FISH-derived               |
Ay et al. [2014]        | P. falciparum | Whole-genome    | Fractal globule derived    | $\delta_{ij}^2$
Peng et al. [2013]      | P. falciparum | Whole-genome    | Fractal globule derived    | $\delta_{ij}^2$
Lesne et al. [2014]     |               | Whole-genome    | ad hoc                     |
Varoquaux et al. [2014] |               | Whole-genome    | polymer-behaviour          | $\delta_{ij}^2$
[Constraints columns (Adj., rDNA, C, T): per-paper marks not recoverable.]
Differences between MDS-based methods
15 / 51
Non-metric MDS-based method
[Diagram: contact counts are mapped to distances by the transfer function θ, and distances determine the structure; NMDS infers both from the contact counts.]
The count-to-distance transfer function relies on strong
assumptions about the physics of DNA.
NMDS infers the transfer function jointly with the structure.
16 / 51
Nonmetric MDS-based method (parametric)
Hypothesis: the mapping function can be parametrized:
$$\Theta(c) = \beta c^{-\alpha} \quad (1)$$
Optimization
1. Set α and β, optimize the structure X.
2. Optimize α and β with respect to a certain criterion.
[Zhang et al., 2013]
17 / 51
Nonmetric MDS-based method (nonparametric)
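Hypothesis: if two loci $i$ and $j$ are observed to be in contact more often
than loci $k$ and $\ell$, then $i$ and $j$ should be closer in 3D space than $k$ and $\ell$:
$$c_{ij} \geq c_{k\ell} \;\Leftrightarrow\; \|x_i - x_j\|_2 \leq \|x_k - x_\ell\|_2 \quad (2)$$
Objective function
$$\min_{x_1, \dots, x_n, \Theta} \; \sigma(X, C, \Theta) \quad \text{subject to } \Theta \text{ decreasing}$$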
[Varoquaux et al., 2014, Ben-Elazar et al., 2013]
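For a fixed structure, the best decreasing transfer function in the least-squares sense can be found by isotonic regression. The sketch below is an assumption about how such a step could look, not the cited authors' code. It uses the pool-adjacent-violators algorithm, and the function name is hypothetical. As a simplification, ties in the counts are not forced to share one fitted value.

```python
import numpy as np

def decreasing_fit(counts, distances):
    """Least-squares non-increasing fit of distances as a function of counts.

    Implements pool adjacent violators on the distances sorted by count.
    """
    order = np.argsort(counts, kind="stable")
    y = np.asarray(distances, dtype=float)[order]
    # Each block stores [sum of values, number of points];
    # block means must be non-increasing from left to right.
    blocks = []
    for value in y:
        blocks.append([value, 1])
        # Merge while the newest block mean exceeds the previous block mean.
        while (len(blocks) > 1
               and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]):
            total, size = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += size
    fitted_sorted = np.concatenate(
        [np.full(size, total / size) for total, size in blocks]
    )
    fitted = np.empty_like(fitted_sorted)
    fitted[order] = fitted_sorted              # back to the input ordering
    return fitted
```

In an alternating scheme, the fitted values would play the role of the wish distances for the next structure update.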
18 / 51
Statistical approaches for inferring the 3D structure
of the genome
• MDS-based methods minimize an arbitrary stress function that
measures the discrepancy between wish distances and 3D distances
of the model.
Statistical approach for stable inference of genome structure
• replace the arbitrary MDS loss function with a better-motivated
likelihood function
• define a probabilistic model of contact counts parametrized by the 3D
model.
19 / 51
A Poisson model for inferring the 3D structure of the
genome
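The idea
Let’s assume that $c_{ij} \sim \mathrm{Poisson}(\beta d_{ij}^{\alpha})$, where $c_{ij}$ is the interaction count, $d_{ij}$
the pairwise Euclidean distance, and $\beta$ and $\alpha$ are unknown parameters.
Likelihood
$$\ell(X, \alpha, \beta) = \prod_{i < j \leq n} \frac{(\beta d_{ij}^{\alpha})^{c_{ij}}}{c_{ij}!} \exp(-\beta d_{ij}^{\alpha})$$
Optimization problem
$$\min_{x_1, \dots, x_n, \alpha, \beta} \; \sigma(X, C, \alpha, \beta) = -\log \ell(X, \alpha, \beta)$$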
[Varoquaux et al., 2014]
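The objective can be evaluated directly from a candidate structure. The following is a minimal sketch of the negative log-likelihood only, not of the optimizer and not the pastis code; the function name is hypothetical, and the sum runs over pairs $i < j$.

```python
import math
import numpy as np

def poisson_negative_log_likelihood(X, counts, alpha, beta):
    """-log likelihood of counts under c_ij ~ Poisson(beta * d_ij**alpha)."""
    n = X.shape[0]
    rows, cols = np.triu_indices(n, k=1)           # pairs i < j
    dist = np.linalg.norm(X[rows] - X[cols], axis=1)
    lam = beta * dist ** alpha                     # expected count of each pair
    c = counts[rows, cols]
    log_factorial = np.array([math.lgamma(v + 1.0) for v in c])
    log_lik = c * np.log(lam) - lam - log_factorial
    return float(-log_lik.sum())
```

Any generic optimizer can then be applied jointly to the coordinates and to $\alpha$ and $\beta$. Distinct beads must not coincide, since $d_{ij} = 0$ makes the intensity undefined for negative $\alpha$.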
20 / 51
Modeling overdispersion using Negative Binomial
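The idea: let’s assume that $c \sim \mathrm{NegativeBinomial}(\beta d^{\alpha}, r)$, where $c$ is the
interaction count, $d$ the pairwise Euclidean distance, $r$ the dispersion
parameter, $\alpha$ an unknown parameter, and $\beta$ a scale coefficient.
Likelihood
$$\ell(X, C) = \prod_{i,j} \frac{\Gamma(c_{ij} + r)}{\Gamma(c_{ij} + 1)\,\Gamma(r)} \left( \frac{\beta d_{ij}^{\alpha}}{r + \beta d_{ij}^{\alpha}} \right)^{c_{ij}} \left( 1 - \frac{\beta d_{ij}^{\alpha}}{r + \beta d_{ij}^{\alpha}} \right)^{r} \quad (3)$$
The optimization problem
$$\max_{\alpha, \beta, X} \; L(X, \alpha, \beta) = \sum_{i < j \leq n} c_{ij}\, \alpha \log d_{ij} - (c_{ij} + r) \log(r + \beta d_{ij}^{\alpha}) \quad (4)$$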
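With mean $\mu = \beta d^{\alpha}$, the parametrization in equation (3) has variance $\mu + \mu^2 / r$, so small $r$ means strong overdispersion and large $r$ approaches the Poisson model. The sketch below evaluates the log of one factor of equation (3) and the corresponding Poisson term for comparison. The function names are hypothetical.

```python
import math

def negative_binomial_log_pmf(c, mu, r):
    """log P(c) for a negative binomial with mean mu and dispersion r.

    This is the log of one factor of equation (3), with mu = beta * d**alpha.
    """
    return (math.lgamma(c + r) - math.lgamma(c + 1.0) - math.lgamma(r)
            + c * (math.log(mu) - math.log(r + mu))
            - r * math.log1p(mu / r))           # r * log(r / (r + mu))

def poisson_log_pmf(c, mu):
    """log P(c) for a Poisson distribution with mean mu."""
    return c * math.log(mu) - mu - math.lgamma(c + 1.0)

# Example: the same count under a strongly and a weakly overdispersed model.
strong = negative_binomial_log_pmf(4, mu=3.0, r=0.5)
weak = negative_binomial_log_pmf(4, mu=3.0, r=1e7)
```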
21 / 51
Data-driven methods to build 3D models from
contact maps.
Ensemble methods as a means to infer a population of structures
22 / 51
Sampling local minima of ill-defined optimization
problems
• Model chromosomes as a series of beads, linked by restraining
oscillators;
• Add strengths to the oscillators:
• Adjacent beads are neither too close nor too far from one another;
• Strengths derived from contact counts;
• Run the optimization 50000 times.
Umbarger et al. [2011], Bau et al. [2011], Kalhor et al. [2011]
23 / 51
Estimating the posterior distribution of a statistical
model
• Define the statistical model: $c_{ij} \sim \mathcal{N}(\beta d_{ij}^{\alpha}, \sigma^2)$;
• Add priors (usually, non-informative priors);
• Sample using MCMC.
24 / 51
Conclusion
• Wide variety of methods;
• How to compare and evaluate?
• Using simulated data;
• By assessing stability and robustness with respect to:
• data bootstrapping or subsampling;
• contact map resolution;
• biological replicates;
• By comparing to other sources of data.
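Stability under subsampling can be probed by thinning the contact map and re-running the inference on each thinned copy. The sketch below shows one possible thinning scheme, which is an assumption and not necessarily the one used in the cited work: each observed contact is kept independently with a fixed probability. The function name is hypothetical.

```python
import numpy as np

def subsample_counts(counts, fraction, seed=0):
    """Keep each contact of a symmetric integer count matrix with probability `fraction`."""
    rng = np.random.default_rng(seed)
    upper = np.triu(counts, k=0).astype(np.int64)   # each pair once, diagonal included
    thinned = rng.binomial(upper, fraction)         # binomial thinning, entry by entry
    return thinned + np.triu(thinned, k=1).T        # mirror to restore symmetry

# Example on a small symmetric matrix.
example = np.array([[0, 8, 2],
                    [8, 0, 5],
                    [2, 5, 0]])
half = subsample_counts(example, fraction=0.5, seed=1)
```

Structures inferred from several thinned maps can then be compared with each other and with the structure inferred from the full map.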
25 / 51
The art of modeling the genome
26 / 51
The art of modeling the genome
Goal
Consider 10,000 randomly placed beads. Can you find the
smallest set of constraints such that these beads interact overall
the same way as a given contact map?
Challenge
While these models provide mechanistic insights, they are
difficult to build: each organism, cell type, and time point requires
a hand-crafted set of constraints.
27 / 51
Building a yeast nucleus
Modeling: chromosomes are modeled as flexible, volume-excluded
polymers.
Constraints
(i) the chromosomes are constrained into a spherical ball
representing the nucleus;
(ii) centromeres are constrained into a spherical ball tethered to the
nuclear membrane;
(iii) telomeres are tethered to the nuclear membrane;
(iv) rDNA is constrained into the nucleolus, represented as a
spherical ball opposite to the centromeres.
[Tjong et al., 2012, Tokuda et al., 2012]
28 / 51
Building a yeast nucleus
29 / 51
Building a P. falciparum nucleus?
Question Could we similarly build a P. falciparum nucleus?
[Figure: observed/expected contact maps of chromosome 7 (Chr 7 against Chr 7), color scale from 0.00 to 2.00. Panel A: VE. Panel B: Hi-C.]
30 / 51
Building a P. falciparum nucleus?
Question Could we similarly build a P. falciparum nucleus?
And if we add constraints on var genes. . .
The optimization never converges to a structure that fully satisfies the
desired constraints.
30 / 51
Conclusion
• Model-driven approaches provide mechanistic insights into the
folding properties of the genome. . .
• But each organism, cell line, and possibly time point requires its
own set of handcrafted constraints.
31 / 51
How can we use 3D models to gain biological
insights?
The case study of P. falciparum
32 / 51
The life cycle of P. falciparum
[Figure: the life cycle of P. falciparum, from the injection of sporozoites by the mosquito, through infection of liver cells, the erythrocytic cycle, and the sexual cycle, to transmission back to the mosquito.]
33 / 51
A wide range of data sets available
Name            | Strain       | Stage        | Resolution | Number of contacts | Perc. of cis | Perc. of trans | Reference
Ay-rings        | 3D7          | Late rings   | 10 kb      | 16711552           | 43%          | 57%            | Ay et al. [2014]
Ay-trophozoites | 3D7          | Trophozoites | 10 kb      | 56348498           | 53%          | 47%            | Ay et al. [2014]
Ay-schizonts    | 3D7          | Schizonts    | 10 kb      | 11652832           | 55%          | 45%            | Ay et al. [2014]
Lemieux-A4+     | IT/BC6+      | Rings        | 25 kb      | 18488252           | 19%          | 81%            | Lemieux et al. [2013]
Lemieux-A4      | IT/3G8       | Rings        | 25 kb      | 19674672           | 28%          | 72%            | Lemieux et al. [2013]
Lemieux-A44     | IT/BC6-      | Rings        | 25 kb      | 18660594           | 25%          | 75%            | Lemieux et al. [2013]
Lemieux-DCJ On  | NF54/DCJ on  | Rings        | 25 kb      | 3098370            | 26%          | 74%            | Lemieux et al. [2013]
Lemieux-DCJ Off | NF54/DCF Off | Rings        | 25 kb      | 2533470            | 26%          | 73%            | Lemieux et al. [2013]
Lemieux-B15C2   | NF54/B15C2   | Rings        | 25 kb      | 1022996            | 12%          | 88%            | Lemieux et al. [2013]
Summary of the available P. falciparum Hi-C datasets
34 / 51
We can infer structures at different timepoints. . .
35 / 51
How can we use 3D models to gain biological
insights?
Visual inspection, structure stability, clustering and other
variance analysis
36 / 51
Visual inspection, structure stability, clustering and
other variance analysis. . .
• Are the resulting 3D models closer across timepoints than across
initializations?
• Are the models locally consistent?
• How do the structures differ? Are the hallmarks of interest conserved?
37 / 51
Are the resulting 3D models closer across timepoints
than across initialization?
[Figure: scatter plots of the first, second, and third principal components for the Ay-Rin, Ay-Sch, Ay-Trop, and Lemieux-Rin structures.]
PCA on features extracted from the 3D structures at each time point.
38 / 51
Are the models locally consistent?
• Divide structures into overlapping sub-structures (5 to 20 beads)
• Compute the pairwise RMSD between segments of all
structures.
• Segments overlapping across many structures can be labeled
“consistent”
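The segment comparison above can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: segments are compared after optimal rigid superposition (Kabsch algorithm), and the two structures share the same bead indexing. The function names and the window length are hypothetical.

```python
import numpy as np

def rmsd_after_superposition(A, B):
    """RMSD between two (m, 3) segments after optimal rotation and translation."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    U, _, Vt = np.linalg.svd(A0.T @ B0)
    # Flip the last axis if needed so that the transform is a proper rotation.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    aligned = A0 @ R
    return float(np.sqrt(((aligned - B0) ** 2).sum(axis=1).mean()))

def local_rmsd_profile(X, Y, window=10):
    """RMSD of every pair of matching overlapping segments of two structures."""
    starts = range(X.shape[0] - window + 1)
    return np.array([
        rmsd_after_superposition(X[s:s + window], Y[s:s + window]) for s in starts
    ])
```

Segments whose RMSD stays low across many pairs of structures are candidates for the “consistent” label. Because this superposition excludes reflections, mirror-image structures would score as different; whether to allow reflections is a modeling choice.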
39 / 51
Are the hallmarks of the genome architecture
conserved?
• Identify the hallmarks of interest: loci that are thought to colocalize
from a set S
• Perform 3D gene set enrichment
• Define a distance threshold to label pairs as “close” or “far”
• Compute a test-statistic: the number of genes of the set that are
labeled as “close”.
• Compare the test statistic with a null distribution estimated by
resampling with preservation of chromosome structure.
Conclusion In P. falciparum, centromeres, telomeres and var genes
colocalize.
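The enrichment test can be sketched with a simplified null model. Two assumptions are made here and differ from the slide: the test statistic is the number of close pairs within the set, and the null distribution is obtained by drawing random loci uniformly, without preserving chromosome structure. The function names are hypothetical.

```python
import numpy as np

def count_close_pairs(coords, loci, threshold):
    """Number of pairs of loci in the set that lie closer than `threshold`."""
    pts = coords[loci]
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    rows, cols = np.triu_indices(len(loci), k=1)
    return int((dist[rows, cols] < threshold).sum())

def enrichment_p_value(coords, loci, threshold, n_resamples=1000, seed=0):
    """One-sided resampling p-value for the set being more clustered than random sets.

    Simplified null: random sets of the same size drawn uniformly from all loci.
    """
    rng = np.random.default_rng(seed)
    observed = count_close_pairs(coords, loci, threshold)
    n_loci = coords.shape[0]
    null = np.array([
        count_close_pairs(
            coords, rng.choice(n_loci, size=len(loci), replace=False), threshold
        )
        for _ in range(n_resamples)
    ])
    # Add-one correction so the estimate is never exactly zero.
    return (1 + int((null >= observed).sum())) / (1 + n_resamples)
```

A null that preserves chromosome structure would instead resample sets with the same distribution over chromosomes and relative positions, which this sketch does not attempt.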
40 / 51
How can we use 3D models to gain biological
insights?
Integrative analysis of gene expression and 3D structure
41 / 51
Identifying links between gene expression profiles
and 3D structure
Motivation: extract a gene expression profile $v \in \mathbb{R}^p$ that is:
• representative of the gene expression profiles;
• correlated with the 3D structure.
Data: for each gene $g \in G$:
• Log expression profiles at 27 data points: $e(g) = (e_1(g), \dots, e_p(g)) \in \mathbb{R}^p$.
• The gene’s 3D coordinates, extracted from the inferred 3D structure: $x(g)$.
Method: KernelCCA [Vert and Kanehisa, 2003, Bach and Jordan, 2002]
42 / 51
Extracting a vector v representative of the gene
expression profiles
Find $v \in \mathbb{R}^p$ to maximize:
$$V(v) = \frac{\sum_{g \in G} \left( v^\top e(g) \right)^2}{\|v\|^2}$$
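Taken on its own, $V(v)$ is a Rayleigh quotient of the matrix $E^\top E$, where row $g$ of $E$ is $e(g)$, so its maximizer is the leading eigenvector of that matrix. The sketch below covers this first criterion only, not the full KernelCCA problem. The function names are hypothetical.

```python
import numpy as np

def variance_captured(v, E):
    """V(v) = sum_g (v^T e(g))^2 / ||v||^2, with one profile e(g) per row of E."""
    return float(((E @ v) ** 2).sum() / (v @ v))

def most_representative_profile(E):
    """Leading eigenvector of E^T E and the corresponding maximal value of V."""
    # E^T E is symmetric, so eigh applies; eigenvalues are returned in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(E.T @ E)
    return eigenvectors[:, -1], float(eigenvalues[-1])

# Example with synthetic profiles: 50 genes, 27 data points.
E_demo = np.random.default_rng(0).normal(size=(50, 27))
v_demo, v_max = most_representative_profile(E_demo)
```

The profiles are used as given; centering them first would turn this into the first principal component in the usual sense.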
43 / 51
Find f such that f is smooth with respect to the 3D
structure
Let $f$ be a vector of scores assigned to each gene.
$$S(f) = \frac{f^\top K_{3D}^{-1} f}{\|f\|^2}$$
We want:
• $V(v)$ to be large,
• $S(f)$ to be small,
• $(v^\top e(g))_{g \in G}$ and $f$ to be as correlated as possible.
This can be cast as a generalized eigenvalue problem.
44 / 51
KernelCCA reveals a strong correlation between
gene expression profiles and 3D structure
[Figure: panels A and B, with gametocyte and sporozoite labels. Mean z-score (color scale from −2.0 to 2.0, axis from −1 to 3) of the components Gene expression 1, Gene expression 2, Structure 1, and Structure 2 for five gene sets (var, rif, exported, gametocyte-specific, invasion), with asterisk annotations. Stages: IDC, Gam, Spz.]
45 / 51
Conclusion
• A plethora of data-driven methods are available.
• Only a few are publicly available and well validated.
• How to choose among them is still an open and active research question.
• Model-driven methods are organism- and time-point-specific.
• But they provide powerful mechanistic insights into the genome architecture.
• They may be a means to replace high-throughput FISH experiments at low
cost as preliminary explorations.
• Downstream analysis of P. falciparum provided important insights into
the relationship between gene expression and 3D structure.
• But meaningful comparison of models from different time points at
genome scale is still an open problem.
46 / 51
Acknowledgments
Mines ParisTech - CBIO
Jean-Philippe Vert
Thomas Walter
UW - Noble lab
William S. Noble
Ferhat Ay
Institut Curie - U900
Emmanuel Barillot
Nicolas Servant
Chong-Jian Chen
Eric Viara
UMass
Job Dekker
Bryan Lajoie
UC Riverside
Karine Le Roch
Sebastiaan Bol
Evelien Bunnik
Jacques Prudhomme
Institut Curie - UMR168
Edith Heard
UW - Dunham lab
Maitreya Dunham
Ivan Liachko
UW - Shendure lab
Jay Shendure
Josh Burton
47 / 51
References I
F. Ay, E. M. Bunnik, N. Varoquaux, S. M. Bol, J. Prudhomme, J.-P. Vert,
W. S. Noble, and K. G. Le Roch. Three-dimensional modeling of the P.
falciparum genome during the erythrocytic cycle reveals a strong
connection between genome architecture and gene expression.
Genome Research, 24:974–988, 2014.
F. R. Bach and M. I. Jordan. Kernel independent component analysis.
Journal of Machine Learning Research, 3:1–48, 2002.
D. Bau, A. Sanyal, B. R. Lajoie, E. Capriotti, M. Byron, J. B. Lawrence,
J. Dekker, and M. A. Marti-Renom. The three-dimensional folding of the
α-globin gene domain reveals formation of chromatin globules. Nat
Struct Mol Biol, 18(1):107–114, 2011.
S. Ben-Elazar, Z. Yakhini, and I. Yanai. Spatial localization of co-regulated
genes exceeds genomic gene clustering in the Saccharomyces
cerevisiae genome. Nucleic Acids Res, 41(4):2191–2201, Feb 2013.
47 / 51
References II
J. Dekker, K. Rippe, M. Dekker, and N. Kleckner. Capturing chromosome
conformation. Science, 295(5558):1306–1311, 2002.
Z. Duan, M. Andronescu, K. Schutz, S. McIlwain, Y. J. Kim, C. Lee,
J. Shendure, S. Fields, C. A. Blau, and W. S. Noble. A three-dimensional
model of the yeast genome. Nature, 465:363–367, 2010.
M. Hu, K. Deng, Z. Qin, J. Dixon, S. Selvaraj, J. Fang, B. Ren, and J. S.
Liu. Bayesian inference of spatial organizations of chromosomes. PLoS
Comput Biol, 9(1):e1002893, 2013.
R. Kalhor, H. Tjong, N. Jayathilaka, F. Alber, and L. Chen. Genome
architectures revealed by tethered chromosome conformation capture
and population-based modeling. Nat Biotechnol, 30(1):90–98, 2011.
J. E. Lemieux, S. A. Kyes, T. D. Otto, A. I. Feller, R. T. Eastman, R. A.
Pinches, M. Berriman, X. Z. Su, and C. I. Newbold. Genome-wide
profiling of chromosome interactions in Plasmodium falciparum
characterizes nuclear architecture and reconfigurations associated with
antigenic variation. Mol Microbiol, 90(3):519–537, 2013.
48 / 51
References III
A. Lesne, J. Riposo, P. Roger, A. Cournac, and J. Mozziconacci. 3D
genome reconstruction from chromosomal contacts. Nature Methods,
11(11):1141–1143, 2014.
Cheng Peng, Liang-Yu Fu, Peng-Fei Dong, Zhi-Luo Deng, Jian-Xin Li,
Xiao-Tao Wang, and Hong-Yu Zhang. The sequencing bias relaxed
characteristics of Hi-C derived data and implications for chromatin 3D
modeling. Nucleic Acids Res., 41(19):e183, 2013.
S. S. P. Rao, M. H. Huntley, N. C. Durand, E. K. Stamenova, I. D.
Bochkov, J. T. Robinson, A. L. Sanborn, I. Machol, A. D. Omer, E. S.
Lander, and E. L. Aiden. A 3D map of the human genome at kilobase
resolution reveals principles of chromatin looping. Cell, 159(7):
1665–1680, 2014.
M. Rousseau, J. Fraser, M. Ferraiuolo, J. Dostie, and M. Blanchette.
Three-dimensional modeling of chromatin structure from interaction
frequency data using Markov chain Monte Carlo sampling. BMC
Bioinformatics, 12(1):414, October 2011. ISSN 1471-2105.
49 / 51
References IV
H. Tanizawa, O. Iwasaki, A. Tanaka, J. R. Capizzi, P. Wickramasinghe,
M. Lee, Z. Fu, and K. Noma. Mapping of long-range associations
throughout the fission yeast genome reveals global genome
organization linked to transcriptional regulation. Nucleic Acids Res, 38
(22):8164–8177, 2010.
H. Tjong, K. Gong, L. Chen, and F. Alber. Physical tethering and volume
exclusion determine higher-order genome organization in budding
yeast. Genome Res, 22(7):1295–1305, 2012.
Naoko Tokuda, Tomoki P Terada, and Masaki Sasai. Dynamical modeling
of three-dimensional genome organization in interphase budding yeast.
Biophys. J., 102(2):296–304, 2012.
M. A. Umbarger, E. Toro, M. A. Wright, G. J. Porreca, D. Bau, S. Hong,
M. J. Fero, L. J. Zhu, M. A. Marti-Renom, H. H. McAdams, L. Shapiro,
J. Dekker, and G. M. Church. The three-dimensional architecture of a
bacterial genome and its alteration by genetic perturbation. Molecular
Cell, 44:252–264, 2011.
50 / 51
References V
N. Varoquaux, F. Ay, W. S. Noble, and J.-P. Vert. A statistical approach for
inferring the 3D structure of the genome. Bioinformatics, 30(12):i26–i33,
2014.
J.-P. Vert and M. Kanehisa. Graph-driven features extraction from
microarray data using diffusion kernels and kernel CCA. In Advances in
Neural Information Processing Systems 15, pages 1425–1432,
Cambridge, MA, 2003. MIT Press.
Z. Zhang, G. Li, K.-C. Toh, and W.-K. Sung. Inference of spatial
organizations of chromosomes using semi-definite embedding approach
and Hi-C data. In Proceedings of the 17th International Conference on
Research in Computational Molecular Biology, volume 7821 of Lecture
Notes in Computer Science, pages 317–332, Berlin, Heidelberg, 2013.
Springer-Verlag.
C. Zimmer and E. Fabre. Principles of chromosomal organization: lessons
from yeast. Journal of Cell Biology, 192(5):723–733, 2011.
51 / 51
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 


Studying the 3D structure of the genome: the case study of P. falciparum

  • 1. Studying the 3D structure of the genome. The case study of P. falciparum. Nelle Varoquaux
  • 2. The 3D structure of the genome is thought to play an important role in many biological processes. The genome of S. cerevisiae is highly organized [Zimmer and Fabre, 2011].
  • 3. P. falciparum's complex life cycle. [Figure: the life cycle of P. falciparum, showing the sporozoite, merozoite, ring, trophozoite, schizont, gametocyte, and oocyst stages.] 1. Mosquito injects sporozoites. 2. Sporozoites infect liver cells. 3. Merozoites are released into the bloodstream. 4. Erythrocytic cycle. 5. Sexual cycle. 6. Transmission to mosquito. 7. Oocysts are formed. 8. Sporozoites develop.
  • 4. The Hi-C protocol identifies physical contacts between pairs of loci genome-wide. Hi-C paves the way for a systematic, genome-wide analysis of genome architecture [Rao et al., 2014].
  • 5. The contact count matrix recapitulates the hallmarks of genome architecture. [Figure: contact counts for the first 5 chromosomes (I to V) of S. cerevisiae, with centromeres and chromosome boundaries annotated; color scale from low to high.]
  • 6. HiC-Pro: from sequences to contact maps. [Figure: HiC-Pro workflow.] Paired-end sequencing of Hi-C fragments → alignment of each read → read pairing → assignment to restriction fragments → combining multiple samples → raw contact maps → normalized contact maps. Read-pair categories shown: valid pairs, dangling ends, self circles, dumped pairs, singletons, multi-hits, low MAPQ, unmapped. With phasing data: assignment to parental genome, then selection of allele-specific pairs. The steps run with parallel computing.
  • 7. A wide range of applications for Hi-C data. Studying the 3D structure of the genome: 3D structure inference, data integration with gene expression profiles, replication timing, ...
  • 8. A wide range of applications for Hi-C data. Studying the 3D structure of the genome: 3D structure inference, data integration with gene expression profiles, replication timing, ... Studying the 1D structure of the genome: genome assembly, metagenomic deconvolution, haplotype resolution, ...
  • 9. A wide range of applications for Hi-C data. Studying the 3D structure of the genome: 3D structure inference, data integration with gene expression profiles, replication timing, ... Studying the 1D structure of the genome: genome assembly, metagenomic deconvolution, haplotype resolution, ... Genome annotation: identifying rDNA clusters, centromeres, specific gene families, ... [Figure: contact counts for chromosomes I to V, with centromeres and chromosome boundaries annotated; color scale from low to high.]
  • 10. How to create 3D models from contact maps and how to use them
      • Data-driven methods: consensus methods, ensemble methods, single-cell Hi-C based methods
      • Model-driven methods
      • Downstream analysis using 3D models: a highlight of the study of P. falciparum's 3D structure
  • 11. Data-driven methods to build 3D models from contact maps.
  • 12. Inferring 3D models of genome architecture from Hi-C data
      Motivation
      • Little is known about the 3D structure of the genome.
      • Hi-C now makes it possible to assess physical interactions genome-wide.
      Goal: propose a robust and accurate method for inferring 3D models of genome architecture from contact counts.
      Idea: model contact counts as Poisson random variables where the 3D model is a latent variable, and cast the inference as a maximum-likelihood problem.
  • 13. Reconstructing the 3D structure from contact count maps
      Consensus approach: contact counts → inference → structure [Duan et al., 2010, Tanizawa et al., 2010, Bau et al., 2011, Zhang et al., 2013, Ben-Elazar et al., 2013, Lesne et al., 2014]
      Ensemble approach: contact counts → inference → population of structures [Rousseau et al., 2011, Hu et al., 2013, Kalhor et al., 2011, Tjong et al., 2012]
  • 14. Chromosomes as a series of beads
      • Let $X \in \mathbb{R}^{n \times 3}$ be the coordinates of each bead.
      • Let $C \in \mathbb{R}^{n \times n}$ be the raw contact count matrix.
      • Let $\Theta$ be the count-to-distance transfer function.
      Optimization problem: $\min_{x_1, \dots, x_n} \sigma(X, C)$
  • 15. Data-driven methods to build 3D models from contact maps. Consensus methods
  • 16. Metric MDS-based methods
      Formulation: minimize over $x_1, \dots, x_n$
      $$\sigma(X, C) = \sum_{i,j \,|\, c_{ij} > 0} w_{ij} \left( \|x_i - x_j\|_2 - \Theta(c_{ij}) \right)^2 \quad \text{s.t. some constraints}$$
      • $X$: 3D coordinates.
      • $C$: normalized contact counts.
      • $w_{ij}$: weights (set to $1 / \Theta(c^{N}_{ij})^2$ in pastis-MDS2).
      • $\Theta(c) = \beta c^{\alpha}$: count-to-distance function.
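The stress function on this slide can be evaluated in a few lines of NumPy. The following is a minimal sketch, not the pastis implementation: the function names, the toy random-walk structure, the default values of `alpha` and `beta`, and the choice to count each pair once (i < j) are all assumptions made for illustration.

```python
import numpy as np

def count_to_distance(counts, alpha=-1.0 / 3.0, beta=1.0):
    """Wish distances Theta(c) = beta * c**alpha, for strictly positive counts."""
    return beta * counts ** alpha

def mds_stress(X, counts, alpha=-1.0 / 3.0, beta=1.0):
    """Weighted metric-MDS stress over bead pairs i < j with c_ij > 0."""
    i, j = np.triu_indices(counts.shape[0], k=1)
    keep = counts[i, j] > 0
    i, j = i[keep], j[keep]
    wish = count_to_distance(counts[i, j], alpha, beta)
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    weights = 1.0 / wish ** 2  # assumed choice: w_ij = 1 / Theta(c_ij)^2
    return float(np.sum(weights * (dist - wish) ** 2))

# Toy data (hypothetical): a random-walk chain and noise-free counts c = d**-3.
rng = np.random.default_rng(0)
X_true = np.cumsum(rng.normal(size=(20, 3)), axis=0)
D = np.linalg.norm(X_true[:, None] - X_true[None, :], axis=-1)
off_diag = ~np.eye(20, dtype=bool)
counts = np.zeros_like(D)
counts[off_diag] = D[off_diag] ** -3.0

stress_true = mds_stress(X_true, counts)
stress_noisy = mds_stress(X_true + rng.normal(scale=0.5, size=X_true.shape), counts)
```

With noise-free counts the generating structure has essentially zero stress, while a perturbed copy does not; a real pipeline would hand `mds_stress` to a numerical optimizer.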
  • 17. Relationships between contact counts $c$, genomic distances $s$, and Euclidean distances $d$
      Fractal globule:
      • $c \sim s^{-1}$
      • $d \sim s^{1/3}$
      Equilibrium:
      • $c \sim s^{-3/2}$
      • $d \sim s^{1/2}$ for $s < s_{\max}^{2/3}$
      Relationship between contact counts and Euclidean distances: $d_{ij} = \gamma c_{ij}^{-1/3}$
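As a worked step, eliminating the genomic distance $s$ from each pair of scaling laws shows where the exponent $-1/3$ comes from. Under the relations listed on this slide, both polymer regimes lead to the same count-to-distance exponent (within the range where the equilibrium law holds):

```latex
% Fractal globule: c \sim s^{-1}, d \sim s^{1/3}
s \sim c^{-1}
\quad\Longrightarrow\quad
d \sim s^{1/3} \sim \left(c^{-1}\right)^{1/3} = c^{-1/3}

% Equilibrium globule: c \sim s^{-3/2}, d \sim s^{1/2}
s \sim c^{-2/3}
\quad\Longrightarrow\quad
d \sim s^{1/2} \sim \left(c^{-2/3}\right)^{1/2} = c^{-1/3}
```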
  • 18. What have other people done? Differences between MDS-based methods:
      Paper | Organism | Full or partial | Counts-to-distance mapping | Constraints (Adj., rDNA, C, T) | $w_{ij}$
      Dekker et al. [2002] | S. cerevisiae | Chr III | Worm-like chain behavior | |
      Duan et al. [2010] | S. cerevisiae | Whole-genome | Linear | |
      Tanizawa et al. [2010] | S. pombe | Whole-genome | FISH-derived | |
      Ay et al. [2014] | P. falciparum | Whole-genome | Fractal globule derived | | $\delta_{ij}^{2}$
      Peng et al. [2013] | P. falciparum | Whole-genome | Fractal globule derived | | $\delta_{ij}^{2}$
      Lesne et al. [2014] | | Whole-genome | ad hoc | |
      Varoquaux et al. [2014] | | Whole-genome | polymer-behaviour | | $\delta_{ij}^{2}$
  • 19. Non-metric MDS-based method. [Figure: contact counts → transfer function $\Theta$ → distances → inferred structure.] The count-to-distance transfer function relies on strong assumptions about the physics of DNA. NMDS infers the transfer function jointly with the structure.
  • 20. Nonmetric MDS-based method (parametric)
      Hypothesis: the mapping function can be parametrized as
      $$\Theta(c) = \beta c^{\alpha} \quad (1)$$
      Optimization:
      1. Set $\alpha$ and $\beta$, then optimize the structure $X$.
      2. Optimize $\alpha$ and $\beta$ with respect to a chosen criterion.
      [Zhang et al., 2013]
  • 21. Nonmetric MDS-based method (nonparametric)
      Hypothesis: if two loci $i$ and $j$ are observed to be in contact more often than loci $k$ and $\ell$, then $i$ and $j$ should be closer in 3D space than $k$ and $\ell$:
      $$c_{ij} \ge c_{k\ell} \iff \|x_i - x_j\|_2 \le \|x_k - x_\ell\|_2 \quad (2)$$
      Objective function: minimize over $x_1, \dots, x_n, \Theta$ the stress $\sigma(X, C, \Theta)$, subject to $\Theta$ decreasing.
      [Varoquaux et al., 2014, Ben-Elazar et al., 2013]
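One common way to handle the "Θ decreasing" constraint is to alternate between updating the structure and refitting Θ by isotonic regression. The sketch below shows only the Θ update, using a small pool-adjacent-violators routine; it is an illustration under that assumption, not the method of the cited papers. The function names are hypothetical and ties between counts are not treated specially.

```python
import numpy as np

def pava_nondecreasing(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, size = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += size
    return np.concatenate([np.full(size, total / size) for total, size in blocks])

def fit_decreasing_transfer(counts, dists):
    """Least-squares fit of wish distances that never increase with the count."""
    order = np.argsort(counts)
    fitted_sorted = -pava_nondecreasing(-dists[order])
    fitted = np.empty_like(fitted_sorted)
    fitted[order] = fitted_sorted
    return fitted

# Toy example: the pair with count 3 is farther apart than the pair with count 2,
# which violates the ordering, so the two distances are pooled to their mean.
counts = np.array([1.0, 2.0, 3.0, 4.0])
dists = np.array([4.0, 2.0, 3.0, 1.0])
wish = fit_decreasing_transfer(counts, dists)
```

Here `wish` plays the role of Θ(c_ij) evaluated at the observed counts; the structure update would then minimize the stress against these fitted values.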
  • 22. Statistical approaches for inferring the 3D structure of the genome
      • MDS-based methods minimize an arbitrary stress function that measures the discrepancy between the wish distances (the target distances derived from the counts) and the 3D distances of the model.
      Statistical approach for stable inference of genome structure:
      • replace the arbitrary MDS loss function with a better-motivated likelihood function;
      • define a probabilistic model of contact counts parametrized by the 3D model.
  • 23. A Poisson model for inferring the 3D structure of the genome
      The idea: assume that $c_{ij} \sim \mathrm{Poisson}(\beta d_{ij}^{\alpha})$, where $c_{ij}$ is the interaction count, $d_{ij}$ the pairwise Euclidean distance, and $\beta$ and $\alpha$ unknown parameters.
      Likelihood:
      $$\ell(X, \alpha, \beta) = \prod_{i < j \le n} \frac{(\beta d_{ij}^{\alpha})^{c_{ij}}}{c_{ij}!} \exp(-\beta d_{ij}^{\alpha})$$
      Optimization problem: minimize over $x_1, \dots, x_n, \alpha, \beta$
      $$\sigma(X, C, \alpha, \beta) = -\log \ell(X, \alpha, \beta)$$
      [Varoquaux et al., 2014]
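The negative log-likelihood of the Poisson model can be written down directly from the likelihood on this slide. This is a minimal sketch under assumptions: the function name, the toy structure, and the values `alpha=-3.0` and `beta=50.0` are illustrative only, and no optimizer is included.

```python
import math
import numpy as np

def poisson_neg_log_likelihood(X, counts, alpha=-3.0, beta=1.0):
    """-log likelihood of c_ij ~ Poisson(beta * d_ij**alpha) over pairs i < j."""
    i, j = np.triu_indices(counts.shape[0], k=1)
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    rate = beta * dist ** alpha
    c = counts[i, j]
    log_factorial = np.array([math.lgamma(v + 1.0) for v in c])  # log(c_ij!)
    return float(np.sum(rate - c * np.log(rate) + log_factorial))

# Toy data (hypothetical): simulate counts from a random-walk chain.
rng = np.random.default_rng(0)
n = 20
X_true = np.cumsum(rng.normal(size=(n, 3)), axis=0)
i, j = np.triu_indices(n, k=1)
rates = 50.0 * np.linalg.norm(X_true[i] - X_true[j], axis=1) ** -3.0
counts = np.zeros((n, n))
counts[i, j] = rng.poisson(rates)
counts += counts.T  # symmetric contact map

nll_at_truth = poisson_neg_log_likelihood(X_true, counts, beta=50.0)
```

Inference would minimize this quantity jointly over the coordinates, `alpha`, and `beta`, for example with a quasi-Newton solver.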
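  • 24. Modeling overdispersion using the negative binomial
      The idea: assume that $c \sim \mathrm{NegativeBinomial}(\beta d^{\alpha}, r)$, where $c$ is the interaction count, $d$ the pairwise Euclidean distance, $r$ the dispersion parameter, $\alpha$ an unknown parameter, and $\beta$ a scale coefficient.
      Likelihood:
      $$\ell(X, C) = \prod_{i < j \le n} \frac{\Gamma(c_{ij} + r)}{\Gamma(c_{ij} + 1)\,\Gamma(r)} \left( \frac{\beta d_{ij}^{\alpha}}{r + \beta d_{ij}^{\alpha}} \right)^{c_{ij}} \left( 1 - \frac{\beta d_{ij}^{\alpha}}{r + \beta d_{ij}^{\alpha}} \right)^{r} \quad (3)$$
      The optimization problem (terms that do not depend on $X$, $\alpha$, or $\beta$ are dropped):
      $$\max_{\alpha, \beta, X} L(X, \alpha, \beta) = \sum_{i < j \le n} c_{ij} \left( \log \beta + \alpha \log d_{ij} \right) - (c_{ij} + r) \log(r + \beta d_{ij}^{\alpha}) \quad (4)$$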
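The negative binomial log-likelihood can be sketched the same way. The code below keeps every term of the log-probability, including constants, so that it can be checked against the requirement that the probabilities sum to one. The function names and the default parameter values are assumptions for illustration, and the dispersion `r` is treated as known.

```python
import math
import numpy as np

def nb_log_pmf(c, mu, r):
    """log P(c) for a negative binomial with mean mu and dispersion r."""
    return (math.lgamma(c + r) - math.lgamma(c + 1.0) - math.lgamma(r)
            + c * math.log(mu / (r + mu))   # (mu / (r + mu))**c
            + r * math.log(r / (r + mu)))   # (1 - mu / (r + mu))**r

def nb_log_likelihood(X, counts, alpha=-3.0, beta=1.0, r=5.0):
    """Log-likelihood of c_ij ~ NB(beta * d_ij**alpha, r) over pairs i < j."""
    i, j = np.triu_indices(counts.shape[0], k=1)
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    mu = beta * dist ** alpha
    return float(sum(nb_log_pmf(c, m, r) for c, m in zip(counts[i, j], mu)))

# Toy data (hypothetical): rounded expected counts from a random-walk chain.
rng = np.random.default_rng(0)
X_true = np.cumsum(rng.normal(size=(15, 3)), axis=0)
D = np.linalg.norm(X_true[:, None] - X_true[None, :], axis=-1)
off_diag = ~np.eye(15, dtype=bool)
counts = np.zeros_like(D)
counts[off_diag] = np.round(50.0 * D[off_diag] ** -3.0)

loglik = nb_log_likelihood(X_true, counts, beta=50.0)
```

As `r` grows, the variance `mu + mu**2 / r` approaches the Poisson variance `mu`, which is why this model is a natural extension of the previous one.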
  • 25. Data-driven methods to build 3D models from contact maps. Ensemble methods as a means to infer a population of structures
  • 26. Sampling local minima of ill-defined optimization problems
      • Model chromosomes as a series of beads, linked by restraining oscillators;
      • Add strengths to the oscillators:
          • adjacent beads are neither too close to nor too far from one another;
          • strengths are derived from contact counts;
      • Run the optimization 50,000 times.
      Umbarger et al. [2011], Bau et al. [2011], Kalhor et al. [2011]
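The "run the optimization many times" idea can be illustrated with random restarts of a simple descent on an unweighted stress. This is a deliberately simplified stand-in: the cited methods use restraint-based oscillators and dedicated modeling software, whereas the objective, the backtracking gradient descent, and all names below are assumptions made for the sketch.

```python
import numpy as np

def stress_and_grad(X, wish):
    """Unweighted stress sum_{i<j} (d_ij - wish_ij)^2 and its gradient in X."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)      # avoid 0/0 on the diagonal
    resid = dist - wish
    np.fill_diagonal(resid, 0.0)     # a bead has no restraint with itself
    stress = 0.5 * np.sum(resid ** 2)  # full matrix counts each pair twice
    grad = 2.0 * np.sum((resid / dist)[:, :, None] * diff, axis=1)
    return stress, grad

def local_minimum(wish, seed, n_iter=300):
    """Gradient descent with step halving from a random start."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(wish.shape[0], 3))
    stress, grad = stress_and_grad(X, wish)
    step = 1e-2
    for _ in range(n_iter):
        X_new = X - step * grad
        new_stress, new_grad = stress_and_grad(X_new, wish)
        if new_stress < stress:      # accept only moves that lower the stress
            X, stress, grad = X_new, new_stress, new_grad
            step *= 1.5
        else:
            step *= 0.5
    return X, stress

# Toy wish distances (hypothetical) taken from a random-walk chain.
rng = np.random.default_rng(42)
X_ref = np.cumsum(rng.normal(size=(15, 3)), axis=0)
wish = np.linalg.norm(X_ref[:, None] - X_ref[None, :], axis=-1)

# The ensemble is the collection of end points reached from different seeds.
ensemble = [local_minimum(wish, seed) for seed in range(5)]
```

Each element of `ensemble` is a `(structure, stress)` pair; the population of structures is then analyzed as a whole rather than reduced to one consensus model.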
  • 27. Estimating the posterior distribution of a statistical model
      • Define the statistical model: $c_{ij} \sim \mathcal{N}(\beta d_{ij}^{\alpha}, \sigma^2)$;
      • Add priors (usually non-informative priors);
      • Sample using MCMC.
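The three steps above can be sketched with a random-walk Metropolis sampler over the bead coordinates. This is a toy illustration, not one of the published samplers: it assumes a flat prior on the coordinates, keeps `alpha`, `beta`, and `sigma` fixed, and uses hypothetical names and step sizes.

```python
import numpy as np

def log_posterior(X, counts, alpha, beta, sigma):
    """Gaussian log-likelihood of the counts (flat prior on X), up to a constant."""
    i, j = np.triu_indices(counts.shape[0], k=1)
    dist = np.linalg.norm(X[i] - X[j], axis=1)
    resid = counts[i, j] - beta * dist ** alpha
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

def metropolis(counts, n_steps=500, step=0.05, alpha=-3.0, beta=1.0,
               sigma=1.0, seed=0):
    """Random-walk Metropolis over bead coordinates."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(counts.shape[0], 3))
    logp = log_posterior(X, counts, alpha, beta, sigma)
    samples, n_accept = [], 0
    for _ in range(n_steps):
        proposal = X + step * rng.normal(size=X.shape)
        logp_new = log_posterior(proposal, counts, alpha, beta, sigma)
        # Accept with probability min(1, exp(logp_new - logp)).
        if np.log(rng.uniform()) < logp_new - logp:
            X, logp, n_accept = proposal, logp_new, n_accept + 1
        samples.append(X.copy())
    return np.array(samples), n_accept / n_steps

# Toy counts (hypothetical) generated from a random-walk chain.
rng = np.random.default_rng(1)
X_ref = np.cumsum(rng.normal(size=(10, 3)), axis=0)
D = np.linalg.norm(X_ref[:, None] - X_ref[None, :], axis=-1)
off_diag = ~np.eye(10, dtype=bool)
counts = np.zeros_like(D)
counts[off_diag] = D[off_diag] ** -3.0

samples, acceptance_rate = metropolis(counts)
```

In practice the chain would need a burn-in period and convergence diagnostics, and the sampled structures are only defined up to rotation, translation, and reflection.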
  • 28. Conclusion
      • Wide variety of methods;
      • How to compare and evaluate them?
          • Using simulated data;
          • By assessing stability and robustness with respect to data bootstrapping or subsampling, contact map resolution, and biological replicates;
          • By comparing to other sources of data.
  • 29. The art of modeling the genome
  • 30. The art of modeling the genome
      Goal: consider 10,000 randomly placed beads. Can you find the smallest set of constraints such that these beads interact, overall, the same way as a given contact map?
      Challenge: while these models provide mechanistic insights, they are difficult to build: each organism, cell type, and time point requires a handcrafted set of constraints.
  • 31. Building a yeast nucleus
      Modeling: chromosomes are modeled as flexible, volume-excluded polymers.
      Constraints: (i) the chromosomes are constrained into a spherical ball representing the nucleus; (ii) centromeres are constrained into a spherical ball tethered to the nuclear membrane; (iii) telomeres are tethered to the nuclear membrane; (iv) rDNA is constrained into the nucleolus, represented as a spherical ball opposite to the centromeres.
      [Tjong et al., 2012, Tokuda et al., 2012]
  • 32. Building a yeast nucleus
  • 33. Building a P. falciparum nucleus? Question: could we similarly build a P. falciparum nucleus? [Figure: observed/expected contact maps for chromosome 7. Panel A: VE; panel B: Hi-C.]
  • 34. Building a P. falciparum nucleus? Question: could we similarly build a P. falciparum nucleus? And if we add constraints on var genes... The optimization never converges to a structure that fully matches the desired constraints.
  • 35. Conclusion
      • Model-driven approaches provide mechanistic insights into the folding properties of the genome...
      • But each organism, cell line, and possibly time point requires its own set of handcrafted constraints.
  • 36. How can we use 3D models to gain biological insights? The case study of P. falciparum
  • 37. The life cycle of P. falciparum. [Figure: the life cycle of P. falciparum, showing the sporozoite, merozoite, ring, trophozoite, schizont, gametocyte, and oocyst stages.] 1. Mosquito injects sporozoites. 2. Sporozoites infect liver cells. 3. Merozoites are released into the bloodstream. 4. Erythrocytic cycle. 5. Sexual cycle. 6. Transmission to mosquito. 7. Oocysts are formed. 8. Sporozoites develop.
  • 38. A wide range of data sets available. Summary of the available P. falciparum Hi-C datasets:
      Name | Strain | Stage | Resolution | Number of contacts | % cis | % trans | Reference
      Ay-rings | 3D7 | Late rings | 10 kb | 16711552 | 43% | 57% | Ay et al. [2014]
      Ay-trophozoites | 3D7 | Trophozoites | 10 kb | 56348498 | 53% | 47% | Ay et al. [2014]
      Ay-schizonts | 3D7 | Schizonts | 10 kb | 11652832 | 55% | 45% | Ay et al. [2014]
      Lemieux-A4+ | IT/BC6+ | Rings | 25 kb | 18488252 | 19% | 81% | Lemieux et al. [2013]
      Lemieux-A4 | IT/3G8 | Rings | 25 kb | 19674672 | 28% | 72% | Lemieux et al. [2013]
      Lemieux-A44 | IT/BC6- | Rings | 25 kb | 18660594 | 25% | 75% | Lemieux et al. [2013]
      Lemieux-DCJ On | NF54/DCJ on | Rings | 25 kb | 3098370 | 26% | 74% | Lemieux et al. [2013]
      Lemieux-DCJ Off | NF54/DCF Off | Rings | 25 kb | 2533470 | 26% | 73% | Lemieux et al. [2013]
      Lemieux-B15C2 | NF54/B15C2 | Rings | 25 kb | 1022996 | 12% | 88% | Lemieux et al. [2013]
  • 39. We can infer structures at different time points...
  • 40. How can we use 3D models to gain biological insights? Visual inspection, structure stability, clustering, and other variance analysis
  • 41. Visual inspection, structure stability, clustering, and other variance analysis...
      • Are the resulting 3D models closer across time points than across initializations?
  • 42. Visual inspection, structure stability, clustering, and other variance analysis...
      • Are the resulting 3D models closer across time points than across initializations?
      • Are the models locally consistent?
  • 43. Visual inspection, structure stability, clustering, and other variance analysis...
      • Are the resulting 3D models closer across time points than across initializations?
      • Are the models locally consistent?
      • How do the structures differ?
  • 44. Visual inspection, structure stability, clustering, and other variance analysis...
      • Are the resulting 3D models closer across time points than across initializations?
      • Are the models locally consistent?
      • How do the structures differ? Are the hallmarks of interest conserved?
  • 45. Are the resulting 3D models closer across time points than across initializations? [Figure: PCA on features extracted from the 3D structures at each time point. Panels plot the first, second, and third components; series: Ay-Rin, Ay-Sch, Ay-Trop, Lemieux-Rin.]
  • 46. Are models locally consistent?
      • Divide structures into overlapping sub-structures (5 to 20 beads).
      • Compute the pairwise RMSD between segments of all structures.
      • Segments overlapping across many structures can be labeled "consistent".
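The segment-wise comparison above can be sketched with a sliding window and an RMSD computed after optimal rigid superposition (the Kabsch algorithm). Assumptions of this sketch: the two structures have the same number of beads, only rotation and translation are removed (no rescaling, no mirror images), and the function names and window size are illustrative.

```python
import numpy as np

def kabsch_rmsd(A, B):
    """RMSD between two (m, 3) segments after optimal rotation and translation."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = np.sign(np.linalg.det(U @ Vt))       # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt      # rotation such that A @ R ~ B
    return float(np.sqrt(np.mean(np.sum((A @ R - B) ** 2, axis=1))))

def local_rmsd_profile(X, Y, window=10):
    """RMSD of every window of consecutive beads between structures X and Y."""
    n = X.shape[0]
    return np.array([kabsch_rmsd(X[s:s + window], Y[s:s + window])
                     for s in range(n - window + 1)])

# Toy check (hypothetical data): Y is a rotated and shifted copy of X,
# with noise added to the first five beads only.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(40, 3)), axis=0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0                          # make Q a proper rotation
Y = X @ Q + 5.0
Y_perturbed = Y.copy()
Y_perturbed[:5] += rng.normal(size=(5, 3))

profile_same = local_rmsd_profile(X, Y)
profile_perturbed = local_rmsd_profile(X, Y_perturbed)
```

Windows with a low RMSD across many pairs of structures would be the ones labeled "consistent"; the threshold for "low" is a modeling choice not specified here.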
  • 47. Are the hallmarks of the genome architecture conserved?
      • Identify the hallmarks of interest: loci from a set $S$ that are thought to colocalize.
      • Perform 3D gene set enrichment:
          • Define a distance threshold to separate pairs as "close" or "far".
          • Compute a test statistic: the number of genes of the set that are labeled as "close".
          • Compare the test statistic with a null distribution estimated by resampling with preservation of chromosome structure.
      Conclusion: in P. falciparum, centromeres, telomeres, and var genes colocalize.
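A resampling test of this kind can be sketched as follows. Two simplifications are assumed and should be read as such: the statistic counts close pairs within the set, and the null distribution draws random loci while keeping the number of loci per chromosome fixed, which is only a crude stand-in for "preservation of chromosome structure". All names and the toy data are hypothetical.

```python
import numpy as np

def n_close_pairs(X, idx, threshold):
    """Number of pairs of loci in idx closer than threshold in 3D."""
    pts = X[idx]
    dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    upper = np.triu_indices(len(idx), k=1)
    return int(np.sum(dist[upper] < threshold))

def enrichment_pvalue(X, chrom, gene_set, threshold, n_perm=200, seed=0):
    """Permutation p-value for colocalization of gene_set in structure X."""
    rng = np.random.default_rng(seed)
    observed = n_close_pairs(X, gene_set, threshold)
    chroms, sizes = np.unique(chrom[gene_set], return_counts=True)
    null = np.empty(n_perm, dtype=int)
    for p in range(n_perm):
        # Same number of loci per chromosome as in the real set.
        sample = np.concatenate([
            rng.choice(np.flatnonzero(chrom == ch), size=k, replace=False)
            for ch, k in zip(chroms, sizes)])
        null[p] = n_close_pairs(X, sample, threshold)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Toy structure (hypothetical): 4 chromosomes of 25 loci, with five loci
# artificially clustered near the origin.
rng = np.random.default_rng(1)
X = rng.normal(scale=10.0, size=(100, 3))
chrom = np.repeat(np.arange(4), 25)
gene_set = np.array([0, 10, 25, 50, 75])
X[gene_set] = rng.normal(scale=0.01, size=(5, 3))

observed, p_value = enrichment_pvalue(X, chrom, gene_set, threshold=1.0)
```

The `+1` in numerator and denominator keeps the estimated p-value strictly positive, a standard convention for permutation tests.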
  • 48. How can we use 3D models to gain biological insights? Integrative analysis of gene expression and 3D structure
  • 49. Identifying links between gene expression profiles and 3D structure
      Motivation: extract a gene expression profile $v \in \mathbb{R}^p$ that is:
      • representative of the gene expression profiles;
      • correlated with the 3D structure.
      Data: for each gene $g \in G$:
      • Log expression profiles at 27 data points: $e(g) = (e_1(g), \dots, e_p(g)) \in \mathbb{R}^p$.
      • The gene's 3D coordinates, extracted from the inferred 3D structure: $x(g)$.
      Method: KernelCCA [Vert and Kanehisa, 2003, Bach and Jordan, 2002]
  • 50. Extracting a vector $v$ representative of the gene expression profiles. Find $v \in \mathbb{R}^p$ to maximize:
      $$V(v) = \frac{\sum_{g \in G} \left( v^{\top} e(g) \right)^2}{\|v\|^2}$$
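Taken on its own, $V(v)$ is a Rayleigh quotient, so it is maximized by the leading eigenvector of $E^{\top} E$, where the rows of $E$ are the profiles $e(g)$. The sketch below illustrates this with random data; the matrix `E`, its dimensions, and the function names are assumptions, and whether the profiles are centered beforehand is not stated on the slide.

```python
import numpy as np

def variance_captured(v, E):
    """V(v) = sum_g (v^T e(g))^2 / ||v||^2, with profiles e(g) as rows of E."""
    return float(np.sum((E @ v) ** 2) / (v @ v))

def top_expression_direction(E):
    """Maximizer of V: the leading eigenvector of E^T E."""
    eigvals, eigvecs = np.linalg.eigh(E.T @ E)  # eigenvalues in ascending order
    return eigvecs[:, -1], float(eigvals[-1])

# Toy expression matrix (hypothetical): 200 genes, 27 data points.
rng = np.random.default_rng(0)
E = rng.normal(size=(200, 27))
v_top, largest_eigenvalue = top_expression_direction(E)
v_random = rng.normal(size=27)
```

For any other direction, such as `v_random`, `variance_captured` is at most `largest_eigenvalue`, which is the value reached by `v_top`.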
  • 51. Find $f$ such that $f$ is smooth with respect to the 3D structure. Let $f$ be a vector of scores assigned to each gene.
      $$S(f) = \frac{f^{\top} K_{3D}^{-1} f}{\|f\|^2}$$
      We want:
      • $V(v)$ to be large,
      • $S(f)$ to be small,
      • $(v^{\top} e(g))_{g \in G}$ and $f$ to be as correlated as possible.
      This can be cast as a generalized eigenvalue problem.
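The smoothness score $S(f)$ can be computed once a kernel on 3D positions is chosen. The sketch below assumes a Gaussian kernel with a small ridge term for numerical stability; the kernel choice, its bandwidth, and the names are illustrative assumptions, and the full generalized eigenvalue problem of kernel CCA is not reproduced here.

```python
import numpy as np

def gaussian_kernel_3d(X, bandwidth=1.0, ridge=1e-6):
    """K_3D[g, h] = exp(-||x(g) - x(h)||^2 / (2 * bandwidth^2)), plus a ridge."""
    sq_dist = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)) + ridge * np.eye(X.shape[0])

def smoothness(f, K):
    """S(f) = f^T K^{-1} f / ||f||^2; small values mean f varies slowly in 3D."""
    return float(f @ np.linalg.solve(K, f) / (f @ f))

# Toy gene positions (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = gaussian_kernel_3d(X)

# The leading eigenvector of K is the smoothest score vector under S.
eigvals, eigvecs = np.linalg.eigh(K)
f_smooth = eigvecs[:, -1]
f_random = rng.normal(size=50)

s_smooth = smoothness(f_smooth, K)
s_random = smoothness(f_random, K)
```

Because $S$ is a Rayleigh quotient of $K_{3D}^{-1}$, its minimum is the inverse of the largest eigenvalue of $K_{3D}$, which is what `s_smooth` evaluates to.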
  • 52. KernelCCA reveals a strong correlation between gene expression profiles and 3D structure. [Figure, panels A and B: gametocyte and sporozoite structures, and the mean z-score per component (Gene expression 1, Gene expression 2, Structure 1, Structure 2) for the gene sets var, rif, exported, gametocyte-specific, and invasion, by stage (IDC, Gam, Spz).]
  • 53. Conclusion
      • A plethora of data-driven methods are available.
          • Only a few are publicly available and well validated.
          • How to choose among them is still an open and active research question.
      • Model-driven methods are organism- and time-point-specific.
          • But they provide powerful mechanistic insights into genome architecture.
          • They may be a low-cost means to replace high-throughput FISH experiments in preliminary explorations.
      • Downstream analysis of P. falciparum provided important insights into the relationship between gene expression and 3D structure.
          • But meaningful comparison of models from different time points at genome scale is still an open problem.
  • 54. Acknowledgments
      Mines ParisTech - CBIO: Jean-Philippe Vert, Thomas Walter
      UW - Noble lab: William S. Noble, Ferhat Ay
      Institut Curie - U900: Emmanuel Barillot, Nicolas Servant, Chong-Jian Chen, Eric Viara
      UMass: Job Dekker, Bryan Lajoie
      UC Riverside: Karine Le Roch, Sebastiaan Bol, Evelien Bunnik, Jacques Prudhomme
      Institut Curie - UMR168: Edith Heard
      UW - Dunham lab: Maitreya Dunham, Ivan Liachko
      UW - Shendure lab: Jay Shendure, Josh Burton
  • 55. References I
      F. Ay, E. M. Bunnik, N. Varoquaux, S. M. Bol, J. Prudhomme, J.-P. Vert, W. S. Noble, and K. G. Le Roch. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression. Genome Research, 24:974–988, 2014.
      F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
      D. Bau, A. Sanyal, B. R. Lajoie, E. Capriotti, M. Byron, J. B. Lawrence, J. Dekker, and M. A. Marti-Renom. The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules. Nat Struct Mol Biol, 18(1):107–114, 2011.
      S. Ben-Elazar, Z. Yakhini, and I. Yanai. Spatial localization of co-regulated genes exceeds genomic gene clustering in the Saccharomyces cerevisiae genome. Nucleic Acids Res, 41(4):2191–2201, 2013.
  • 56. References II
      J. Dekker, K. Rippe, M. Dekker, and N. Kleckner. Capturing chromosome conformation. Science, 295(5558):1306–1311, 2002.
      Z. Duan, M. Andronescu, K. Schutz, S. McIlwain, Y. J. Kim, C. Lee, J. Shendure, S. Fields, C. A. Blau, and W. S. Noble. A three-dimensional model of the yeast genome. Nature, 465:363–367, 2010.
      M. Hu, K. Deng, Z. Qin, J. Dixon, S. Selvaraj, J. Fang, B. Ren, and J. S. Liu. Bayesian inference of spatial organizations of chromosomes. PLoS Comput Biol, 9(1):e1002893, 2013.
      R. Kalhor, H. Tjong, N. Jayathilaka, F. Alber, and L. Chen. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat Biotechnol, 30(1):90–98, 2011.
      J. E. Lemieux, S. A. Kyes, T. D. Otto, A. I. Feller, R. T. Eastman, R. A. Pinches, M. Berriman, X. Z. Su, and C. I. Newbold. Genome-wide profiling of chromosome interactions in Plasmodium falciparum characterizes nuclear architecture and reconfigurations associated with antigenic variation. Mol Microbiol, 90(3):519–537, 2013.
  • 57. References III
      A. Lesne, J. Riposo, P. Roger, A. Cournac, and J. Mozziconacci. 3D genome reconstruction from chromosomal contacts. Nature Methods, 11(11):1141–1143, 2014.
      C. Peng, L.-Y. Fu, P.-F. Dong, Z.-L. Deng, J.-X. Li, X.-T. Wang, and H.-Y. Zhang. The sequencing bias relaxed characteristics of Hi-C derived data and implications for chromatin 3D modeling. Nucleic Acids Res, 41(19):e183, 2013.
      S. S. P. Rao, M. H. Huntley, N. C. Durand, E. K. Stamenova, I. D. Bochkov, J. T. Robinson, A. L. Sanborn, I. Machol, A. D. Omer, E. S. Lander, and E. L. Aiden. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159(7):1665–1680, 2014.
      M. Rousseau, J. Fraser, M. Ferraiuolo, J. Dostie, and M. Blanchette. Three-dimensional modeling of chromatin structure from interaction frequency data using Markov chain Monte Carlo sampling. BMC Bioinformatics, 12(1):414, 2011.
  • 58. References IV
      H. Tanizawa, O. Iwasaki, A. Tanaka, J. R. Capizzi, P. Wickramasinghe, M. Lee, Z. Fu, and K. Noma. Mapping of long-range associations throughout the fission yeast genome reveals global genome organization linked to transcriptional regulation. Nucleic Acids Res, 38(22):8164–8177, 2010.
      H. Tjong, K. Gong, L. Chen, and F. Alber. Physical tethering and volume exclusion determine higher-order genome organization in budding yeast. Genome Res, 22(7):1295–1305, 2012.
      N. Tokuda, T. P. Terada, and M. Sasai. Dynamical modeling of three-dimensional genome organization in interphase budding yeast. Biophys J, 102(2):296–304, 2012.
      M. A. Umbarger, E. Toro, M. A. Wright, G. J. Porreca, D. Bau, S. Hong, M. J. Fero, L. J. Zhu, M. A. Marti-Renom, H. H. McAdams, L. Shapiro, J. Dekker, and G. M. Church. The three-dimensional architecture of a bacterial genome and its alteration by genetic perturbation. Molecular Cell, 44:252–264, 2011.
  • 59. References V
      N. Varoquaux, F. Ay, W. S. Noble, and J.-P. Vert. A statistical approach for inferring the 3D structure of the genome. Bioinformatics, 30(12):i26–i33, 2014.
      J.-P. Vert and M. Kanehisa. Graph-driven features extraction from microarray data using diffusion kernels and kernel CCA. In Advances in Neural Information Processing Systems 15, pages 1425–1432, Cambridge, MA, 2003. MIT Press.
      Z. Zhang, G. Li, K.-C. Toh, and W.-K. Sung. Inference of spatial organizations of chromosomes using semi-definite embedding approach and Hi-C data. In Proceedings of the 17th International Conference on Research in Computational Molecular Biology, volume 7821 of Lecture Notes in Computer Science, pages 317–332, Berlin, Heidelberg, 2013. Springer-Verlag.
      C. Zimmer and E. Fabre. Principles of chromosomal organization: lessons from yeast. Journal of Cell Biology, 192(5):723–733, 2011.