This document summarizes research on modeling the resting brain using multi-subject models. It discusses using spatial independent component analysis (ICA) to decompose brain activity into spatial maps that are consistent across subjects. It also discusses estimating functional connectivity networks by imposing sparsity on inverse covariance matrices estimated across subjects. Multi-subject dictionary learning approaches that estimate shared spatial patterns across subjects while modeling subject variability are presented. These approaches aim to overcome challenges from small sample sizes by leveraging information across subjects.
2. Rest, a window on intrinsic structures
- Anti-correlated functional networks (segregation)
- Small-world, highly connected graphs (integration)
- Small-sample biases? Few spatial modes, spurious correlations
Gaël Varoquaux
3. Challenges to modeling the resting brain
- Model selection
- Small-sample estimation
- Mitigating data scarcity
- Generative multi-subject models
- Machine learning / high-dimensional statistics
7. 1 Decomposing in spatial modes: a model
Y = E · S + N   (Y: time × voxels data; E: time courses; S: spatial maps; N: residuals)
Decomposing time series into:
- covarying spatial maps, S
- uncorrelated residuals, N
ICA: minimize mutual information across S
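The model Y = E · S + N can be sketched on synthetic data; the array shapes, the noise level, and the use of scikit-learn's FastICA (with voxels treated as samples, so the recovered components are spatial maps) are illustrative assumptions, not the exact pipeline of these slides.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
n_time, n_voxels, n_sources = 100, 500, 4

# Synthetic spatial maps S (sources x voxels): non-Gaussian, as ICA assumes
S = rng.laplace(size=(n_sources, n_voxels))
E = rng.randn(n_time, n_sources)                 # time courses
Y = E @ S + 0.1 * rng.randn(n_time, n_voxels)    # Y = E . S + N

# Spatial ICA: voxels play the role of samples, so the estimated
# independent components are spatial maps rather than time courses
ica = FastICA(n_components=n_sources, random_state=0)
S_hat = ica.fit_transform(Y.T).T                 # (n_sources, n_voxels)
```

Transposing Y is what makes the ICA *spatial*: independence is imposed across voxel patterns, not across time courses.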
8. 1 ICA on multiple subjects: group ICA
Estimate common spatial maps S, shared by all subjects:
Y_s = E_s · S + N_s,  for each subject s
[Calhoun HBM 2001]
9. 1 ICA on multiple subjects: group ICA
Estimate common spatial maps S:  Y_s = E_s · S + N_s
Concatenate images, minimize the norm of the residuals
Corresponds to fixed-effects modeling: i.i.d. residuals N_s
[Calhoun HBM 2001]
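A minimal sketch of the concatenation approach, under assumed shapes and with scikit-learn's PCA and FastICA standing in for the group reduction and unmixing steps:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.RandomState(0)
n_subjects, n_time, n_voxels, n_sources = 3, 80, 400, 4

# Shared spatial maps S; subject-specific time courses E_s and noise N_s
S = rng.laplace(size=(n_sources, n_voxels))
Ys = [rng.randn(n_time, n_sources) @ S + 0.1 * rng.randn(n_time, n_voxels)
      for _ in range(n_subjects)]

# Group ICA: concatenate subjects along time (fixed-effects view),
# reduce with a group PCA, then unmix the basis with spatial ICA
Y_concat = np.vstack(Ys)                                   # (3*80, 400)
B = PCA(n_components=n_sources).fit(Y_concat).components_  # group basis
S_hat = FastICA(random_state=0).fit_transform(B.T).T       # common maps
```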
10. 1 ICA: noise model
Observation noise: minimize group residuals (PCA):
Y_concat = W · B + O
Learn interesting maps (ICA):
B = M · S
11. 1 CanICA: random-effects model
Observation noise: minimize subject residuals (PCA):
Y_s = W_s · P_s + O_s
Select signal similar across subjects (CCA):
[P_1; …; P_s] = Λ · B + R
Learn interesting maps (ICA):
B = M · S
[Varoquaux NeuroImage 2010]
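The three-level structure can be sketched as follows. This is a simplification: a second PCA on the stacked subject patterns stands in for the CCA step that selects between-subject signal, and all shapes are assumptions for illustration (the published CanICA implementation lives in nilearn).

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.RandomState(0)
n_subjects, n_time, n_voxels, n_sources = 3, 80, 400, 4
S = rng.laplace(size=(n_sources, n_voxels))
Ys = [rng.randn(n_time, n_sources) @ S + 0.1 * rng.randn(n_time, n_voxels)
      for _ in range(n_subjects)]

# 1) Subject level: PCA per subject to suppress observation noise -> P_s
Ps = [PCA(n_components=n_sources).fit(Y).components_ for Y in Ys]

# 2) Group level: stack subject patterns; the leading components keep the
#    signal shared across subjects (simplified stand-in for the CCA step)
B = PCA(n_components=n_sources).fit(np.vstack(Ps)).components_

# 3) ICA on the group basis to obtain interpretable spatial maps
S_hat = FastICA(random_state=0).fit_transform(B.T).T
```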
12. 1 ICA: model selection
Metric: reproducibility across control groups

  no CCA: .36 (.02)    CanICA: .72 (.05)    MELODIC: .51 (.04)

- Quantifies usefulness, but not goodness of fit
- Cannot select the number of maps
[Varoquaux NeuroImage 2010]
13. 1 CanICA: qualitative observations
- Structured components: ICA extracts a brain parcellation
- Does not select for what we interpret
- No overall control of residuals
- Lack of a model-selection metric
14. 1 ICA as dictionary learning
Y = E · S + N
- Degenerate model: need a prior
- ICA is an improper prior ⇒ the noise N must be estimated separately
- Impose sparsity rather than independence
15. 1 Sparse structured dictionary learning
Model of observed data (time series × spatial maps):
Y = U Vᵀ + E,  E ∼ N(0, σI)
Sparsity prior:
V ∼ exp(−ξ Ω(V)),  Ω(v) = ‖v‖₁
Structured sparsity
[Jenatton, in preparation]
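A hedged sketch of sparse dictionary learning on synthetic data, using scikit-learn's DictionaryLearning as a generic ℓ1-penalized solver (it does not implement the structured penalty of this slide; sizes and penalty weight are illustrative assumptions):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)
n_time, n_voxels, n_maps = 60, 300, 4

# Sparse ground-truth spatial maps V (maps x voxels)
mask = rng.rand(n_maps, n_voxels) < 0.1
V = rng.laplace(size=(n_maps, n_voxels)) * mask
Y = rng.randn(n_time, n_maps) @ V + 0.05 * rng.randn(n_time, n_voxels)

# Dictionary learning with an l1 penalty on the codes; voxels are the
# samples here, so the sparse codes play the role of the spatial maps V
dico = DictionaryLearning(n_components=n_maps, alpha=0.5, random_state=0,
                          transform_algorithm='lasso_lars')
V_hat = dico.fit_transform(Y.T).T        # (n_maps, n_voxels), sparse
```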
16. 1 Sparse structured dictionary learning
[Figure: cross-validated likelihood vs. number of maps (50–200) for SSPCA, SPCA, and ICA]
Can learn many regions
[Varoquaux, NIPS workshop 2010]
18. 1 Multi-subject dictionary learning
Subject-level spatial patterns:
Y_s = U_s V_sᵀ + E_s,  E_s ∼ N(0, σI)
Group-level spatial patterns:
V_s = V + F_s,  F_s ∼ N(0, ζI)
Sparsity and spatial-smoothness prior:
V ∼ exp(−ξ Ω(V)),  Ω(v) = ‖v‖₁ + ½ vᵀ L v
[Varoquaux IPMI 2011]
19. 1 Multi-subject dictionary learning
Estimation: maximum a posteriori

argmin over {U_s, V_s, V} of  Σ_s ‖Y_s − U_s V_sᵀ‖²_Fro + µ ‖V_s − V‖²_Fro + λ Ω(V)
(data fit + subject variability + penalization: sparse and smooth maps)

Parameter selection:
- µ: comparing variance (PCA spectrum) at the subject and group levels
- λ: cross-validation
[Varoquaux IPMI 2011]
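The objective above can be minimized by alternating updates. This sketch is a deliberately simplified version: it keeps only the ℓ1 part of Ω (dropping the Laplacian smoothness term), uses a basic proximal step for V, and all sizes and penalty weights are assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
n_subjects, n_time, n_voxels, k = 3, 50, 200, 3
V_true = rng.laplace(size=(k, n_voxels)) * (rng.rand(k, n_voxels) < 0.15)
Ys = [rng.randn(n_time, k) @ (V_true + 0.05 * rng.randn(k, n_voxels))
      + 0.05 * rng.randn(n_time, n_voxels) for _ in range(n_subjects)]

mu, lam = 1.0, 0.01
Vs = [rng.randn(k, n_voxels) for _ in range(n_subjects)]
V = np.zeros((k, n_voxels))

def soft(x, t):
    """Proximal operator of the l1 norm (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

for _ in range(30):
    for s in range(n_subjects):
        U = Ys[s] @ np.linalg.pinv(Vs[s])        # U_s: least-squares update
        A = U.T @ U + mu * np.eye(k)
        Vs[s] = np.linalg.solve(A, U.T @ Ys[s] + mu * V)  # V_s: pulled toward V
    # V: sparse group maps via a proximal step on the subject average
    V = soft(np.mean(Vs, axis=0), lam / mu)
```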
20. 1 Multi-subject dictionary learning
Individual maps + atlas of functional regions
[Varoquaux IPMI 2011]
21. 1 Multi-subject dictionary learning
ICA vs. MSDL: brain parcellations
22. Spatial modes: from fluctuations to a parcellation
Y = E · S + N
23. Associated time series
Y = E · S + N
25. 2 Inferring a brain wiring diagram
- Small-world connectivity: sparse graph with efficient transport (integration)
- Isolate functional structures: segregation/specialization
26. 2 Independence graphs from correlation matrices
For a given correlation matrix:
Multivariate normal: P(X) ∝ |Σ⁻¹|^{1/2} exp(−½ Xᵀ Σ⁻¹ X)
Parametrized by the inverse covariance matrix K = Σ⁻¹
- Covariance matrix: direct and indirect effects
- Inverse covariance: partial correlations ⇒ independence graph
[Varoquaux NIPS 2010, Smith 2011]
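The link between the inverse covariance and partial correlations can be checked numerically; the chain-structured precision matrix below is a made-up example:

```python
import numpy as np

rng = np.random.RandomState(0)
# Chain-structured true precision matrix: variables 0-1-2-3 in a line,
# so 0 and 3 are conditionally independent given 1 and 2
K_true = np.eye(4) + np.diag([0.4] * 3, 1) + np.diag([0.4] * 3, -1)
X = rng.multivariate_normal(np.zeros(4), np.linalg.inv(K_true), size=5000)

# Empirical inverse covariance -> partial correlations
K = np.linalg.inv(np.cov(X.T))
d = np.sqrt(np.diag(K))
partial_corr = -K / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
# Non-adjacent pairs (e.g. 0 and 3) have near-zero partial correlation,
# recovering the independence graph from the inverse covariance
```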
28. 2 Sparse inverse covariance estimation: penalized
Maximum a posteriori: fit models with a prior
K̂ = argmax over K ≻ 0 of  L(Σ|K) + f(K)
Sparse prior ⇒ Lasso-like problem: ℓ1 penalization
[Varoquaux NIPS 2010] [Smith 2011]
29. 2 Sparse inverse covariance estimation: penalized
K̂ = argmax over K ≻ 0 of  L(Σ|K) + f(K)
[Figure: test-data likelihood vs. −log10 λ (2.5–4.0); the optimal graph is almost dense]
[Varoquaux NIPS 2010] [Smith 2011]
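The ℓ1-penalized estimator, with λ set by cross-validation, is available in scikit-learn as GraphicalLassoCV; the toy chain-structured data below is an assumption for illustration:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.RandomState(0)
# True precision: a sparse chain graph over 5 variables
K_true = np.eye(5)
for i in range(4):
    K_true[i, i + 1] = K_true[i + 1, i] = 0.3
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(K_true), size=200)

# l1-penalized maximum likelihood; the penalty lambda is chosen
# by cross-validation on held-out likelihood
model = GraphicalLassoCV().fit(X)
K_hat = model.precision_     # sparse estimate of the inverse covariance
```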
30. 2 Sparse inverse covariance estimation: greedy
Greedy algorithm: PC-DAG
1. PC algorithm: prune the graph by independence tests, conditioning on neighbors
2. Learn the covariance on the resulting structure
[Varoquaux J. Physio Paris, accepted]
31. 2 Sparse inverse covariance estimation: greedy
[Figure: test-data likelihood vs. filling factor (percent), lattice-like structure with hubs]
High-degree nodes prevent proper estimation
[Varoquaux J. Physio Paris, accepted]
32. 2 Decomposable covariance estimation
Decomposable models: cliques of nodes (C1, C2, C3), independent conditionally on their intersections (S1, S2)
Greedy algorithm for estimation
[Varoquaux J. Physio Paris, accepted]
33. 2 Decomposable covariance estimation
[Figure: test-data likelihood vs. max clique size (20–90 percent)]
[Varoquaux J. Physio Paris, accepted]
34. 2 Decomposable covariance estimation
Decomposable models: cliques of nodes, independent conditionally on their intersections
- ℓ1-penalized: not very sparse
- PC-DAG: limited by high-degree nodes
- Models not decomposable into small systems
⇒ Modular, small-world graphs
[Varoquaux J. Physio Paris, accepted]
35. 2 Multi-subject sparse inverse covariance estimation
Accumulate samples for better structure estimation.
Maximum a posteriori:
K̂ = argmax over K ≻ 0 of  L(Σ|K) + f(K)
New prior, a population prior: the same independence structure across subjects
⇒ Estimate all {K_s} together from {Σ_s}
Group lasso (mixed norms), ℓ21 penalization:
f({K_s}) = λ Σ_{i≠j} √( Σ_s (K_s)²_{i,j} )
[Varoquaux NIPS 2010]
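The ℓ21 penalty can be written out directly; the helper below is an illustrative function (not the paper's solver), computing an ℓ2 norm across subjects for each off-diagonal entry and then summing (ℓ1) over entries:

```python
import numpy as np

def l21_penalty(Ks, lam):
    """Group-lasso (l21) coupling of subject precision matrices: one group
    per off-diagonal entry (i, j), gathered across all subjects."""
    Ks = np.asarray(Ks)                        # (n_subjects, p, p)
    off = ~np.eye(Ks.shape[1], dtype=bool)     # exclude the diagonal
    # l2 across subjects for each (i, j), then l1 over the entries
    return lam * np.sqrt((Ks ** 2).sum(axis=0))[off].sum()

# Identity precisions have no off-diagonal entries, so zero penalty
assert l21_penalty([np.eye(3), np.eye(3)], 0.1) == 0.0
```

Because the ℓ2 norm across subjects is only zero when the entry is zero for every subject, the penalty drives all subjects toward a shared independence structure.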
36. 2 Population-sparse graphs perform better
Likelihood of new data (nested cross-validation):

  Subject data, Σ⁻¹:                  −57.1
  Subject data, sparse inverse:        43.0
  Group-average data, Σ⁻¹:             40.6
  Group-average data, sparse inverse:  41.8
  Population prior:                    45.6

[Varoquaux NIPS 2010]
37. 2 Small-world structure of brain graphs
[Figure: graphs from raw correlations vs. the population prior]
[Varoquaux NIPS 2010]
38. 2 Small-world structure of brain graphs
Functional segregation structure:
Graph modularity = divide into communities to maximize intra-community versus inter-community connections
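The modularity criterion described above can be computed directly from Newman's formula; the two-clique toy graph and the helper function are illustrative assumptions:

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q of a node partition of an undirected graph.
    A: adjacency matrix; labels: community index per node."""
    m = A.sum() / 2.0                     # number of edges
    k = A.sum(axis=1)                     # node degrees
    same = np.equal.outer(labels, labels) # same-community indicator
    # intra-community edges, relative to a degree-preserving null model
    return ((A - np.outer(k, k) / (2 * m)) * same).sum() / (2 * m)

# Two 4-node cliques joined by a single edge: a clearly modular graph
A = np.zeros((8, 8))
A[:4, :4] = 1
A[4:, 4:] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = 1
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Q = modularity(A, labels)   # high Q: intra-community edges dominate
```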
40. Multi-subject models of the resting brain
From brain networks to brain parcellations:
- Good models learn many regions
- Sparsity, structure, and subject variability ⇒ population-level atlas: Y = E · S + N
Small-world brain networks:
- High degrees and long cycles are hard to estimate
- Modular structure reflects functional systems
- Small-sample estimation is challenging
41. Thanks
B. Thirion, J.B. Poline, A. Kleinschmidt
Dictionary learning: F. Bach, R. Jenatton
Sparse inverse covariance: A. Gramfort
Software, in Python:
- scikit-learn (machine learning): F. Pedregosa, O. Grisel, M. Blondel, …
- Mayavi (3D plotting): P. Ramachandran
42. Bibliography 1
[Varoquaux NeuroImage 2010] G. Varoquaux, S. Sadaghiani, P. Pinel, A. Kleinschmidt, J.B. Poline, B. Thirion, A group model for stable multi-subject ICA on fMRI datasets, NeuroImage 51, p. 288 (2010). http://hal.inria.fr/hal-00489507/en
[Varoquaux NIPS workshop 2010] G. Varoquaux, A. Gramfort, B. Thirion, R. Jenatton, G. Obozinski, F. Bach, Sparse structured dictionary learning for brain resting-state activity modeling, NIPS workshop (2010). https://sites.google.com/site/nips10sparsews/schedule/papers/RodolpheJennatton.pdf
[Varoquaux IPMI 2011] G. Varoquaux, A. Gramfort, F. Pedregosa, V. Michel, B. Thirion, Multi-subject dictionary learning to segment an atlas of brain spontaneous activity, Information Processing in Medical Imaging, p. 562 (2011). http://hal.inria.fr/inria-00588898/en
[Varoquaux NIPS 2010] G. Varoquaux, A. Gramfort, J.B. Poline, B. Thirion, Brain covariance selection: better individual functional connectivity models using population prior, NIPS (2010). http://hal.inria.fr/inria-00512451/en
43. Bibliography 2
[Smith 2011] S. Smith, K. Miller, G. Salimi-Khorshidi et al., Network modelling methods for fMRI, NeuroImage 54, p. 875 (2011)
[Varoquaux J. Physio Paris, accepted] G. Varoquaux, A. Gramfort, J.B. Poline, B. Thirion, Markov models for fMRI correlation structure: is brain functional connectivity small world, or decomposable into networks?, J. Physio Paris (accepted)
[Ramachandran 2011] P. Ramachandran, G. Varoquaux, Mayavi: 3D visualization of scientific data, Computing in Science & Engineering 13, p. 40 (2011). http://hal.inria.fr/inria-00528985/en
[Pedregosa 2011] F. Pedregosa, G. Varoquaux, A. Gramfort et al., Scikit-learn: machine learning in Python, JMLR 12, p. 2825 (2011). http://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html