Introduction to Machine Learning in MR Imaging
Gaël Varoquaux
Outline:
1 The machine learning setting
2 A glance at a few models
3 Model evaluation
4 Learning on full-brain images
5 Learning on correlations in brain activity
Declaration of Financial Interests or Relationships
Speaker Name: Gaël Varoquaux
I have no financial interests or relationships to disclose with regard to the
subject matter of this presentation.
1 The machine learning setting
Adjusting models for prediction
1 Machine learning in a nutshell: an example
Face recognition
Andrew Bill Charles Dave
1 Machine learning in a nutshell
A simple method:
1 Store all the known (noisy) images and the names that go with them.
2 For a new (noisy) image, find the stored image that is most similar.
“Nearest neighbor” method
How many errors on already-known images?
... 0: no errors
Test data = Train data
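As a minimal sketch of this store-and-compare method with scikit-learn (the digits dataset stands in for the face images, which are not part of the slides):

```python
# Sketch of the "nearest neighbor" method: store labeled images,
# then label a new image by its most similar stored image.
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=1)  # 1-NN: the single most similar image
knn.fit(X, y)                              # "store" the known images

# Scoring on the training data itself gives zero errors, as on the slide:
print("accuracy when test data = train data:", knn.score(X, y))  # 1.0
```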
1 Machine learning in a nutshell: regression
A single descriptor: 1 dimension
[plots: y as a function of x, fit by a simple and a more complex model]
Which model to prefer?
Problem of “over-fitting”
Minimizing the error is not always the best strategy (learning noise)
Test data = train data
Prefer simple models = concept of “regularization”
Balance the number of parameters to learn with the amount of data
Bias-variance tradeoff
Two descriptors: 2 dimensions
[plot: y as a function of X_1 and X_2]
More parameters
⇒ Models with more parameters need much more data
“curse of dimensionality”
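A toy sketch of this effect (synthetic data, purely illustrative, not from the slides): with more parameters than samples, plain least squares fits the train data perfectly but fails on test data, while a regularized model degrades more gracefully:

```python
# Sketch: more parameters than data points leads to overfitting;
# regularization (here: ridge) trades train fit for test accuracy.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n_samples, n_features = 40, 100          # more parameters than samples
X = rng.randn(n_samples, n_features)
w_true = np.zeros(n_features)
w_true[:5] = 1.                          # only 5 features actually matter
y = X @ w_true + rng.randn(n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=10.).fit(X_train, y_train)   # the penalty biases w

print("OLS   train R2: %.2f, test R2: %.2f"
      % (ols.score(X_train, y_train), ols.score(X_test, y_test)))
print("Ridge train R2: %.2f, test R2: %.2f"
      % (ridge.score(X_train, y_train), ridge.score(X_test, y_test)))
```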
1 Some formalism: bias and regularization
Settings: data $(X, y)$, prediction $y \sim f(X, w)$
Our goal: $\min_w\; E\big[\,\|y - f(X, w)\|\,\big]$
We can only measure $\|y - f(X, w)\|$ on the data at hand
“Prediction is very difficult, especially about the future.” (Niels Bohr)
Solution: bias $w$ to push toward a plausible solution
In a minimization framework: $\min_w\; \|y - f(X, w)\| + p(w)$
1 Probabilistic modeling: Bayesian point of view
$P(w \mid x, y) \;\propto\; P(y, x \mid w)\, P(w)$   (∗)
“Posterior”: the quantity of interest; forward model; “prior”: expectations on $w$
Forward model: $y = f(x, w) + e$, where $e$ is noise
⇒ $P(x, y \mid w) \propto \exp\!\big(-L(y - f(x, w))\big)$
Prior: $P(w) \propto \exp\!\big(-p(w)\big)$
Negated log of (∗): $L(y - f(X, w)) + p(w)$
Minimization framework = maximum a posteriori
Note that the Bayesian interpretation may be more subtle [Gribonval 2011].
1 Summary: elements of a machine-learning method
A forward model: y_pred = f(X, w)
Numerical rules to go from X to y
A loss, or data fit:
a measure of error between y_true and y_pred;
can be given by a noise model
Regularization:
any way of restricting model complexity
- by choices in the model
- via a penalty
2 A glance at a few models
Linear models
Random forests
Gradient boosted trees
2 Classification: categorical variables
y describes categories, say (0, 1)
[plot: 0/1 labels y as a function of x]
Fit y by a straight line?
A sigmoid is better suited
⇒ Logistic regression
SVMs (Support Vector Machines) are similar
Note that points far from the threshold don’t influence the fit
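A minimal sketch of the sigmoid fit with scikit-learn, on synthetic 1D data (a noisy threshold, not from the slides):

```python
# Sketch: logistic regression fits a sigmoid to 0/1 labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
x = rng.uniform(-3, 3, size=(100, 1))
y = (x.ravel() + .5 * rng.randn(100) > 0).astype(int)  # noisy threshold at 0

clf = LogisticRegression().fit(x, y)
# predict_proba gives the fitted sigmoid P(y = 1 | x):
print(clf.predict_proba([[-2.], [0.], [2.]])[:, 1])
```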
2 Classification on multivariate data: SVMs
Choosing a separating hyperplane
[plot: two classes in the (X_1, X_2) plane, separated by a hyperplane]
SVM: support vector machine
Discriminative method: choose the hyperplane that maximizes the margin
For this, use the exemplars close to the hyperplane (the support vectors)
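A sketch of a linear SVM and its support vectors, on synthetic blobs (illustrative data only):

```python
# Sketch: a linear SVM chooses the max-margin hyperplane;
# only the support vectors (points near the boundary) define it.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

svm = SVC(kernel="linear").fit(X, y)
print("support vectors per class:", svm.n_support_)
print("hyperplane normal w:", svm.coef_, "intercept:", svm.intercept_)
```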
2 With brain images
Find a separating hyperplane between images
The SVM builds it by combining exemplars
⇒ the hyperplane looks like the train images
2 Risk minimization and penalization
Inverse problem. Minimize the error term:
$\hat{w} = \arg\min_w\; \ell(y - X w)$
Ill-posed: many different $w$ will give the same prediction error
To choose one: inject a prior with a penalty:
$\hat{w} = \arg\min_w\; \ell(y - X w) + p(w)$
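Ridge regression is one instance of this penalized formulation, with a squared loss and $p(w) = \alpha \|w\|_2^2$; a sketch on a deliberately ill-posed synthetic problem:

```python
# Sketch: ridge = squared loss + l2 penalty, which picks one w
# among the many that fit an ill-posed problem equally well.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 200)          # more unknowns (200) than equations (50)
y = X[:, 0] + rng.randn(50)

w_hat = Ridge(alpha=1.).fit(X, y).coef_
print("norm of the selected w:", np.linalg.norm(w_hat))
```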
2 Sparse models: selecting predictive voxels?
Lasso estimator:
$\min_x\; \|y - A x\|_2^2 + \lambda\, \|x\|_1$
(data fit + penalization)
[figure: the $\ell_1$ ball in the (x_1, x_2) plane, which favors sparse solutions]
Between correlated features, it selects a random subset
[Wainwright 2009, Varoquaux... 2012]
(correlated designs violate the restricted isometry property)
[weight map: lasso, 23 non-zeros]
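A sketch of the sparsity induced by the $\ell_1$ penalty, on synthetic data with 10 truly informative features out of 500 (numbers chosen for illustration):

```python
# Sketch: the l1 penalty zeroes out most coefficients (sparsity).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
A = rng.randn(100, 500)                 # e.g. 500 voxels, 100 samples
x_true = np.zeros(500)
x_true[:10] = 1.                        # only 10 informative features
y = A @ x_true + .1 * rng.randn(100)

lasso = Lasso(alpha=.1).fit(A, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))
```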
2 Underfit versus overfit
Polynomial of degree 1 [plot: linear fit of y vs. x]
Train explained variance: 0.56
Test explained variance: 0.56
Underfit
Polynomial of degree 9 [plot: wiggly degree-9 fit of y vs. x]
Train explained variance: 0.99
Test explained variance: -98000
Overfit
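A sketch reproducing this contrast with scikit-learn (synthetic data, so the exact explained variances differ from the slide):

```python
# Sketch: degree-1 polynomial underfits, degree-9 overfits,
# as seen in train vs. test explained variance (R2).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=(30, 1))
y = np.cos(3 * x.ravel()) + .3 * rng.randn(30)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

for degree in (1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    print("degree %d: train R2 = %.2f, test R2 = %.2f"
          % (degree, model.score(x_train, y_train),
             model.score(x_test, y_test)))
```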
2 Random forest
Fit with a staircase of many values (a tree)
Fit several on perturbed data
Average the predictions
[plots: 1 staircase; 10, 50, and 300 staircases averaged]
Forest = ensemble of trees
Bagging = bootstrap aggregating
Applied to models that overfit
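A sketch of bagging trees with scikit-learn, on synthetic 1D data:

```python
# Sketch: averaging many randomized trees (bagging) smooths the
# staircase fit of a single tree, which overfits on its own.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
x = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * x.ravel()) + .3 * rng.randn(200)

tree = DecisionTreeRegressor().fit(x, y)                    # 1 staircase
forest = RandomForestRegressor(n_estimators=300).fit(x, y)  # 300 averaged

x_grid = np.linspace(0, 1, 5)[:, None]
print("tree:  ", tree.predict(x_grid))
print("forest:", forest.predict(x_grid))
```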
2 Gradient boosted trees
Fit with a staircase of 10 constant values (a small tree)
Fit a new staircase on the errors
Keep going
[plots: 1 staircase; 2, 3, and 300 staircases combined]
“Boosted” regression trees
Boosting = iteratively fit models on the error
Applied to models that underfit
Complexity trade-offs: computational + statistical
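The boosting counterpart, as a sketch on the same kind of synthetic data:

```python
# Sketch: boosting iteratively fits small trees on the residual errors
# of the current ensemble (models that individually underfit).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
x = rng.uniform(0, 1, size=(200, 1))
y = np.sin(4 * x.ravel()) + .3 * rng.randn(200)

stump = DecisionTreeRegressor(max_depth=2).fit(x, y)  # 1 staircase: underfits
boosted = GradientBoostingRegressor(n_estimators=300, max_depth=2).fit(x, y)

print("1 small tree, train R2:    ", stump.score(x, y))
print("300 boosted trees, train R2:", boosted.score(x, y))
```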
3 Model evaluation
Does it predict?
3 Generalization as a test: cross-validation
[plots: the same data fit by a simple and a more complex model]
⇒ Need a test on independent data, to control for model complexity:
split the data into a train set and a validation set
Measures prediction accuracy [Varoquaux... 2017]
Loop: repeatedly split the full data into a train set and a test set
Nested loop: inside each outer split, split the train data again into train and
validation sets (to tune the model); the outer test set stays untouched
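A sketch of both loops with scikit-learn (the digits data and SVM settings are placeholders):

```python
# Sketch: cross-validation, and a nested loop to tune hyper-parameters
# without ever touching the outer test sets.
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score, GridSearchCV

X, y = load_digits(return_X_y=True)

# Outer loop: measures prediction accuracy on left-out test sets
scores = cross_val_score(SVC(C=1.), X, y, cv=5)
print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))

# Nested loop: GridSearchCV tunes C on inner validation splits,
# cross_val_score evaluates the tuned model on outer test splits
inner = GridSearchCV(SVC(), param_grid={"C": [.1, 1., 10.]}, cv=3)
nested_scores = cross_val_score(inner, X, y, cv=5)
print("nested accuracy: %.2f" % nested_scores.mean())
```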
3 Confounds & dependencies
[plot: time series X1, X2 over time]
Prediction is easy if you look at the last time point
Auto-correlated noise: neighboring data points are not independent
⇒ No evidence of generalization, no evidence of useful prediction
Validate on independent data:
not 2 images of the same subject, unless it’s a longitudinal study
[Varoquaux... 2017, Little... 2017]
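One way to enforce this independence in practice, as a sketch: scikit-learn's GroupKFold keeps all images of a given subject in the same fold, so train and test never share a subject (the data and subject labels here are synthetic stand-ins):

```python
# Sketch: subject-wise cross-validation, so that two images of the same
# subject never end up on both sides of a train/test split.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(100, 20)                   # e.g. 100 images, 20 features
y = rng.randint(0, 2, size=100)
subjects = np.repeat(np.arange(25), 4)   # 4 images per subject (hypothetical)

cv = GroupKFold(n_splits=5)
scores = cross_val_score(SVC(), X, y, cv=cv, groups=subjects)
print(scores)
```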
3 Uncertainty of cross-validation [Varoquaux 2017]
[figure: distributions of cross-validated accuracies for leave-one-out (LOO)
vs. 50 splits with 20% test data, for 100 to 1000 available samples; the error
bars shrink from about -19%/+15% at n = 100 to about -3%/+3% at n = 1000]
Distribution of prediction accuracies across 50 folds:
should be seen as a posterior distribution (or a bootstrap)
= folds are not independent measures
⇒ no statistical tests (e.g. t-tests) on folds

Sample size    Confidence interval (binomial model)
100            18.0%
1 000          5.7%
10 000         1.8%
100 000        0.568%
Sampling noise: decreases slowly with n
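The binomial intervals in this table can be reproduced with a short computation; the sketch below assumes an underlying accuracy of 0.7, which matches the reported widths:

```python
# Sketch: width of the 95% binomial confidence interval on a measured
# accuracy, as a function of sample size (assuming a true accuracy of
# 0.7, which reproduces the table above).
from scipy.stats import binom

for n in (100, 1_000, 10_000, 100_000):
    lo, hi = binom.interval(0.95, n, 0.7)
    print("n = %6d: interval width = %.3f%%" % (n, 100 * (hi - lo) / n))
```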
3 Uncertainty of cross-validation [Varoquaux 2017]
[figure: differences between public and private leaderboard scores, spread
roughly from -15% to +14%]
Kaggle competition on r-fMRI for schizophrenia
2 different test sets, of sizes 30 and 28
Winner of the competition: the first code written
4 Learning on full-brain images
Personal focus on functional imaging
4 Linear models on brain images
[diagram: design matrix × coefficients = target]
Coefficients are brain maps
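As a sketch of how such a model is fit in practice with nilearn and scikit-learn, on the Haxby data used in the next slides (NiftiMasker lives in nilearn.maskers in recent releases, nilearn.input_data in older ones):

```python
# Sketch: a linear model on full-brain fMRI, faces vs. houses (Haxby data).
import pandas as pd
from nilearn import datasets
from nilearn.image import index_img
from nilearn.maskers import NiftiMasker  # nilearn.input_data in old versions
from sklearn.linear_model import LogisticRegression

haxby = datasets.fetch_haxby()           # downloads the data on first run
labels = pd.read_csv(haxby.session_target[0], sep=" ")["labels"]
condition_mask = labels.isin(["face", "house"]).to_numpy()

func_img = index_img(haxby.func[0], condition_mask)
y = labels[condition_mask]

masker = NiftiMasker(standardize=True)
X = masker.fit_transform(func_img)       # (n_images, n_voxels) matrix

model = LogisticRegression().fit(X, y)
coef_img = masker.inverse_transform(model.coef_)  # coefficients = a brain map
```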
4 Finding the predictive regions?
Face vs house visual recognition [Haxby... 2001]
[weight maps and errors:]
SVM: error 26%
Sparse model: error 19%
Ridge: error 15%
4 Estimating the linear model [Gramfort... 2013]
Inverse problem. Minimize the error term:
$\hat{w} = \arg\min_w\; \ell(y - X w)$
Ill-posed: many different $w$ will give the same prediction error
The choice is driven by (implicit) priors
[weight maps: SVM, sparse, ridge, TV-$\ell_1$]
Sparse estimators (Lasso):
$\min_x\; \|y - A x\|_2^2 + \lambda\, \|x\|_1$  (data fit + penalization)
Total-variation penalization: impose sparsity on the gradient of the image,
$p(w) = \|\nabla w\|_1$
In fMRI: [Michel... 2011]
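nilearn ships estimators with such spatial priors; a sketch with its SpaceNet decoder and the TV-$\ell_1$ penalty, reusing func_img and y from the Haxby sketch above (availability and defaults vary across nilearn versions):

```python
# Sketch: decoding with a TV-l1 spatial prior via nilearn's SpaceNet.
from nilearn.decoding import SpaceNetClassifier

decoder = SpaceNetClassifier(penalty="tv-l1")
decoder.fit(func_img, y)         # func_img, y: as in the Haxby sketch above
coef_img = decoder.coef_img_     # weight map with spatial structure
```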
4 Tricks of the trade
Spatial smoothing (12 mm), giving up on resolution:
error rate 43% → 31%
Feature selection, giving up on the multivariate aspect:
error rate → 23%
4 FREM: Ensemble models [Hoyos-Idrobo... 2017]
Very fast sub-optimal models; average many of them:
learn a parcellation on perturbed data,
estimate linear models,
average the results
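FREM is implemented in nilearn; a sketch reusing the same func_img and y (the FREMClassifier class exists in recent nilearn releases; the parameters shown are assumptions):

```python
# Sketch: fast regularized ensemble of models (FREM) in nilearn.
from nilearn.decoding import FREMClassifier

frem = FREMClassifier(cv=10)  # ensembles models fit on perturbed parcellations
frem.fit(func_img, y)         # func_img, y: as in the Haxby sketch above
prediction = frem.predict(func_img)
```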
5 Learning on correlations in brain activity
“Connectome” prediction
Brain “at rest” = easier to scan
From rest-fMRI to biomarkers
No salient features in rest fMRI
⇒ define functional regions, learn interactions, detect differences
From rest-fMRI to biomarkers
Pipeline: RS-fMRI (1 TB of data) → region definition → time-series extraction
(50 GB of data) → functional connectivity matrix → supervised learning
Unsupervised learning ⇒ representations adapted to the data (dictionary learning)
More data is always better; the challenge is computational cost [Mensch... 2016]
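A sketch of the supervised end of this pipeline with nilearn (the region-definition and time-series-extraction steps are abbreviated: random arrays and labels stand in for per-subject extracted signals and diagnoses):

```python
# Sketch: from per-subject time series to connectivity features to prediction.
import numpy as np
from nilearn.connectome import ConnectivityMeasure
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
# Stand-in for time series extracted from functional regions:
# one (n_timepoints, n_regions) array per subject.
time_series = [rng.randn(200, 30) for _ in range(40)]
diagnosis = rng.randint(0, 2, size=40)        # hypothetical labels

conn = ConnectivityMeasure(kind="tangent", vectorize=True)
X = conn.fit_transform(time_series)           # one connectome vector per subject

print(cross_val_score(LinearSVC(), X, diagnosis, cv=5).mean())
```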
5 Biomarkers, heterogeneity, & big data: autism study [Abraham... 2017]
Ill-defined pathology; heterogeneous dataset (ABIDE)
[plot: prediction accuracy as a function of the fraction of subjects used;
accuracy keeps increasing with the number of subjects]
Heterogeneity is not the limiting factor
An introduction to Machine Learning for MR imaging (@GaelVaroquaux)
Overfit versus underfit:
the best fit is not the best performer;
minimizing an observed error is the wrong solution
Evaluate on independent data
Cross-validation error bars
More data trumps being clever
Software: scikit-learn & nilearn, in Python
http://scikit-learn.org http://nilearn.github.io
References
A. Abraham, M. P. Milham, A. Di Martino, R. C. Craddock, D. Samaras, B. Thirion,
and G. Varoquaux. Deriving reproducible biomarkers from multi-site resting-state
data: An autism-based example. NeuroImage, 147:736–745, 2017.
A. Gramfort, B. Thirion, and G. Varoquaux. Identifying predictive regions from
fMRI with TV-L1 prior. In PRNI, page 17, 2013.
R. Gribonval. Should penalized least squares regression be interpreted as maximum
a posteriori estimation? IEEE Transactions on Signal Processing, 59(5):2405–2410,
2011.
J. V. Haxby, I. M. Gobbini, M. L. Furey, et al. Distributed and overlapping
representations of faces and objects in ventral temporal cortex. Science,
293:2425, 2001.
A. Hoyos-Idrobo, G. Varoquaux, Y. Schwartz, and B. Thirion. FREM: scalable and
stable decoding with fast regularized ensemble of models. NeuroImage, 2017.
M. A. Little, G. Varoquaux, S. Saeb, L. Lonini, A. Jayaraman, D. C. Mohr, and
K. P. Kording. Using and understanding cross-validation strategies. Perspectives
on Saeb et al. GigaScience, 6(5):1–6, 2017.
A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux. Dictionary learning for
massive matrix factorization. In International Conference on Machine Learning,
pages 1737–1746, 2016.
V. Michel, A. Gramfort, G. Varoquaux, E. Eger, and B. Thirion. Total variation
regularization for fMRI-based prediction of behavior. IEEE Transactions on
Medical Imaging, 30:1328, 2011.
S. M. Smith, T. E. Nichols, D. Vidaurre, A. M. Winkler, T. E. Behrens,
M. F. Glasser, K. Ugurbil, D. M. Barch, D. C. Van Essen, and K. L. Miller.
A positive-negative mode of population covariation links brain connectivity,
demographics and behavior. Nature Neuroscience, 18(11):1565–1567, 2015.
G. Varoquaux. Cross-validation failure: small sample sizes lead to large error
bars. NeuroImage, 2017.
G. Varoquaux and B. Thirion. How machine learning is shaping cognitive
neuroimaging. GigaScience, 3:28, 2014.
G. Varoquaux, A. Gramfort, and B. Thirion. Small-sample brain mapping: sparse
recovery on spatially correlated designs with randomization and clustering.
In ICML, page 1375, 2012.
G. Varoquaux, P. R. Raamana, D. A. Engemann, A. Hoyos-Idrobo, Y. Schwartz, and
B. Thirion. Assessing and tuning brain decoders: cross-validation, caveats, and
guidelines. NeuroImage, 145:166–179, 2017.
M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery
using l1-constrained quadratic programming. IEEE Transactions on Information
Theory, 55:2183, 2009.