Decoding, MVPA, and predictive models for neuroimaging diagnosis or prognosis all rely on cross-validation to measure the predictive accuracy of the model and, optionally, to tune the decoder. Cross-validation tests predictive power on left-out data unseen during the training of the predictive model. It is appealing because it is non-parametric and asymptotically unbiased. Common practice in neuroimaging relies on leave-one-out, yet statistical theory [1] suggests that this is suboptimal: the small test set leads to large variance, and the estimate is easily biased by sample correlations.
Decoders usually come with a hyper-parameter that controls the regularization, i.e., a bias/variance tradeoff. In machine learning, this tradeoff is typically adjusted to the signal-to-noise ratio of the data using cross-validation to maximize predictive accuracy. In this case, the accuracy of the decoder must then be assessed on an independent "validation set", using a "nested cross-validation".
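As a concrete illustration, here is a minimal sketch of such nested cross-validation with scikit-learn; the data, parameter grid, and split sizes are illustrative assumptions, not the exact setup of this study. The inner grid-search tunes C, while the outer loop measures accuracy on data unseen by the tuning.

```python
# Minimal nested cross-validation sketch (scikit-learn); the data and
# parameter grid below are illustrative stand-ins.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score

rng = np.random.RandomState(0)
X, y = rng.randn(200, 50), rng.randint(0, 2, 200)  # stand-in features/labels

# Inner loop: tune the regularization parameter C on the training data only
inner_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
tuned_svm = GridSearchCV(LinearSVC(), {"C": np.logspace(-4, 4, 9)}, cv=inner_cv)

# Outer loop: score the tuned decoder on left-out splits
outer_cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=1)
scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)
print("accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```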
Here we assess these practices empirically on neuroimaging data, to derive guidelines.
# Methods
Given 8 open datasets from openfMRI [2], we assess cross-validation on 35 decoding tasks, 15 of which are within-subject. We leave a large validation set untouched and perform nested cross-validation on the rest of the data. In a first experiment, we compare the accuracy of the decoder as measured by cross-validation with that measured on the left-out validation data. In a second experiment, we use nested cross-validation to tune the decoders, either by refitting with the best parameter or by averaging the best models. We use standard linear decoders: SVM and logistic regression, both sparse (l1 penalty) and non-sparse (l2 penalty).
We assess a variety of cross-validation strategies: leaving out single samples, leaving out full sessions or subjects, and repeated random splits leaving out 20% of the sessions or subjects.
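In scikit-learn terms, these strategies map onto the following splitters; this is a sketch under the assumption that a `groups` array encodes the session (or subject) of each sample:

```python
# Sketch of the three families of cross-validation strategies; X, y and
# groups are illustrative stand-ins for real decoding data.
import numpy as np
from sklearn.model_selection import LeaveOneOut, LeaveOneGroupOut, GroupShuffleSplit

rng = np.random.RandomState(0)
X = rng.randn(120, 50)                # features
y = rng.randint(0, 2, 120)            # labels
groups = np.repeat(np.arange(6), 20)  # session/subject label of each sample

strategies = {
    "leave one sample out": LeaveOneOut(),
    "leave one session/subject out": LeaveOneGroupOut(),
    "20% of sessions/subjects out, 50 splits":
        GroupShuffleSplit(n_splits=50, test_size=0.2, random_state=0),
}
for name, cv in strategies.items():
    n = cv.get_n_splits(X, y, groups=groups)
    print("%s: %d folds" % (name, n))
```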
# Conclusions
The first finding is a confirmation of the theory that repeated random splits should be preferred to leave-one-sample-out: they are less fragile and less computationally costly.
Second, we find large error bars on cross-validation estimates of predictive power, 10% or more, particularly for within-subject analyses, likely because of marked sample inhomogeneities.
Finally, we find that setting decoder parameters by nested cross-validation does not yield much gain in prediction, particularly for non-sparse models. This is probably a consequence of our second finding.
These conclusions are crucial for decoding and information mapping, which rely on measuring prediction accuracy. This measure is more fragile than practitioners often assume.
# Slides
1. Cross-validation to assess decoder performance: the good, the bad, and the ugly
Gaël Varoquaux
https://hal.archives-ouvertes.fr/hal-01332785
2. Measuring prediction accuracy
To find the best method (computer scientists)
For information mapping = omnibus test (cognitive neuroimaging)
Cross-validation: asymptotically unbiased, non-parametric
3. Outline: (1) Some theory; (2) Empirical results on brain imaging
6. 1 Cross-validation
Test on independent data
[Diagram: the full data are repeatedly split into a train set and a test set, looping over splits]
Measures prediction accuracy
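The loop in the diagram, written out as a hedged scikit-learn sketch (stand-in data; a real decoding analysis would use fMRI features):

```python
# The cross-validation loop of the diagram, made explicit (stand-in data).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import ShuffleSplit

rng = np.random.RandomState(0)
X, y = rng.randn(200, 50), rng.randint(0, 2, 200)

scores = []
for train, test in ShuffleSplit(n_splits=10, test_size=0.2,
                                random_state=0).split(X):
    decoder = LinearSVC().fit(X[train], y[train])   # train on the train set
    scores.append(decoder.score(X[test], y[test]))  # test on the test set
print("prediction accuracy: %.2f" % np.mean(scores))
```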
7. 1 Choice of cross-validation strategy
Test on independent data: be robust to confounding dependencies; leave subjects out, or sessions out
Loop: more loops give more data points, but the error in training the model must be balanced against the error on the test set
8. 1 Choice of cross-validation strategy: theory
Negative bias (underestimates performance), decreasing with the size of the training set [Arlot & Celisse 2010, sec. 5.1]
Variance decreases with the size of the test set [Arlot & Celisse 2010, sec. 5.2]
Hence: leave out a fraction of 10–20% of the data, with many random splits respecting the dependency structure
10. 1 Tuning hyper-parameters
The computer scientist says: you need to set C in your SVM
[Plot: parameter tuning of C over 10^-4 to 10^4, showing accuracy on the training set and on the validation set]
11. 1 Nested cross-validation
Test on independent data, with two loops
[Diagram: an outer loop holds out a validation set from the full data; a nested loop splits the remaining data into train and test sets]
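The two loops of the diagram written out explicitly, as a sketch (stand-in data and grid; a real analysis would split by session or subject rather than at random):

```python
# Nested cross-validation: an outer loop for scoring, a nested loop for
# picking C (stand-in data; illustrative grid).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import ShuffleSplit, cross_val_score

rng = np.random.RandomState(0)
X, y = rng.randn(200, 50), rng.randint(0, 2, 200)
Cs = np.logspace(-4, 4, 9)

val_scores = []
outer = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
for train, val in outer.split(X):  # outer loop: hold out a validation set
    inner = ShuffleSplit(n_splits=10, test_size=0.2, random_state=1)
    # nested loop: pick the C with the best inner cross-validation score
    best_C = max(Cs, key=lambda C: cross_val_score(
        LinearSVC(C=C), X[train], y[train], cv=inner).mean())
    decoder = LinearSVC(C=best_C).fit(X[train], y[train])
    val_scores.append(decoder.score(X[val], y[val]))
print("validation accuracy: %.2f" % np.mean(val_scores))
```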
12. 2 Empirical results on brain imaging
13. 2 Datasets and tasks
7 fMRI datasets (6 from openfMRI)
Haxby: 5 subjects, 15 intra-subject predictions
Inter-subject predictions on 6 studies
OASIS VBM, gender discrimination
HCP MEG task, intra-subject, working memory
Number of samples: ∼200 (min 80, max 400)
Accuracy: min 62%, max 96%
14. 2 Experiment 1: measuring cross-validation error
Leave out a large validation set
Measure error by cross-validation on the rest
Compare
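A sketch of this protocol (with stand-in data; the real experiments split along sessions or subjects):

```python
# Experiment-1 protocol in sketch form: hold out a large validation set,
# measure accuracy by cross-validation on the rest, compare the two.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split, ShuffleSplit, cross_val_score

rng = np.random.RandomState(0)
X, y = rng.randn(400, 50), rng.randint(0, 2, 400)
X_rest, X_val, y_rest, y_val = train_test_split(X, y, test_size=0.5,
                                                random_state=0)

cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=1)
cv_accuracy = cross_val_score(LinearSVC(), X_rest, y_rest, cv=cv).mean()
val_accuracy = LinearSVC().fit(X_rest, y_rest).score(X_val, y_val)
print("cross-validated: %.2f  validation set: %.2f"
      % (cv_accuracy, val_accuracy))
```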
15. 2 Cross-validated measure versus validation set
[Scatter plot: accuracy measured by cross-validation (y-axis) against accuracy on the validation set (x-axis), both from 50% to 100%, for intra-subject and inter-subject tasks]
19. 2 Different cross-validation strategies
Difference between the accuracy measured by cross-validation and the accuracy on the validation set (range across tasks):

| Cross-validation strategy     | Intra-subject | Inter-subject |
|-------------------------------|---------------|---------------|
| Leave one sample out          | -22% to +19%  | +3% to +43%   |
| Leave one subject/session out | -10% to +10%  | -21% to +17%  |
| 20% left out, 3 splits        | -11% to +11%  | -24% to +16%  |
| 20% left out, 10 splits       | -9% to +9%    | -24% to +14%  |
| 20% left out, 50 splits       | -9% to +8%    | -23% to +13%  |
22. 2 Different cross-validation strategies
Difference between the accuracy measured by cross-validation and the accuracy on the validation set (range):

| Cross-validation strategy | MEG data     | Simulations  |
|---------------------------|--------------|--------------|
| Leave one sample out      | -16% to +14% | +4% to +33%  |
| Leave one block out       | -15% to +13% | -8% to +8%   |
| 20% left out, 3 splits    | -15% to +12% | -10% to +11% |
| 20% left out, 10 splits   | -13% to +10% | -8% to +8%   |
| 20% left out, 50 splits   | -12% to +10% | -7% to +7%   |
23. 2 Experiment 2: parameter tuning
Compare different strategies on the validation set:
1. Use the default C = 1
2. Use C = 1000
3. Choose the best C by cross-validation and refit
4. Average the best models in cross-validation
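Strategy 4 is the least standard; here is a sketch of one way to implement it (stand-in data, and the selection rule is an assumption, not necessarily the study's exact one): keep the winning decoder of each fold and average their decision functions.

```python
# "Average the best models": keep the per-fold winner over the C grid,
# then average decision functions on new data (stand-in data and rule).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import ShuffleSplit

rng = np.random.RandomState(0)
X, y = rng.randn(160, 50), rng.randint(0, 2, 160)  # training data
X_val = rng.randn(40, 50)                          # validation set
Cs = np.logspace(-4, 4, 9)

fold_winners = []
for train, test in ShuffleSplit(n_splits=10, test_size=0.2,
                                random_state=0).split(X):
    fits = [LinearSVC(C=C).fit(X[train], y[train]) for C in Cs]
    # the fold's winner: best accuracy on this fold's test set
    fold_winners.append(max(fits, key=lambda m: m.score(X[test], y[test])))

# model averaging: mean decision function of the fold winners
decision = np.mean([m.decision_function(X_val) for m in fold_winners], axis=0)
y_pred = (decision > 0).astype(int)
```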
24. 2 Experiment 2: parameter tuning
[Bar plot: accuracy of the four strategies on the validation set, for non-sparse decoders (SVM l2, logistic regression l2) and sparse decoders (SVM l1, logistic regression l1)]
28. @GaelVaroquaux
Cross-validation: lessons learned
Don't use leave-one-out
Use random 10–20% splits respecting the sample structure
Cross-validation has error bars of ±10%
Cross-validation is inefficient for parameter tuning:
- C = 1 for SVM-l2
- model averaging for SVM-l1
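Put together as a scikit-learn sketch (the data and session labels are placeholders for a real decoding dataset):

```python
# The take-home recipe: no leave-one-out; many random splits leaving out
# 10-20% of the sessions/subjects; default C = 1 for an l2 SVM.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import GroupShuffleSplit, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(200, 50)                   # placeholder features
y = rng.randint(0, 2, 200)               # placeholder labels
sessions = np.repeat(np.arange(10), 20)  # placeholder session labels

cv = GroupShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
scores = cross_val_score(LinearSVC(C=1), X, y, groups=sessions, cv=cv)
# report the spread, not just the mean: error bars can reach +/-10%
print("accuracy: %.2f (min %.2f, max %.2f)"
      % (scores.mean(), scores.min(), scores.max()))
```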
30. References
S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010.