Cross-validation to assess decoder performance:
the good, the bad, and the ugly
Gaël Varoquaux
https://hal.archives-ouvertes.fr/hal-01332785
Measuring prediction accuracy
To find the best method
(computer scientists)
For information mapping = omnibus test
(cognitive neuroimaging)
Cross-validation
asymptotically unbiased
non parametric
G Varoquaux 2
1 Some theory
2 Empirical results on brain imaging
1 Some theory
[Figure: the full data split into a train set and a test set]
1 Cross-validation
Test on independent data
Loop
Measures prediction accuracy
[Figure: a train set / validation set split, and a loop of train set / test set splits of the full data]
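A minimal sketch of the loop above, assuming scikit-learn and synthetic data (the data, the linear SVM and the number of splits are illustrative choices, not the talk's setup). It repeatedly splits the data into a train set and a test set and reports the accuracy on the left-out samples.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for a decoding problem (assumption: not the talk's data)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 50))
X[:, :5] += 0.8 * y[:, None]          # 5 informative features

# Loop: 10 random train/test splits, each leaving 20% of the samples out
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = cross_val_score(LinearSVC(C=1.0, max_iter=5000), X, y, cv=cv)

print(f"accuracy: {scores.mean():.2f} (std {scores.std():.2f} across splits)")
```

The spread of `scores` across splits gives a first idea of how noisy the measure is.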
1 Choice of cross-validation strategy
Test on independent data
Be robust to confounding dependences
Leave subjects out, or sessions out
Loop
More loops = more data points
Need to balance the error in training the model against the error on the test set
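One way to leave whole subjects out, sketched with scikit-learn's `LeaveOneGroupOut`. The data are synthetic and the subject offsets are an assumed form of confounding dependence, used only to make the example concrete.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical data: 6 subjects, 40 samples each, 30 features
n_subjects, n_per_subject, n_features = 6, 40, 30
subjects = np.repeat(np.arange(n_subjects), n_per_subject)
y = rng.integers(0, 2, size=subjects.size)
X = rng.normal(size=(subjects.size, n_features))
X[:, 0] += 1.5 * y                                          # signal shared by all subjects
X += rng.normal(size=(n_subjects, n_features))[subjects]    # subject-specific offset

# Each split tests on one subject that was never seen during training
cv = LeaveOneGroupOut()
scores = cross_val_score(LinearSVC(max_iter=5000), X, y, groups=subjects, cv=cv)

for train, test in cv.split(X, y, groups=subjects):
    # No subject appears on both sides of a split
    assert set(subjects[train]).isdisjoint(subjects[test])

print(scores.round(2))
```

Passing session labels instead of subject labels as `groups` gives leave-one-session-out.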
1 Choice of cross-validation strategy: theory
Negative bias (underestimate performance)
decreasing with the size of the training set
[Arlot... 2010] sec.5.1
Variance decreases with the size of the test set
[Arlot... 2010] sec.5.2
Fraction of data left out: 10–20%
Many random splits of the data
respecting dependency structure
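A sketch of "many random splits, 20% left out, respecting dependency structure", assuming scikit-learn's `GroupShuffleSplit` and hypothetical session labels. Note that `test_size` here is a fraction of the groups, which matches the fraction of samples only because the sessions have equal size.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Hypothetical design: 10 sessions of 20 samples each
sessions = np.repeat(np.arange(10), 20)
X = rng.normal(size=(sessions.size, 5))

# 50 random splits, each leaving out 20% of the sessions
cv = GroupShuffleSplit(n_splits=50, test_size=0.2, random_state=0)

test_fractions = []
for train, test in cv.split(X, groups=sessions):
    # A session is either entirely in the train set or entirely left out
    assert set(sessions[train]).isdisjoint(sessions[test])
    test_fractions.append(len(test) / len(X))

print(len(test_fractions), min(test_fractions), max(test_fractions))
```

The `cv` object can be passed to `cross_val_score` together with `groups=sessions`.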
1 Tuning hyper-parameters
Computer scientist says:
You need to set C in your SVM
[Figure: parameter tuning: training-set and validation-set curves as a function of C, for C from 10^-4 to 10^4]
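A sketch of such a tuning curve, assuming scikit-learn's `validation_curve` and synthetic data. The grid of C values follows the axis of the slide (10^-4 to 10^4); everything else is an illustrative choice.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, validation_curve
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic data (assumption): 200 samples, 50 features, 5 of them informative
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 50))
X[:, :5] += 0.8 * y[:, None]

C_grid = 10.0 ** np.arange(-4, 5)     # 1e-4, 1e-3, ..., 1e4
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# One row per value of C, one column per split
train_scores, test_scores = validation_curve(
    LinearSVC(max_iter=5000), X, y,
    param_name="C", param_range=C_grid, cv=cv)

for C, tr, te in zip(C_grid, train_scores.mean(axis=1), test_scores.mean(axis=1)):
    print(f"C={C:g}  train={tr:.2f}  left-out={te:.2f}")
```

Large values of C may trigger convergence warnings from the solver; they do not stop the script.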
1 Nested cross-validation
Test on independent data
Two loops
[Figure: full data; the outer loop sets aside a validation set, the nested loop splits the rest into a train set and a test set]
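The two loops can be sketched with scikit-learn by wrapping a `GridSearchCV` (nested loop, choosing C) inside `cross_val_score` (outer loop, measuring accuracy). The data, the grid and the split counts are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic data (assumption)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 50))
X[:, :5] += 0.8 * y[:, None]

C_grid = [10.0 ** k for k in range(-4, 5)]

# Nested loop: picks C using only the data it is given
inner_cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
tuned_svm = GridSearchCV(LinearSVC(max_iter=5000), {"C": C_grid}, cv=inner_cv)

# Outer loop: the left-out data never takes part in choosing C
outer_cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
nested_scores = cross_val_score(tuned_svm, X, y, cv=outer_cv)

print(f"nested CV accuracy: {nested_scores.mean():.2f}")
```

Reporting the best score found inside `GridSearchCV` instead would reuse the same data for tuning and for evaluation.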
2 Empirical results on brain imaging
2 Datasets and tasks
7 fMRI datasets (6 from openfMRI)
Haxby: 5 subjects, 15 inter-subject predictions
Inter-subject predictions on 6 studies
OASIS VBM, gender discrimination
HCP MEG task, intra-subject, working memory
# samples: ∼ 200 (min 80, max 400)
accuracy min 62%, max 96%
2 Experiment 1: measuring cross-validation error
Leave out a large validation set
Measure error by cross-validation on the rest
Compare
[Figure: full data with a large validation set left out; cross-validation (train set / test set) on the rest]
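A minimal sketch of this protocol on synthetic i.i.d. data (an assumption; real imaging samples are not independent). It leaves out a large validation set, measures accuracy by cross-validation on the remaining 200 samples, and compares the two numbers.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic data (assumption): one informative feature out of 20
n_samples, n_features = 2000, 20
y = rng.integers(0, 2, size=n_samples)
X = rng.normal(size=(n_samples, n_features))
X[:, 0] += 1.0 * y

# Keep 200 samples for decoding, leave the rest out as a large validation set
X_dec, X_val, y_dec, y_val = train_test_split(
    X, y, train_size=200, stratify=y, random_state=0)

# Accuracy measured by cross-validation on the decoding samples only
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
cv_accuracy = cross_val_score(LinearSVC(max_iter=5000), X_dec, y_dec, cv=cv).mean()

# Accuracy of the same decoder on the validation set
val_accuracy = LinearSVC(max_iter=5000).fit(X_dec, y_dec).score(X_val, y_val)

difference = cv_accuracy - val_accuracy
print(f"CV: {cv_accuracy:.2f}  validation: {val_accuracy:.2f}  diff: {difference:+.2f}")
```

Repeating this over many random draws of the decoding set gives a distribution of `difference`, which is the kind of quantity compared across cross-validation strategies.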
2 Cross-validated measure versus validation set
[Figure: accuracy measured by cross-validation (y-axis, 50% to 100%) against accuracy on validation set (x-axis, 50% to 100%), for intra-subject and inter-subject predictions]
2 Different cross-validation strategies
Difference in accuracy measured by cross-validation and on validation set
(figure axis: -40% to +40%)

Cross-validation strategy    Intra subject    Inter subject
Leave one sample out         -22%  +19%       +3%  +43%
Leave one subject/session    -10%  +10%       -21%  +17%
20% left out, 3 splits       -11%  +11%       -24%  +16%
20% left out, 10 splits      -9%  +9%         -24%  +14%
20% left out, 50 splits      -9%  +8%         -23%  +13%
2 Simple simulations
2 Gaussian-separated clouds
Auto-correlated noise
200 decoding samples
10 000 validation samples
⇒ Validation = asymptotics
[Figure: simulated samples in the (X1, X2) plane, and X1 as a function of time]
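A rough sketch of such a simulation. The slide does not specify the noise model or the label design, so the AR(1) noise, its coefficient `rho`, the block length and the separation are all assumptions. It compares leave-one-sample-out and leave-one-block-out to the accuracy on 10 000 validation samples.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, LeaveOneOut, cross_val_score
from sklearn.svm import LinearSVC


def simulate(n_samples, rng, separation=1.0, rho=0.9, block_len=20):
    """Two clouds in (X1, X2) plus AR(1) noise along time (assumed noise model)."""
    blocks = np.arange(n_samples) // block_len
    y = blocks % 2                      # the label alternates from block to block
    white = rng.normal(size=(n_samples, 2))
    noise = np.empty_like(white)
    noise[0] = white[0]
    for t in range(1, n_samples):
        # Each time point keeps a fraction rho of the previous noise value
        noise[t] = rho * noise[t - 1] + np.sqrt(1 - rho ** 2) * white[t]
    X = noise.copy()
    X[:, 0] += separation * y           # the clouds are separated along X1
    return X, y, blocks


rng = np.random.default_rng(0)
X, y, blocks = simulate(200, rng)            # decoding samples
X_val, y_val, _ = simulate(10_000, rng)      # validation samples

clf = LinearSVC(max_iter=5000)
loo = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
lobo = cross_val_score(clf, X, y, groups=blocks, cv=LeaveOneGroupOut()).mean()
val = clf.fit(X, y).score(X_val, y_val)

print(f"leave-one-sample-out: {loo:.2f}  leave-one-block-out: {lobo:.2f}  validation: {val:.2f}")
```

With auto-correlated noise, the sample left out by leave-one-sample-out is close in time to training samples, which is the dependence that block-wise splitting is meant to break.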
2 Different cross-validation strategies
Difference in accuracy measured by cross-validation and on validation set
(figure axis: -40% to +40%)

Cross-validation strategy    MEG data       Simulations
Leave one sample out         -16%  +14%     +4%  +33%
Leave one block out          -15%  +13%     -8%  +8%
20% left-out, 3 splits       -15%  +12%     -10%  +11%
20% left-out, 10 splits      -13%  +10%     -8%  +8%
20% left-out, 50 splits      -12%  +10%     -7%  +7%
2 Experiment 2: parameter-tuning
Compare different strategies on validation set:
1. Use the default C = 1
2. Use C = 1000
3. Choose best C by cross-validation and refit
4. Average best models in cross-validation
Non-sparse decoders: SVM ℓ2, Log-reg ℓ2
Sparse decoders: SVM ℓ1, Log-reg ℓ1
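The four strategies can be sketched as follows for a non-sparse (ℓ2) linear SVM. The data are synthetic, and the averaging step is one plausible reading of "average best models" (mean of the weight vectors of the best model of each split), stated here as an assumption rather than the talk's exact procedure.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, ShuffleSplit
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_features = 30


def make_data(n_samples):
    """Synthetic i.i.d. data with 5 informative features (assumption)."""
    y = rng.integers(0, 2, size=n_samples)
    X = rng.normal(size=(n_samples, n_features))
    X[:, :5] += 0.8 * y[:, None]
    return X, y


X_dec, y_dec = make_data(200)      # decoding samples
X_val, y_val = make_data(5000)     # left-out validation set

C_grid = [10.0 ** k for k in range(-4, 5)]
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
results = {}

# Strategies 1 and 2: fixed C, no tuning
for C in (1.0, 1000.0):
    model = LinearSVC(C=C, max_iter=5000).fit(X_dec, y_dec)
    results[f"C={C:g}"] = model.score(X_val, y_val)

# Strategy 3: choose the best C by cross-validation, refit on all decoding data
search = GridSearchCV(LinearSVC(max_iter=5000), {"C": C_grid}, cv=cv)
results["CV + refitting"] = search.fit(X_dec, y_dec).score(X_val, y_val)

# Strategy 4: keep the best model of each split, then average their weights
coefs, intercepts = [], []
for train, test in cv.split(X_dec):
    fitted = [LinearSVC(C=C, max_iter=5000).fit(X_dec[train], y_dec[train])
              for C in C_grid]
    best = max(fitted, key=lambda m: m.score(X_dec[test], y_dec[test]))
    coefs.append(best.coef_.ravel())
    intercepts.append(best.intercept_[0])
w, b = np.mean(coefs, axis=0), np.mean(intercepts)
y_pred = (X_val @ w + b > 0).astype(int)
results["CV + averaging"] = float(np.mean(y_pred == y_val))

for name, accuracy in results.items():
    print(f"{name:>15}: {accuracy:.3f}")
```

For a sparse decoder, `LinearSVC(penalty="l1", dual=False)` could be substituted; averaging weight vectors then generally gives a less sparse model than any single fit.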
2 Cross-validation for tuning?
[Figure: impact on prediction accuracy (axis from -8% to +8%) of CV + averaging, CV + refitting, C=1 and C=1000, for SVM and log-reg; one panel for non-sparse models, one for sparse models]
@GaelVaroquaux
Cross-validation: lessons learned
Don't use Leave One Out
Random 10-20% splits respecting sample structure
Cross-validation has error bars of ±10%
Cross-validation is inefficient for parameter tuning
- C = 1 for SVM-ℓ2
- model averaging for SVM-ℓ1
https://hal.archives-ouvertes.fr/hal-01332785
References
S. Arlot, A. Celisse, ... A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010.


  • 1. Cross-validation to assess decoder performance: the good, the bad, and the ugly. Gaël Varoquaux. https://hal.archives-ouvertes.fr/hal-01332785
  • 2. Measuring prediction accuracy. To find the best method (computer scientists). For information mapping = omnibus test (cognitive neuroimaging). Cross-validation: asymptotically unbiased, non-parametric.
  • 3. 1 Some theory; 2 Empirical results on brain imaging
  • 4. 1 Some theory. [Diagram: full data, train set, test set]
  • 6. 1 Cross-validation. Test on independent data. [Diagram: train set, validation set] Loop. [Diagram: full data, train set, test set] Measures prediction accuracy.
  • 7. 1 Choice of cross-validation strategy. Test on independent data: be robust to confounding dependences; leave subjects out, or sessions out. Loop: more loops = more data points; need to balance error in training model / error on test.
  • 8. 1 Choice of cross-validation strategy: theory. Negative bias (underestimates performance), decreasing with the size of the training set [Arlot... 2010] sec. 5.1. Variance decreases with the size of the test set [Arlot... 2010] sec. 5.2. Fraction of data left out: 10–20%. Many random splits of the data, respecting dependency structure.
  • 10. 1 Tuning hyper-parameters. Computer scientist says: "You need to set C in your SVM". [Figure: parameter tuning, C from 10^-4 to 10^4; curves for training set and validation set]
  • 11. 1 Nested cross-validation. Test on independent data. Two loops. [Diagram: full data, validation set, train set, test set; nested loop, outer loop]
  • 12. 2 Empirical results on brain imaging. [Diagram: full data, validation set, train set, test set; nested loop, outer loop]
  • 13. 2 Datasets and tasks. 7 fMRI datasets (6 from openfMRI). Haxby: 5 subjects, 15 inter-subject predictions. Inter-subject predictions on 6 studies. OASIS VBM, gender discrimination. HCP MEG task, intra-subject, working memory. # samples: ∼200 (min 80, max 400). Accuracy: min 62%, max 96%.
  • 14. 2 Experiment 1: measuring cross-validation error. Leave out a large validation set. Measure error by cross-validation on the rest. Compare. [Diagram: full data, validation set, train set, test set; nested loop, outer loop]
  • 15. 2 Cross-validated measure versus validation set. [Figure: accuracy measured by cross-validation against accuracy on validation set, both axes from 50% to 100%; series: intra subject, inter subject]
  • 19. 2 Different cross-validation strategies. Difference in accuracy measured by cross-validation and on validation set (axis from -40% to +40%):
        Cross-validation strategy    Intra subject    Inter subject
        Leave one sample out         -22% / +19%      +3% / +43%
        Leave one subject/session    -10% / +10%      -21% / +17%
        20% left out, 3 splits       -11% / +11%      -24% / +16%
        20% left out, 10 splits      -9% / +9%        -24% / +14%
        20% left out, 50 splits      -9% / +8%        -23% / +13%
  • 20. 2 Simple simulations. 2 Gaussian-separated clouds. Auto-correlated noise. 200 decoding samples. 10 000 validation samples ⇒ validation = asymptotics. [Figure: X1 against X2; X1 over time]
  • 22. 2 Different cross-validation strategies. Difference in accuracy measured by cross-validation and on validation set (axis from -40% to +40%):
        Cross-validation strategy    MEG data         Simulations
        Leave one sample out         -16% / +14%      +4% / +33%
        Leave one block out          -15% / +13%      -8% / +8%
        20% left out, 3 splits       -15% / +12%      -10% / +11%
        20% left out, 10 splits      -13% / +10%      -8% / +8%
        20% left out, 50 splits      -12% / +10%      -7% / +7%
  • 24. 2 Experiment 2: parameter-tuning. Compare different strategies on validation set: 1. Use the default C = 1; 2. Use C = 1000; 3. Choose best C by cross-validation and refit; 4. Average best models in cross-validation. Non-sparse decoders: SVM ℓ2, log-reg ℓ2. Sparse decoders: SVM ℓ1, log-reg ℓ1. [Diagram: full data, validation set, train set, test set; nested loop, outer loop]
  • 25. 2 Cross-validation for tuning? [Figure: impact on prediction accuracy (axis from -8% to +8%) of CV + averaging, CV + refitting, C=1, and C=1000, for SVM and log-reg; two panels: non-sparse models, sparse models]
  • 29. @GaelVaroquaux Cross-validation: lessons learned. Don't use Leave One Out. Random 10-20% splits respecting sample structure. Cross-validation has error bars of ±10%. Cross-validation is inefficient for parameter tuning: C = 1 for SVM-ℓ2; model averaging for SVM-ℓ1. https://hal.archives-ouvertes.fr/hal-01332785
  • 30. References. S. Arlot, A. Celisse, ... A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40–79, 2010.
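
The recommendation on slides 8 and 29 (many random splits leaving out 10-20% of the data while respecting the dependency structure) can be sketched in plain Python as below. This is a minimal illustration, not the code used for the talk: the function name `group_shuffle_splits`, its parameters, and the 10-session layout are assumptions made for the example. The left-out fraction is counted in groups (subjects or sessions), not in samples. scikit-learn's `GroupShuffleSplit` provides a comparable splitter for real analyses.

```python
import random


def group_shuffle_splits(groups, n_splits=10, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole groups are left out.

    `groups` gives, for each sample, the subject or session it belongs to.
    """
    rng = random.Random(seed)
    unique_groups = sorted(set(groups))
    # Leave out about test_fraction of the groups, and at least one group
    n_test = max(1, round(test_fraction * len(unique_groups)))
    for _ in range(n_splits):
        test_groups = set(rng.sample(unique_groups, n_test))
        train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
        test_idx = [i for i, g in enumerate(groups) if g in test_groups]
        yield train_idx, test_idx


# Hypothetical layout: 10 sessions of 20 samples each
groups = [session for session in range(10) for _ in range(20)]
splits = list(group_shuffle_splits(groups, n_splits=50, test_fraction=0.2))
```

Because samples from one session never appear on both sides of a split, correlations within a session cannot leak from the train set into the test set.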
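
The two loops of nested cross-validation (slide 11) can be sketched as follows: an inner loop picks the hyper-parameter using the outer training set only, and the outer loop measures accuracy on data that played no part in that choice. The toy 1-D k-nearest-neighbour classifier, the candidate values of k, and the i.i.d. synthetic data are all assumptions chosen to keep the example short; they stand in for the SVM and its C parameter. On real data the splits should also respect the dependency structure, which this sketch ignores.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D features)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)


def random_splits(n_samples, n_splits, test_fraction, rng):
    """Yield (train_idx, test_idx) for random splits of i.i.d. samples."""
    n_test = max(1, round(test_fraction * n_samples))
    for _ in range(n_splits):
        shuffled = rng.sample(range(n_samples), n_samples)
        yield shuffled[n_test:], shuffled[:n_test]


def nested_cv(data, candidate_ks, rng, n_outer=5, n_inner=5, test_fraction=0.2):
    outer_scores = []
    for train_idx, test_idx in random_splits(len(data), n_outer, test_fraction, rng):
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        # Nested loop: choose k from the outer training set only
        inner = list(random_splits(len(train), n_inner, test_fraction, rng))

        def inner_score(k):
            return sum(
                accuracy([train[i] for i in tr], [train[i] for i in va], k)
                for tr, va in inner
            ) / len(inner)

        best_k = max(candidate_ks, key=inner_score)
        # Outer loop: score the chosen k on the untouched test set
        outer_scores.append(accuracy(train, test, best_k))
    return outer_scores


rng = random.Random(0)
# Hypothetical data: two 1-D Gaussian classes, 50 samples each
data = [(rng.gauss(2.0 * label, 1.0), label) for label in (0, 1) for _ in range(50)]
scores = nested_cv(data, candidate_ks=[1, 5, 15], rng=rng)
```

Each entry of `scores` is one outer-loop accuracy; their spread gives a rough idea of the error bars discussed on slide 29.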
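
The simulation on slide 20 (two Gaussian-separated clouds with auto-correlated noise, 200 decoding samples and 10 000 validation samples) could be generated along these lines. The slide does not give the noise model or its parameters, so the AR(1) process, `rho`, `separation`, and the block design of the labels are assumptions made for illustration only.

```python
import random


def simulate(n_samples, rng, separation=1.0, rho=0.9, block=20):
    """Two classes in 2-D (X1, X2) with AR(1) noise along time.

    All parameter values are hypothetical; the slide does not specify them.
    """
    X, y = [], []
    # Start from the stationary distribution of the AR(1) process
    noise = [rng.gauss(0, 1), rng.gauss(0, 1)]
    for t in range(n_samples):
        label = (t // block) % 2  # labels alternate in blocks of `block` samples
        # AR(1) update that keeps the marginal variance of the noise at 1
        noise = [rho * n + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for n in noise]
        center = separation if label else -separation
        X.append([center + noise[0], center + noise[1]])
        y.append(label)
    return X, y


rng = random.Random(0)
X_decode, y_decode = simulate(200, rng)    # samples used for decoding
X_valid, y_valid = simulate(10_000, rng)   # large set approximating asymptotics
```

With noise correlated over time, neighbouring samples are not independent, which is the situation in which leaving out single samples differs from leaving out whole blocks (slide 22).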