This talk describes our efforts to bring easy-to-use machine learning to brain mapping. It covers the questions that machine learning can answer, as well as two software packages developed to facilitate machine learning and its application to neuroimaging.
5. 1 Predictive models in medical applications
Diagnosis: finding the nature of a disease condition
Prognosis: predicting the evolution
⇒ Therapeutic indications
Early biomarkers: detection before standard symptoms
⇒ Population screening
Quantitative biomarkers: to follow disease progression
⇒ drug development
G Varoquaux 3
6. 1 Predictive models in medical applications
Cannot replace the physician:
Patient history
Therapeutic strategies subject to logistics
...
⇒ No black-box
Segmentation, denoising task
as much as prediction
The why question
12. [Varoquaux and Thirion 2014] How machine learning is shaping cognitive neuroimaging
Cognitive neuroimaging and machine learning
Forward Inference
Encoding
Mass-Univariate standard analysis
Reverse Inference
Decoding
Multi-voxel pattern analysis
ICA & linear decompositions
Brain parcellations
Resting state
Descriptions of behavior
Measurements of brain activity
Brain maps
Cognitive concepts
13. 1 Decoding: linear models on brain maps
Design matrix × Coefficients = Target
Coefficients are brain maps
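The relation on this slide (design matrix × coefficients = target) can be sketched on synthetic data. This is a minimal illustration, not code from the talk: the 8 x 8 image grid, the planted region, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Hypothetical sizes: 200 scans, each an 8 x 8 "brain map" flattened to 64 voxels.
n_samples, shape = 200, (8, 8)
n_voxels = shape[0] * shape[1]

# Plant a predictive region: only a 2 x 2 patch of voxels carries signal.
true_map = np.zeros(shape)
true_map[2:4, 2:4] = 1.0

X = rng.randn(n_samples, n_voxels)           # design matrix: one row per scan
y = (X @ true_map.ravel() > 0).astype(int)   # target: one label per scan

decoder = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)

# One coefficient per voxel, so the weights reshape into a map.
coef_map = decoder.coef_.reshape(shape)

# The decision function is design matrix x coefficients, plus an intercept.
scores = X @ decoder.coef_.ravel() + decoder.intercept_[0]
```

Because there is one coefficient per voxel, `coef_map` can be displayed like any other brain image, which is what makes linear decoders interpretable as maps.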
14. 1 Unsupervised mapping: resting state
Data without labels is cheaper & universal
But often without salient features
(as in rest fMRI)
20. 2 scikit-learn’s vision: Machine learning for everyone
Outreach
across scientific fields,
applications, communities
Enabling
foster innovation
21. 2 A Python library
Python
High-level language, for users and developers
General-purpose: suitable for any application
Excellent interactive use
Web searches: Google trends
22. 2 User base
350 000 returning users, 8 000 citations
[Pie chart: users by employer (industry, academia, other); slices of 63%, 3%, 34%]
23. 2 Tradeoffs for outreach
Algorithms and models with good failure modes
Avoid parameters hard to set or fragile convergence
Statistical computing = ill-posed & data-dependent
Didactic documentation
Course on machine learning
Rich examples
25. 2 Machine learning without learning the machinery
A library, not a program
More expressive and flexible
Easy to include in an ecosystem
Ease of use
Machine learning in new places = innovation
Great API, great docs
26. 2 API:
The greybox model
Building bricks
to combine with domain-specific knowledge
interchangeable (mostly)
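The idea of interchangeable building bricks can be illustrated with a short sketch. It assumes the iris dataset bundled with scikit-learn and three arbitrarily chosen classifiers; none of these choices come from the talk.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Three different classifiers, all exposing the same fit/predict interface.
candidates = {
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="linear"),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, clf in candidates.items():
    # The scaling brick stays in place; only the final brick is swapped.
    model = make_pipeline(StandardScaler(), clf)
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
```

The loop body never changes when the estimator does, which is the practical meaning of "interchangeable (mostly)".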
27. 2 API:
The greybox model
from sklearn import svm
classifier = svm.SVC(kernel='linear')
classifier.fit(X_train, Y_train)
Y_test = classifier.predict(X_test)
# or, for transformers (e.g. decomposition.PCA)
X_red = transformer.transform(X_test)
Access to the model’s inner parameters
coef = classifier.coef_  # available for linear kernels
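A runnable version of the calls on this slide might look as follows. It is a sketch under stated assumptions: the iris dataset stands in for real data, and PCA is used for the `transform` step because `SVC` itself is a predictor and not a transformer.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Predictor: fit on training data, predict on unseen data.
classifier = SVC(kernel="linear")  # coef_ is only defined for a linear kernel
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Transformer: same fit step, but transform() returns a reduced representation.
pca = PCA(n_components=2).fit(X_train)
X_red = pca.transform(X_test)

# Greybox: fitted parameters are plain attributes ending in an underscore.
coef = classifier.coef_
components = pca.components_
```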
28. 2 Very rich feature set: 160 estimators
Supervised learning
Decision trees (Random-Forest, Boosted Tree)
Linear models SVM
Gaussian processes ...
Unsupervised Learning
Clustering Mixture models
Dictionary learning ICA
Outlier detection ...
Model selection
Cross-validation
Parameter optimization
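The two model-selection tools listed above can be combined, as in this minimal sketch. The dataset, the estimator, and the grid of `C` values are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Parameter optimization: try each C with an inner 5-fold cross-validation.
search = GridSearchCV(SVC(kernel="linear"),
                      param_grid={"C": [0.01, 0.1, 1, 10]}, cv=5)

# Cross-validation: an outer loop scores the whole search procedure,
# so the reported accuracy is not biased by the choice of C.
nested_scores = cross_val_score(search, X, y, cv=5)

# Refit on all the data to read off the selected parameter.
search.fit(X, y)
best_C = search.best_params_["C"]
```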
29. 2 Models most used in scikit-learn
1. Logistic regression, SVM
2. Random forests
3. PCA
4. Kmeans
5. Naive Bayes
6. Nearest neighbors
From access statistics on the website
31. More gems in scikit-learn
SAGA:
linear_model.LogisticRegression(solver='saga')
Fast linear model on biggish data
PCA == RandomizedPCA: (0.18)
Heuristic to switch PCA to random linear algebra
Fights global warming
Huge speed gains for biggish data
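Both features can be tried on synthetic data, as sketched below. The dataset sizes and parameter values are assumptions for the example; `svd_solver` is the `PCA` parameter through which the randomized solver is selected, and its default `"auto"` chooses a solver from the data shape and the number of components.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=200, n_informative=20,
                           n_clusters_per_class=1, random_state=0)
X = StandardScaler().fit_transform(X)  # SAGA converges faster on scaled features

# SAGA: a stochastic solver for linear models.
clf = LogisticRegression(solver="saga", max_iter=200).fit(X, y)
train_accuracy = clf.score(X, y)

# Randomized PCA versus the exact (full SVD) solver on the same data.
pca_random = PCA(n_components=10, svd_solver="randomized", random_state=0).fit(X)
pca_full = PCA(n_components=10, svd_solver="full").fit(X)
```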
32. 2 Community-based development in scikit-learn
Huge feature set: benefits of a large team
[Figure: monthly contributors, 2010 to 2016 (axis from 0 to 50)]
More than 700 contributors
∼ 20 core contributors
https://www.openhub.net/p/scikit-learn
Community-driven project
33. 2 Scikit-learn-contrib
Scaling the scikit-learn universe quicker
https://github.com/scikit-learn-contrib
py-earth multivariate adaptive regression splines
imbalanced-learn under-sampling and over-sampling
lightning fast linear models
polylearn factorization machines and polynomial networks
hdbscan high-performance clustering
forest-confidence-interval confidence interval for forests
boruta_py boruta feature selection
Many more libraries outside the scikit-learn universe
35. 3 Commoditizing science
Visual image reconstruction from human brain activity
[Miyawaki, et al. (2008)]
“brain reading”
37. 3 Commoditizing science
Visual image reconstruction from human brain activity
[Miyawaki, et al. (2008)]
Make it work, make it right, make it boring
http://nilearn.github.io/auto_examples/plot_miyawaki_reconstruction.html
Readable, simple reproduction of results
http://nilearn.github.io
38. 3 Challenges we have to solve
Make using scikit-learn on neuroimaging easy
Getting the data
Struggle for open data
Massaging the data for machine-learning
Very simple signal processing
Documentation
Users do not know what they need
Output + visualization of results
Putting it in application terms
39. 3 Nilearn in practice
Getting the data
files = datasets.fetch_haxby()
Caching of the downloads
Resume of partial downloads
40. 3 Nilearn in practice
Massaging the data for machine-learning
masker = NiftiMasker(mask_img='mask.nii',
                     standardize=True)
data = masker.fit_transform('fmri.nii')
Filenames to data matrix (memory-efficient I/O)
Common preprocessing steps included
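What the masking step does can be sketched conceptually with NumPy alone. This is not nilearn's implementation: plain arrays stand in for the NIfTI files, and the volume size and mask are made up for the example.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for a 4D fMRI image: a 6 x 6 x 4 volume over 50 time points.
fmri = rng.randn(6, 6, 4, 50)

# Stand-in for the mask image: True for voxels inside the brain.
mask = np.zeros((6, 6, 4), dtype=bool)
mask[1:5, 1:5, 1:3] = True

# Boolean indexing keeps in-mask voxels; transpose to (n_scans, n_voxels),
# the samples-by-features layout that scikit-learn expects.
data = fmri[mask].T

# Standardize each voxel time series to zero mean and unit variance.
data = (data - data.mean(axis=0)) / data.std(axis=0)
```

The result is an ordinary 2D matrix, which is why any scikit-learn estimator can be applied to it afterwards.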
41. 3 Nilearn in practice
Learning with scikit-learn
estimator.fit(data, labels)
That’s easy!
42. 3 Nilearn in practice
Output
plot_stat_map(masker.inverse_transform(estimator.coef_))
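The inverse transform used for the output step can be pictured as writing each fitted weight back at its voxel position. The sketch below shows that idea in NumPy; the mask and the weight values are hypothetical, and this is not nilearn's own code.

```python
import numpy as np

mask = np.zeros((6, 6, 4), dtype=bool)
mask[1:5, 1:5, 1:3] = True

# Hypothetical fitted weights: one value per in-mask voxel.
weights = np.linspace(-1.0, 1.0, mask.sum())

# Unmasking: write each weight back at its voxel position, zero elsewhere.
weight_volume = np.zeros(mask.shape)
weight_volume[mask] = weights
```

The resulting volume has the geometry of the original image, so it can be overlaid on an anatomical scan.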
43. 3 There is more
Learners tailored to the statistics of brain maps
Image penalties on linear models
[Figure panels: SVM, sparse, TV-ℓ1]
Total-variation penalization
Impose sparsity on the gradient of the image
In fMRI: [Michel... 2011]
nilearn.github.io/auto_examples/02_decoding/plot_haxby_space_net.html
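To see why a penalty on the image gradient favours contiguous regions, the sketch below computes a combined ℓ1 and total-variation penalty on two toy weight maps. The function name, the `l1_ratio` weighting, and the 2D setting are assumptions for illustration; the penalty used in practice may be parametrized differently.

```python
import numpy as np

def tv_l1_penalty(img, l1_ratio=0.5):
    """Weighted sum of the l1 norm and the isotropic total variation of a 2D image."""
    # Forward finite differences along each axis (zero at the far border).
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:-1, :] = np.diff(img, axis=0)
    gy[:, :-1] = np.diff(img, axis=1)
    total_variation = np.sqrt(gx ** 2 + gy ** 2).sum()
    l1 = np.abs(img).sum()
    return l1_ratio * l1 + (1 - l1_ratio) * total_variation

# Two weight maps with the same nine nonzero voxels, hence the same l1 norm.
blob = np.zeros((10, 10))
blob[3:6, 3:6] = 1.0              # one compact 3 x 3 region

scattered = np.zeros((10, 10))
scattered[0:9:3, 0:9:3] = 1.0     # nine isolated voxels

penalty_blob = tv_l1_penalty(blob)
penalty_scattered = tv_l1_penalty(scattered)
```

A pure ℓ1 penalty cannot tell the two maps apart, while the total-variation term charges less for the compact region because it has fewer edges.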
44. 3 There is more
Unsupervised dictionary-learning
Brain regions from rest-fMRI
nilearn.github.io/auto_examples/03_connectivity/plot_compare_resting_state_decomposition.html
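The principle of extracting regions without labels can be sketched with scikit-learn's generic `DictionaryLearning` on synthetic signals. This is an assumption-laden toy (three planted regions, arbitrary penalty values), not the dedicated estimator used for real rest-fMRI data.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.RandomState(0)

# Synthetic stand-in for rest fMRI: 3 regions of 10 voxels each,
# every region driven by its own random time course.
n_scans, n_voxels, n_regions = 60, 30, 3
region_maps = np.zeros((n_regions, n_voxels))
for k in range(n_regions):
    region_maps[k, 10 * k:10 * (k + 1)] = 1.0
time_courses = rng.randn(n_scans, n_regions)
data = time_courses @ region_maps + 0.1 * rng.randn(n_scans, n_voxels)

# Voxels are treated as samples, so the sparse codes are spatial loadings.
dico = DictionaryLearning(n_components=n_regions, alpha=1.0,
                          transform_algorithm="lasso_lars",
                          transform_alpha=1.0, max_iter=100, random_state=0)
loadings = dico.fit_transform(data.T)   # shape (n_voxels, n_components)
maps = loadings.T                       # one spatial map per component
atoms = dico.components_                # one time course per component
```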
45. 3 There is more
Connectome pipeline: extraction and supervised learning
nilearn.github.io/auto_examples/03_connectivity/plot_multi_subject_connectome.html
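The two stages of such a pipeline, connectivity extraction followed by supervised learning, can be sketched on simulated region time series. The group difference (a coupling between two regions), the sizes, and the classifier are all assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
n_subjects, n_scans, n_regions = 40, 100, 6

def simulate_subject(coupled):
    """Region time series; in the coupled group, regions 0 and 1 share a signal."""
    ts = rng.randn(n_scans, n_regions)
    if coupled:
        ts[:, 1] = 0.8 * ts[:, 0] + 0.6 * ts[:, 1]
    return ts

labels = np.array([0, 1] * (n_subjects // 2))
subjects = [simulate_subject(bool(label)) for label in labels]

# Extraction: one correlation matrix per subject, keeping the upper triangle
# (the matrix is symmetric with a constant diagonal).
upper = np.triu_indices(n_regions, k=1)
features = np.array([np.corrcoef(ts.T)[upper] for ts in subjects])

# Supervised learning: predict the group from connectivity features.
accuracy = cross_val_score(LogisticRegression(max_iter=1000),
                           features, labels, cv=5).mean()
```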
46. scikit-learn (machine learning in Python) and nilearn
@GaelVaroquaux
Democratisation of machine learning for an application
Generic set of robust algorithms
Foster innovation
47. scikit-learn (machine learning in Python) and nilearn
For the application: I/O, visualization & open data
Complete, runnable examples
Solve day-to-day problems
Create interest
48. scikit-learn (machine learning in Python) and nilearn
Documentation, API, ease of installation
Lower the bar
50. References I
V. Michel, A. Gramfort, G. Varoquaux, E. Eger, and B. Thirion. Total variation regularization for fMRI-based prediction of behavior. IEEE Transactions on Medical Imaging, 30:1328, 2011.
G. Varoquaux and B. Thirion. How machine learning is shaping cognitive neuroimaging. GigaScience, 3:28, 2014.