Reproducible ML:
software challenges, anecdotes and
some engineering solutions 
Alexandre Gramfort
http://alexandre.gramfort.net
GitHub : @agramfort Twitter : @agramfort
FreeSurfer: popular software for extracting features from MRI
(e.g. cortical thickness used to predict Alzheimer’s disease, etc.)
https://surfer.nmr.mgh.harvard.edu/
Hardware and software
differences can lead to different
features / statistical results and
scientific conclusions
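One low-cost habit that follows from this slide is to store a description of the hardware and software environment next to every result. The sketch below uses only the Python standard library; the function name `describe_environment` and the default package list are illustrative assumptions, not part of FreeSurfer or of the talk.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("numpy", "scipy", "scikit-learn")):
    """Return a dict describing the interpreter, OS, CPU and package versions."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Saving this dictionary with each set of extracted features makes it possible, later on, to tell whether two diverging results came from different environments.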
https://github.com/mne-tools/mne-python/issues/4922
ICA: popular matrix factorization problem. Infomax does an SGD on the
non-convex log-likelihood function
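Why non-convexity matters for reproducibility can be seen on a toy problem. The sketch below is not Infomax and not stochastic: it runs plain gradient descent on a hypothetical one-dimensional loss with two minima, and shows that the solution depends on the starting point. All names (`loss`, `grad`, `gradient_descent`) are illustrative.

```python
def loss(w):
    """Toy non-convex loss with two global minima, at w = -1 and w = +1."""
    return (w ** 2 - 1.0) ** 2


def grad(w):
    """Derivative of the toy loss."""
    return 4.0 * w * (w ** 2 - 1.0)


def gradient_descent(w0, step=0.1, n_iter=200):
    """Run a fixed number of gradient steps starting from w0."""
    w = w0
    for _ in range(n_iter):
        w -= step * grad(w)
    return w


w_from_positive = gradient_descent(0.5)
w_from_negative = gradient_descent(-0.5)
print(w_from_positive, w_from_negative)
```

Both runs reach a minimum with the same loss value, yet the estimated parameter differs. With a random initialization or random mini-batches, an unfixed seed is enough to move a solver from one basin to the other.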
Changing BLAS/Lapack
backends changes results
Even changing OMP_NUM_THREADS
can change the results
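A likely mechanism behind this observation is that floating-point addition is not associative: when a reduction is split across threads, the order of the additions changes, and so can the rounded result. The sketch below simulates this in pure Python, without any BLAS or OpenMP call. The helper names are illustrative, and an explicit loop is used instead of the built-in `sum` so that no compensated summation hides the effect.

```python
def sequential_sum(values):
    """Add the values one by one, left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum contiguous chunks separately, then add the partial sums.

    This mimics a reduction that is split across n_chunks threads.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [
        sequential_sum(values[i:i + size])
        for i in range(0, len(values), size)
    ]
    return sequential_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]  # the exact sum is 2.0
one_thread = chunked_sum(values, n_chunks=1)
two_threads = chunked_sum(values, n_chunks=2)
print(one_thread, two_threads, one_thread == two_threads)
```

The two calls return different values, and neither equals the exact sum. In an iterative solver such small differences can accumulate over many iterations.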
https://github.com/scikit-learn/scikit-learn/issues/5545
Even on the same machine numerical
solvers can lead to different outcomes
A. Gramfort - HdR - Bridging gaps between neuroimaging, ML and optimization
Some software engineering solutions
for reproducible ML
http://scikit-learn.org
526,000 users in the last 30 days
42,000,000 page views in the last year
Big user base
higher chance of
spotting issues
Alex Gramfort Reproducible ML: challenges and some engineering solutions 
Do not reinvent the wheel…
#JSM2016 Jake VanderPlas
We provide one
component in the
Python ecosystem
Code reuse and
tight community
Bigger user base
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/tests/test_pca.py
Unit test
Docstring tests
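The two kinds of tests named on this slide can be illustrated on a small function. This is a minimal sketch using only the standard library, not code taken from scikit-learn; `center`, `test_center_has_zero_mean` and `run_docstring_tests` are hypothetical names.

```python
import doctest


def center(values):
    """Subtract the mean from a list of numbers.

    >>> center([1.0, 2.0, 3.0])
    [-1.0, 0.0, 1.0]
    """
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def test_center_has_zero_mean():
    """Unit test: the centered values must sum to (almost) zero."""
    centered = center([0.5, 1.5, 4.0])
    assert abs(sum(centered)) < 1e-12


def run_docstring_tests(func):
    """Run the examples in func's docstring; return (n_failed, n_attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    globs = {func.__name__: func}
    for test in finder.find(func, name=func.__name__, module=False, globs=globs):
        runner.run(test)
    return runner.failures, runner.tries
```

The unit test checks a property of the function, while the docstring test guarantees that the example shown to users in the documentation still produces the output that is written there.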
alex@:scikit-learn(master)$ cloc --not-match-d='tests' --force-lang=Python sklearn/
427 text files.
426 unique files.
29 files ignored.
[…]
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         426          83679         395769         552905
-------------------------------------------------------------------------------
alex@:scikit-learn(master)$ cloc --match-d='tests' --force-lang=Python sklearn/
168 text files.
168 unique files.
25 files ignored.
[…]
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                         168          13153           7014          45710
-------------------------------------------------------------------------------
Not even counting docstrings…
45,000 lines of tests!
Continuous integration on all platforms
Win/OSX/Linux
from Py 2.7 to 3.7
Coverage
Simplifying code reuse with sphinx-gallery
sphinx-gallery:
Write doc by writing
Python code
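As an illustration of "writing doc by writing Python code", here is what a gallery example script can look like. The file name `plot_cumsum.py` and its content are hypothetical; by default, sphinx-gallery only executes example files whose name starts with `plot_`.

```python
"""
Cumulative sum of a short sequence
==================================

The module docstring is rendered as the title and introduction of the
generated page.
"""

# %%
# A comment block that starts with ``# %%`` is rendered as text between
# code cells.
values = [1, 2, 3, 4]

# %%
# The code below is executed at build time and its printed output is
# captured in the page.
running_total = []
total = 0
for v in values:
    total += v
    running_total.append(total)
print(running_total)
```

Because the script is run every time the documentation is built, an example that stops working breaks the build instead of silently going stale.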
Sphinx-Gallery
https://sphinx-gallery.readthedocs.io
Extracted from scikit-learn
and funded by:
https://mybinder.org/
Configuring sphinx-gallery is really easy:
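The configuration shown on the original slide did not survive extraction. A minimal setup, as described in the sphinx-gallery documentation, looks roughly like the excerpt below; the two directory paths are assumptions about the project layout.

```python
# docs/conf.py (excerpt); the two directory paths below are assumptions.
extensions = [
    "sphinx_gallery.gen_gallery",
]

sphinx_gallery_conf = {
    "examples_dirs": "../examples",   # where the example scripts live
    "gallery_dirs": "auto_examples",  # where the generated pages are written
}
```

With these two entries, building the documentation runs the example scripts and generates one page per script.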
How to go even further?
Open Data
http://jaberg.github.io/skdata/
http://www.dmi.usherb.ca/~larocheh/mlpython/
Some oldies…
Many more…
https://www.nih.gov/news-events/news-releases/nih-clinical-center-provides-one-largest-publicly-available-chest-x-ray-datasets-scientific-community
https://mlperf.org/
But DATA isn’t much…
… without evaluation platforms
Reproducible benchmarks
https://www.ramp.studio/
RAMP: Challenge
with code
submission
https://paris-saclay-cds.github.io/autism_challenge/
Reproducible
challenges
RAMP allows you to:
• Run code on private data
• Pick a model with a good accuracy/performance trade-off
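The idea of a challenge with code submission can be sketched as follows: participants send a model class rather than predictions, and the organizers train and score it on data the participants never see, recording both accuracy and run time. This is a simplified illustration, not the RAMP API; `MajorityClassifier` and `evaluate_submission` are hypothetical names and the data are made up.

```python
import time


class MajorityClassifier:
    """Hypothetical submission: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def evaluate_submission(model, X_train, y_train, X_private, y_private):
    """Train on public data, score on data the participant never sees."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_private)
    elapsed = time.perf_counter() - start
    n_correct = sum(p == t for p, t in zip(predictions, y_private))
    return {"accuracy": n_correct / len(y_private), "seconds": elapsed}


X_train, y_train = [[0], [1], [2]], [1, 1, 0]
X_private, y_private = [[3], [4]], [1, 0]
print(evaluate_submission(MajorityClassifier(), X_train, y_train, X_private, y_private))
```

Since every submission goes through the same function on the same private data, scores and timings are directly comparable across participants.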
So in the end maybe we
can easily do better?
Wrapping up
• Even hardware/software replication is hard and costly
• Yet, technology and engineering can make ML more replicable
• Modern science is Open Science
• Disclaimer: Not every problem has an engineering solution
Contact:
Alexandre Gramfort
http://alexandre.gramfort.net
GitHub : @agramfort Twitter : @agramfort
"An approximate answer to the right problem is worth a good deal more than an exact
answer to an approximate problem." ~ John Tukey
Support:

ICML 2018 Reproducible Machine Learning - A. Gramfort