Hacking academia for fun and profit
Thoughts on succeeding in academia despite doing good software
Gaël Varoquaux
Strong is the power
of the dark side
with Python
despite good
software?
Publish or perish
Want a career?
Broken value system
Only Science matters
Wrong incentives
G Varoquaux 2
Publish or perish
Want a career? Hack academia!
Publishing scientific software matters
[Pradal, Varoquaux, Langtangen,
J Computational Science]
G Varoquaux 2
[TL;DR]*
Choose your battles
keeping science in the target
Win them
software production
* Too Long, Didn’t Read
G Varoquaux 3
Growing up as a geek scientist
I did a PhD in
quantum physics
Vacuum (leaks)
Electronics (shorts)
Lasers (mis-alignment)
Best training ever
for agile project
management
G Varoquaux 6
Growing up as a geek scientist
I did a PhD in
quantum physics
Vacuum (leaks)
Electronics (shorts)
Lasers (mis-alignment)
Computers were only one
of the many moving parts
Matlab
Instrument control
Shaped my vision
of computing as a
means to an end
G Varoquaux 7
Success
2011
Tenured researcher
in computer science
Today
Growing team with
data science
rock stars
How / why did I switch?
Fernando Perez (IPython), Prabhu
Ramachandran (Mayavi), Eric Jones
(Enthought), Travis Oliphant (Numpy)...
Learning fast is more important
than knowing something
Bye bye physics
G Varoquaux 8
And now...
What I do nowadays:
Machine learning to understand brain function
Cognitive neuroscience:
Link neural activity to thoughts and cognition
Learn a bilateral link
between brain activity and cognitive function
G Varoquaux 9
Software is the new medium of the scientific
method
Galileo’s notes
G Varoquaux 10
The scientific method
1. Make conjectures
2. Derive predictions
3. Carry out experiments
4. Confirm or refute conjectures
Software is everywhere
Data-mining
Computational models
Computer-controlled experiments
Data analysis
Code is often the very language in
which predictions are expressed
Models are now more complex than a simple
formula or sentence
G Varoquaux 11
Enabling falsification: reproducible science
Replicating
A 3rd party redoing the work
Code and data made available
Reproducing
New analysis on different data / code coming to the
same conclusion
Reusing
Applying the approach to a new problem
Let us enable reusable research
Arguments for BSD license
No strings attached
Can tinker with it
G Varoquaux 12
Reusable science ⇒ evidence accumulation
Accumulation of scientific knowledge
and learning formal representations
Akin to a review paper of the field
But a mathematical model is more testable
Machine learning:
engineering knowledge from data
“A theory is a good theory if it satisfies two requirements:
It must accurately describe a large class of observations
on the basis of a model that contains only a few
arbitrary elements, and it must make definite predictions
about the results of future observations.”
Stephen Hawking, A Brief History of Time.
G Varoquaux 13
The sweet spots across science and software
G Varoquaux 14
The advancement of knowledge
Imagine a circle that contains human knowledge
By the time you finish elementary school, you know a little
High school takes you a little bit further
With a bachelor’s degree, you gain a speciality
A master’s degree deepens this speciality
Research papers take you to the edge of human knowledge
Once you are at the boundary, you focus
You push at the boundary for a few years
And one day it yields
That dent you’ve made is called a PhD
Of course, the world looks different to you now
But don’t forget the big picture
Courtesy of Matt Might, via Stéfan van der Walt
G Varoquaux 15
The advancement of knowledge
This is an optimistic view
Biology
Maths
Computer science
Physics
Economics
Literature
History
I want to
be there
G Varoquaux 15
Translational computational science
Computational science
The use of computers and mathematical models to
address scientific research
Translational science
In medicine: bring bench science to medical practice
Translational
computational science?
G Varoquaux 16
Pick a problem to work on
Take the “easy” route
There needs to be a market screaming for the
software (in academia and in industry)
Refine your vision
Pull, not push
Design driven by need
G Varoquaux 17
Having an impact
G Varoquaux 18
Pick the right battles: viable projects
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
Define project scope and vision
Break down projects by expertise
Don’t solve hard problems
Know the software landscape
Don’t target markets that will not
yield contributors
Need a vision = elevator pitch
Your research (PhD) probably does not qualify
⇒ need to cherry-pick contributions
G Varoquaux 19
Open source and community development
Code maintenance too expensive to be alone
scikit-learn ~ 300 emails/month
nipy ~ 45 emails/month
joblib ~ 45 emails/month
mayavi ~ 30 emails/month
“Hey Gael, I take it you’re too
busy. That’s okay, I spent a day
trying to install XXX and I think
I’ll succeed myself. Next time
though please don’t ignore my
emails, I really don’t like it. You
can say, ‘sorry, I have no time to
help you.’ Just don’t ignore.”
Your “benefits” come from a fraction of the code
Data loading? Maybe?
Standard algorithms? Nah
Share the common code...
...to avoid dying under code
Code becomes less precious with time
And somebody might contribute features
G Varoquaux 20
Community development in scikit-learn
Huge feature set:
benefits of a large team
Project growth:
More than 200 contributors
~ 12 core contributors
1 full-time INRIA programmer
from the start
Estimated cost of development: $6 million
COCOMO model,
http://www.ohloh.net/p/scikit-learn
G Varoquaux 21
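The cost estimate on this slide is attributed to the COCOMO model. As a rough sketch of how such a figure can be derived, the basic COCOMO "organic" formula gives effort in person-months as 2.4 × KLOC^1.05. The line count and salary used below are illustrative assumptions, not numbers from the talk, and the function names are hypothetical.

```python
def cocomo_basic_effort(kloc, a=2.4, b=1.05):
    """Effort in person-months for the basic COCOMO 'organic' mode."""
    return a * kloc ** b


def cocomo_cost(kloc, yearly_salary):
    """Convert the effort estimate into a cost, given a yearly salary."""
    person_years = cocomo_basic_effort(kloc) / 12.0
    return person_years * yearly_salary


if __name__ == "__main__":
    kloc = 100.0       # hypothetical code size, in thousands of lines
    salary = 55_000.0  # hypothetical yearly salary, in dollars
    print(f"effort: {cocomo_basic_effort(kloc):.0f} person-months")
    print(f"cost:   ${cocomo_cost(kloc, salary):,.0f}")
```

Because the exponent is greater than one, doubling the code size more than doubles the estimated cost, which is one more argument for sharing common code.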
Communities: many eyes make code fast
L. Buitinck, O. Grisel, A. Joly, G. Louppe, J. Nothman, P. Prettenhofer
G Varoquaux 22
Having an impact
You need a community
G Varoquaux 23
What’s in a scientific-computing environment
G Varoquaux 24
The scientific workflow agile
Interaction...
→ script...
→ module...
↻ interaction again...
Consolidation,
progressively
Low tech and short
turn-around times
G Varoquaux 25
Choose your weapons
Python, what else?
Interactive language
Easy to read / write
General purpose
Old virtual machine /
compiler
Younger languages
promising (Julia)
but will they get
adoption beyond science?
+ NumPy arrays
Shoe-horn your data in a
NumPy array, and you’ve won
Personally disappointed
that pandas drifted away
G Varoquaux 26
Software architecture for science
“Scriptability” is paramount
In an application: MVC (model, view, controller)
Model: numerical or data-processing core
View: output (graphs or files); must enable headless use
Controller: input (dialogs or an API); avoid input as files: not expressive
Dialogs should never be far from the code
Dialog generation: traits, IPython widgets
Reactive programming:
dialogs modify object, and the model updates
Don’t own the main
In Mayavi: script generation for free
G Varoquaux 27
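A minimal sketch of the separation described on this slide, assuming a toy decay model: the model holds only numerics, the view writes to any file-like object so it can run headless, and the controller is a plain function call that a dialog could drive just as well. All names (`DecayModel`, `render_text`, `simulate`) are hypothetical and only illustrate the idea.

```python
import io
from dataclasses import dataclass


@dataclass
class DecayModel:
    """Model: the numerical core, with no I/O and no GUI code."""
    rate: float = 0.5
    n_steps: int = 5

    def run(self):
        value, values = 1.0, []
        for _ in range(self.n_steps):
            values.append(value)
            value *= (1.0 - self.rate)
        return values


def render_text(values, stream):
    """View: writes to any file-like object, so it also works headless."""
    for step, value in enumerate(values):
        stream.write(f"{step}\t{value:.4f}\n")


def simulate(rate=0.5, n_steps=5, stream=None):
    """Controller: an API call; a dialog would set the same parameters."""
    values = DecayModel(rate=rate, n_steps=n_steps).run()
    if stream is not None:
        render_text(values, stream)
    return values


if __name__ == "__main__":
    buffer = io.StringIO()
    simulate(rate=0.5, n_steps=3, stream=buffer)
    print(buffer.getvalue(), end="")
```

Since nothing here starts an event loop or reads a configuration file, the same code can be called from a script, a test, or an interactive session.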
Quality is free*
* This is a book, by Philip Crosby
G Varoquaux 28
You need quality
Quality will give you users
Bugs give you a bad rap
Quality will give you developers
Contribute to learn and improve
Quality will make your developers happy
People need to be proud of their work
Do less, do better
Goes against the grant-system incentive
G Varoquaux 29
Quality: what & how
Great documentation
Simplify, but don’t dumb down
Focus on what the user is trying to solve
Great APIs
Example-based development
If something is hard to explain, rethink the concepts
Limit the number of different concepts and objects
Consistency, consistency, consistency
Good numerics
Write tests based on mathematical properties
When a user finds an instability, write a new test
Quality enables reuse
Beyond mere reproducibility
G Varoquaux 30
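The "good numerics" advice can be sketched with a small example. The function below is a standard stable log-sum-exp, chosen here only as an illustration (it is not taken from the talk). The first test checks a mathematical property, the second is the kind of regression test one might add after a user reports an instability.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def test_shift_invariance():
    # Property: logsumexp(x + c) == logsumexp(x) + c for any constant c.
    rng = random.Random(0)
    for _ in range(100):
        x = [rng.uniform(-5, 5) for _ in range(10)]
        c = rng.uniform(-100, 100)
        shifted = logsumexp([v + c for v in x])
        assert math.isclose(shifted, logsumexp(x) + c,
                            rel_tol=1e-9, abs_tol=1e-9)


def test_large_inputs_do_not_overflow():
    # A naive implementation would overflow when computing exp(1000).
    expected = 1000.0 + math.log(2.0)
    assert math.isclose(logsumexp([1000.0, 1000.0]), expected)


if __name__ == "__main__":
    test_shift_invariance()
    test_large_inputs_do_not_overflow()
```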
Be productive
“If you spend too much time thinking about a
thing, you’ll never get it done.” — Bruce Lee
G Varoquaux 31
Limited resources
Limited resources are good
Need success in the short term, not the long term
The startup culture: fail fast
Quickly identify non-viable projects
The simplest solution that works is the best
G Varoquaux 32
Short cycles, limited ambitions
Keep coming back to your users
Release early, release often
G Varoquaux 33
Simplicity
Complexity increases superlinearly
[An Experiment on Unit Increase in Problem Complexity,
Woodfield 1979]
25% increase in problem complexity
⇒ 100% increase in code complexity
The 80/20 rule
80% of the use cases can be solved
with 20% of the lines of code
Avoid feature creep
Use objects sparingly
Don’t use classes for the sake of it
G Varoquaux 34
Software engineering
G Varoquaux 35
Software engineering good practices
Version control
Use git + github
Unit testing
If it’s not tested, it’s broken or soon will be.
Make a package,
with controlled dependencies and compilation
...
G Varoquaux 36
Research ≠ production
Need to adapt software-engineering principles
↓
Good naming is free
Use functions, not scripts
Version control is very cheap
Tests are more expensive... Consider them once goals
stabilize
Build chains are hard
Go down the chain as your research progresses
You can think of shipping software only if it was
viable to go completely down the chain
G Varoquaux 37
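As a small illustration of "use functions, not scripts" and "good naming is free", here is a sketch in which an analysis step lives in a named function rather than at the top level of a script. The function `normalize` and its test are hypothetical examples; the point is only that a function can be imported, reused, and tested once goals stabilize.

```python
import statistics


def normalize(samples):
    """Center samples on zero mean and scale them to unit standard deviation."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0:
        raise ValueError("cannot normalize constant data")
    return [(sample - mean) / std for sample in samples]


def test_normalize():
    out = normalize([1.0, 2.0, 3.0])
    assert abs(statistics.fmean(out)) < 1e-12
    assert abs(statistics.pstdev(out) - 1.0) < 1e-12


if __name__ == "__main__":
    test_normalize()
    print(normalize([1.0, 2.0, 3.0]))
```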
Things we did right (maybe)
G Varoquaux 38
Mayavi: 3D visualization in Python
Success factors
Building upon VTK: great power
Component model (UI)
Internals open to the world
⇒ from interaction to scripting
Limiting factors
Building upon VTK: a lot of complexity
Codebase too complex and object-oriented
(bound to VTK)
Users of GUIs do not turn into developers
Composition is an API killer
G Varoquaux 39
joblib: computational workflow patterns
Parallel for loop
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i**2)
... for i in range(8))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
On-demand dispatch to ease memory consumption
Threading and processes backends
Memoize pattern
import joblib
mem = joblib.Memory(location='.')  # older joblib releases called this cachedir
g = mem.cache(f)
b = g(a) # computes f(a) and stores the result
c = g(a) # retrieves the result from the store
G Varoquaux 40
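To make the memoize pattern concrete without depending on joblib, here is a standard-library sketch of the same idea: results are stored on disk, keyed on the function name and its arguments. This is an illustration under simplifying assumptions (picklable positional arguments only, no invalidation when the function's code changes), not joblib's implementation; `disk_cache` and `slow_square` are hypothetical names.

```python
import hashlib
import os
import pickle
import tempfile


def disk_cache(cache_dir):
    """Return a decorator that stores a function's results on disk."""
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        def wrapper(*args):
            # Key the stored result on the function name and its arguments.
            payload = pickle.dumps((func.__name__, args))
            key = hashlib.sha1(payload).hexdigest()
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as stream:
                    return pickle.load(stream)
            result = func(*args)
            with open(path, "wb") as stream:
                pickle.dump(result, stream)
            return result
        return wrapper
    return decorator


calls = []  # records the arguments of real (non-cached) computations


def slow_square(x):
    calls.append(x)
    return x * x


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cached_square = disk_cache(tmp)(slow_square)
        first = cached_square(3)   # computed
        second = cached_square(3)  # read back from disk
        print(first, second, "computed", len(calls), "time(s)")
```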
joblib: computational workflow patterns
Success factors
Simplicity of use
Patterns we really, really need (pull not push)
Limiting factor
Vision of the project unclear
Positioning with regards to landscape unclear
(IPython, where are you headed?)
Tricky code inside
G Varoquaux 41
scikit-learn: machine learning in Python
Success factors
Right project vision
Machine learning without learning the machinery
Black box that can be opened
Right trade-off between “just works” and versatility
(think Apple vs Linux)
We’re not going to solve all the problems for you
I don’t solve hard problems
Feature-engineering, domain-specific cases...
Python is a programming language. Use it.
Cover all the 80% use cases in one package
G Varoquaux 42
scikit-learn: machine learning in Python
Success factors
Right project vision
High-level programming
- Optimize algorithms, not for loops
- Know NumPy and SciPy perfectly
All significant data should be in arrays
Avoid memory copies, rely on BLAS/LAPACK
- Use Cython, not C/C++
G Varoquaux 42
scikit-learn: machine learning in Python
Success factors
Right project vision
High-level programming
Good API design
- separate data from operations
- Object API exposes a data-processing language
fit, predict, transform, score, partial_fit
- Instantiated without data but with all parameters
G Varoquaux 42
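The API principles on this slide can be sketched in plain Python. The two toy classes below are hypothetical illustrations, not scikit-learn code: objects are instantiated with parameters only, data arrives through `fit`, and the fitted state is then used by `transform`, `predict`, and `score`.

```python
class MeanCenterer:
    """Transformer: learns column means in fit, removes them in transform."""

    def __init__(self, with_mean=True):
        # All parameters are set at construction; no data is passed here.
        self.with_mean = with_mean

    def fit(self, X):
        n_rows = len(X)
        self.means_ = [sum(column) / n_rows for column in zip(*X)]
        return self

    def transform(self, X):
        if not self.with_mean:
            return [list(row) for row in X]
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class NearestMeanClassifier:
    """Predictor: assigns each sample to the class with the closest mean."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def distance(row, center):
            return sum((v - c) ** 2 for v, c in zip(row, center))

        return [
            min(self.centroids_,
                key=lambda label: distance(row, self.centroids_[label]))
            for row in X
        ]

    def score(self, X, y):
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    y = ["a", "a", "b", "b"]
    X_centered = MeanCenterer().fit(X).transform(X)
    classifier = NearestMeanClassifier().fit(X_centered, y)
    print(classifier.predict(X_centered), classifier.score(X_centered, y))
```

Keeping data out of the constructor means the same configured object can be fitted on different datasets, which is what lets such objects be composed and reused.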
scikit-learn: machine learning in Python
Success factors
Right project vision
High-level programming
Good API design
Great community
- Github + code review
G Varoquaux 42
scikit-learn: machine learning in Python
Success factors
Right project vision
High-level programming
Good API design
Great community
Great documentation
Limiting factors
Tricky numerical code
Our own success ⇒ huge volume
G Varoquaux 42
@GaelVaroquaux
Succeeding in academia despite doing good software
1 Game the system
It’s about convincing a tenure committee
Code must contribute to a scientific problem
2 Not all battles can be fought
Make sure that there is a market
Don’t solve hard problems
Problems that matter for science and industry
3 Make good software
That actually answers scientists’ needs
With quality, software engineering
Relying on a community
Usability matters
Now go out, and code!

Más contenido relacionado

Destacado

Destacado (14)

Scientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of dataScientist meets web dev: how Python became the language of data
Scientist meets web dev: how Python became the language of data
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in Python
 
Scikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the projectScikit-learn for easy machine learning: the vision, the tool, and the project
Scikit-learn for easy machine learning: the vision, the tool, and the project
 
Inter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRIInter-site autism biomarkers from resting state fMRI
Inter-site autism biomarkers from resting state fMRI
 
Machine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questionsMachine learning and cognitive neuroimaging: new tools can answer new questions
Machine learning and cognitive neuroimaging: new tools can answer new questions
 
Brain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizationsBrain maps from machine learning? Spatial regularizations
Brain maps from machine learning? Spatial regularizations
 
A hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstructionA hand-waving introduction to sparsity for compressed tomography reconstruction
A hand-waving introduction to sparsity for compressed tomography reconstruction
 
Advanced network modelling 2: connectivity measures, goup analysis
Advanced network modelling 2: connectivity measures, goup analysisAdvanced network modelling 2: connectivity measures, goup analysis
Advanced network modelling 2: connectivity measures, goup analysis
 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patterns
 
Social-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsitySocial-sparsity brain decoders: faster spatial sparsity
Social-sparsity brain decoders: faster spatial sparsity
 
Connectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis MethodsConnectomics: Parcellations and Network Analysis Methods
Connectomics: Parcellations and Network Analysis Methods
 
Scikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en PythonScikit learn: apprentissage statistique en Python
Scikit learn: apprentissage statistique en Python
 
Brain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in PythonBrain reading, compressive sensing, fMRI and statistical learning in Python
Brain reading, compressive sensing, fMRI and statistical learning in Python
 
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
Scikit-learn: apprentissage statistique en Python. Créer des machines intelli...
 

Similar a Succeeding in academia despite doing good_software

Including online labs in science education classrooms
Including online labs in science education classroomsIncluding online labs in science education classrooms
Including online labs in science education classrooms
Go-Lab Initiative
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
dgarijo
 
Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  
Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  
Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  
KISK FF MU
 

Similar a Succeeding in academia despite doing good_software (20)

Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...Better neuroimaging data processing: driven by evidence, open communities, an...
Better neuroimaging data processing: driven by evidence, open communities, an...
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
 
Open Science and Executable Papers
Open Science and Executable PapersOpen Science and Executable Papers
Open Science and Executable Papers
 
SCC2011 - Talking about e-Science in a virtual world
SCC2011 - Talking about e-Science in a virtual worldSCC2011 - Talking about e-Science in a virtual world
SCC2011 - Talking about e-Science in a virtual world
 
Ocelot (OSS remote Instrumentation)
Ocelot (OSS remote Instrumentation)Ocelot (OSS remote Instrumentation)
Ocelot (OSS remote Instrumentation)
 
Mobile Monday (October 2014) - Riding Global Tech Trends
Mobile Monday (October 2014) - Riding Global Tech TrendsMobile Monday (October 2014) - Riding Global Tech Trends
Mobile Monday (October 2014) - Riding Global Tech Trends
 
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis PlatformImproved Knowledge from Data: Building an Immersive Data Analysis Platform
Improved Knowledge from Data: Building an Immersive Data Analysis Platform
 
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Experi...
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Experi...Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Experi...
Seminario eMadrid sobre "Nuevas experiencias en laboratorios remotos". Experi...
 
FAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the FutureFAIR Workflows: A step closer to the Scientific Paper of the Future
FAIR Workflows: A step closer to the Scientific Paper of the Future
 
IronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science ChallengeIronHacks Live: Info session #3 - COVID-19 Data Science Challenge
IronHacks Live: Info session #3 - COVID-19 Data Science Challenge
 
Increasing immersiveness into a 3D virtual world - motion tracking and natura...
Increasing immersiveness into a 3D virtual world - motion tracking and natura...Increasing immersiveness into a 3D virtual world - motion tracking and natura...
Increasing immersiveness into a 3D virtual world - motion tracking and natura...
 
Including online labs in science education classrooms
Including online labs in science education classroomsIncluding online labs in science education classrooms
Including online labs in science education classrooms
 
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven ScienceCapturing Context in Scientific Experiments: Towards Computer-Driven Science
Capturing Context in Scientific Experiments: Towards Computer-Driven Science
 
ProteicJS: the next-generation visualization library
ProteicJS: the next-generation visualization libraryProteicJS: the next-generation visualization library
ProteicJS: the next-generation visualization library
 
Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  
Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  
Jaroslav Vážný: The Hitch-Hacker’s Guide to Data Science  
 
Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?Open & reproducible research - What can we do in practice?
Open & reproducible research - What can we do in practice?
 
Enhancing and Transforming Global Learning Communities with Augmented Reality
Enhancing and Transforming Global Learning Communities with Augmented RealityEnhancing and Transforming Global Learning Communities with Augmented Reality
Enhancing and Transforming Global Learning Communities with Augmented Reality
 
Research2.0
Research2.0Research2.0
Research2.0
 
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
From Data Science to Production - deploy, scale, enjoy! / PyData Amsterdam - ...
 
Dr. Álvaro Alonso - FrugalAI-UPM.pptx
Dr. Álvaro Alonso - FrugalAI-UPM.pptxDr. Álvaro Alonso - FrugalAI-UPM.pptx
Dr. Álvaro Alonso - FrugalAI-UPM.pptx
 

Succeeding in academia despite doing good software

  • 1. Hacking academia for fun and profit Thoughts on succeeding in academia despite doing good software Gaël Varoquaux Strong is the power of the dark side
  • 2. Hacking academia for fun and profit Thoughts on succeeding in academia despite doing good software Gaël Varoquaux Strong is the power of the dark side with Python
  • 3. Hacking academia for fun and profit Thoughts on succeeding in academia despite doing good software Gaël Varoquaux Strong is the power of the dark side despite good software?
  • 4. Publish or perish Want a career? G Varoquaux 2
  • 5. Publish or perish Want a career? Broken value system Only Science matters Wrong incentives G Varoquaux 2
  • 6. Publish or perish Want a career? Hack academia! G Varoquaux 2
  • 8. Publish or perish Want a career? Hack academia! Publishing scientific software matters [Pradal, Varoquaux, Langtangen, J Computational Science] G Varoquaux 2
  • 9. [TL;DR]* Choose your battles: keeping science in the target Win them: software production * Too Long, Didn’t Read G Varoquaux 3
  • 10. Growing up as a geek scientist G Varoquaux 4
  • 11. Growing up as a geek scientist I did a PhD in quantum physics G Varoquaux 5
  • 12. Growing up as a geek scientist I did a PhD in quantum physics Vacuum (leaks) Electronics (shorts) Lasers (mis-alignment) Best training ever for agile project management G Varoquaux 6
  • 13. Growing up as a geek scientist I did a PhD in quantum physics Vacuum (leaks) Electronics (shorts) Lasers (mis-alignment) Computers were only one of the many moving parts Matlab Instrument control G Varoquaux 7
  • 14. Growing up as a geek scientist I did a PhD in quantum physics Vacuum (leaks) Electronics (shorts) Lasers (mis-alignment) Computers were only one of the many moving parts Matlab Instrument control Shaped my vision of computing as a means to an end G Varoquaux 7
  • 16. Success 2011 Tenured researcher in computer science Today Growing team with data science rock stars G Varoquaux 8
  • 17. Success 2011 Tenured researcher in computer science Today Growing team with data science rock stars How / why did I switch? Fernando Perez (IPython), Prabhu Ramachandran (Mayavi), Eric Jones (Enthought), Travis Oliphant (Numpy)... G Varoquaux 8
  • 18. Success 2011 Tenured researcher in computer science Today Growing team with data science rock stars How / why did I switch? Fernando Perez (IPython), Prabhu Ramachandran (Mayavi), Eric Jones (Enthought), Travis Oliphant (Numpy)... Learning fast is more important than knowing something Bye bye physics G Varoquaux 8
  • 19. And now... What I do nowadays: Machine learning to understand brain function Cognitive neuroscience: Link neural activity to thoughts and cognition G Varoquaux 9
  • 20. And now... What I do nowadays: Machine learning to understand brain function Learn a bilateral link between brain activity and cognitive function G Varoquaux 9
  • 21. Software is the new medium of scientific method Galileo’s notes G Varoquaux 10
  • 22. The scientific method 1. Make conjectures 2. Derive predictions 3. Carry out experiments 4. Confirm or refute conjectures G Varoquaux 11
  • 23. The scientific method 1. Make conjectures 2. Derive predictions 3. Carry out experiments 4. Confirm or refute conjectures Software is everywhere Data-mining Computational models Computer-controlled experiments Data analysis G Varoquaux 11
  • 24. The scientific method 1. Make conjectures 2. Derive predictions 3. Carry out experiments 4. Confirm or refute conjectures Code is often the very language in which predictions are expressed Models are now more complex than a simple formula or sentence G Varoquaux 11
  • 25. Enabling falsification: reproducible science Replicating A 3rd party redoing the work Code and data made available Reproducing New analysis on different data / code coming to the same conclusion Reusing Applying the approach to a new problem Let us enable reusable research G Varoquaux 12
  • 26. Enabling falsification: reproducible science Replicating A 3rd party redoing the work Code and data made available Reproducing New analysis on different data / code coming to the same conclusion Reusing Applying the approach to a new problem Let us enable reusable research Arguments for BSD license No strings attached Can tinker with it G Varoquaux 12
  • 27. Reusable science → evidence accumulation Accumulation of scientific knowledge and learning formal representations G Varoquaux 13
  • 28. Reusable science → evidence accumulation Accumulation of scientific knowledge and learning formal representations Akin to a review paper of the field But a mathematical model is more testable Machine learning: engineering knowledge from data G Varoquaux 13
  • 29. Reusable science → evidence accumulation Accumulation of scientific knowledge and learning formal representations Akin to a review paper of the field But a mathematical model is more testable “A theory is a good theory if it satisfies two requirements: It must accurately describe a large class of observations on the basis of a model that contains only a few arbitrary elements, and it must make definite predictions about the results of future observations.” Stephen Hawking, A Brief History of Time. G Varoquaux 13
  • 30. The sweet spots across science and software G Varoquaux 14
  • 31. The advancement of knowledge Imagine a circle that contains human knowledge Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 32. The advancement of knowledge By the time you finish elementary school, you know a little Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 33. The advancement of knowledge High school takes you a little bit further Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 34. The advancement of knowledge With a bachelor’s degree, you gain a speciality Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 35. The advancement of knowledge A master’s degree deepens this speciality Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 36. The advancement of knowledge Research papers take you to the edge of human knowledge Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 37. The advancement of knowledge Once you are at the boundary, you focus Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 38. The advancement of knowledge You push at the boundary for a few years Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 39. The advancement of knowledge And one day it yields Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 40. The advancement of knowledge That dent you’ve made is called a PhD Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 41. The advancement of knowledge Of course, the world looks different to you now Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 42. The advancement of knowledge But don’t forget the big picture PhD Courtesy of Matt Might, via Stéfan van der Walt G Varoquaux 15
  • 43. The advancement of knowledge This is an optimistic view Biology Maths Computer science Physics Economy Literature History G Varoquaux 15
  • 44. The advancement of knowledge This is an optimistic view Biology Maths Computer science Physics Economy Literature History I want to be there G Varoquaux 15
  • 45. Translational computational science Computational science The use of computers and mathematical models to address scientific research G Varoquaux 16
  • 46. Translational computational science Computational science The use of computers and mathematical models to address scientific research Translational science In medicine: bring bench science to medical practice G Varoquaux 16
  • 47. Translational computational science Computational science The use of computers and mathematical models to address scientific research Translational science In medicine: bring bench science to medical practice Translational computational science? G Varoquaux 16
  • 48. Pick a problem to work on Take the “easy” route There needs to be a market screaming for the software (in academia and in industry) Refine your vision Pull, not push Design driven by need G Varoquaux 17
  • 49. Having an impact G Varoquaux 18
  • 51. Pick the right battles: viable projects Project idea A software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting G Varoquaux 19
  • 53. Pick the right battles: viable projects Project idea A software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting Define project scope and vision Break down projects by expertise Don’t solve hard problems Know the software landscape Don’t target markets that will not yield contributors Need a vision = elevator pitch G Varoquaux 19
  • 54. Pick the right battles: viable projects Project idea A software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting Define project scope and vision Break down projects by expertise Don’t solve hard problems Know the software landscape Don’t target markets that will not yield contributors Need a vision = elevator pitch Your research (PhD) probably does not qualify → need to cherry-pick contributions G Varoquaux 19
  • 55. Open source and community development Code maintenance too expensive to be alone scikit-learn ~300 emails/month, nipy ~45 emails/month, joblib ~45 emails/month, mayavi ~30 emails/month “Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.” G Varoquaux 20
  • 56. Open source and community development Code maintenance too expensive to be alone scikit-learn ~300 emails/month, nipy ~45 emails/month, joblib ~45 emails/month, mayavi ~30 emails/month Your “benefits” come from a fraction of the code Data loading? Maybe? Standard algorithms? Nah Share the common code... ...to avoid dying under code Code becomes less precious with time And somebody might contribute features G Varoquaux 20
  • 57. Community development in scikit-learn Huge feature set: benefits of a large team Project growth: More than 200 contributors, ~12 core contributors, 1 full-time INRIA programmer from the start Estimated cost of development: $6 million (COCOMO model, http://www.ohloh.net/p/scikit-learn) G Varoquaux 21
  • 58. Communities: many eyes make code fast L. Buitinck, O. Grisel, A. Joly, G. Louppe, J. Nothman, P. Prettenhofer G Varoquaux 22
  • 59. Having an impact You need a community G Varoquaux 23
  • 60. What’s in a scientific-computing environment G Varoquaux 24
  • 61. The scientific workflow: agile Interaction... → script... → module... ↺ interaction again... Consolidation, progressively Low tech and short turn-around times G Varoquaux 25
  • 62. Choose your weapons Python, what else? Interactive language Easy to read / write General purpose G Varoquaux 26
  • 63. Choose your weapons Python, what else? Interactive language Easy to read / write General purpose Old virtual machine / compiler Younger languages promising (Julia) but will they get adoption beyond science? G Varoquaux 26
  • 64. Choose your weapons Python, what else? +Numpy arrays Shoe-horn your data in a numpy array, and you’ve won Personally disappointed that pandas drifted away G Varoquaux 26
  • 65. Software architecture for science “Scriptability” is paramount In an application: MVC (model, view, controller) Model Numerical or data-processing core View Output: graphs, or files Must enable headless use Controller Input: dialogs, or an API Avoid input as files: not expressive Dialogs should never be far from the code Dialog generation: traits, IPython widgets Reactive programming: dialogs modify object, and the model updates Don’t own the main G Varoquaux 27
  • 66. Software architecture for science “Scriptability” is paramount In an application: MVC (model, view, controller) Model Numerical or data-processing core View Output: graphs, or files Must enable headless use Controller Input: dialogs, or an API Avoid input as files: not expressive Dialogs should never be far from the code Dialog generation: traits, IPython widgets Reactive programming: dialogs modify object, and the model updates Don’t own the main In Mayavi: script generation for free G Varoquaux 27
  • 67. Quality is free* * This is a book, by Philip Crosby G Varoquaux 28
  • 68. You need quality Quality will give you users Bugs give you a bad rap Quality will give you developers Contribute to learn and improve Quality will make your developers happy People need to be proud of their work Do less, do better Goes against the grant-system incentive G Varoquaux 29
  • 69. Quality: what & how Great documentation Simplify, but don’t dumb down Focus on what the user is trying to solve Great APIs Example-based development If something is hard to explain, rethink the concepts Limit the number of different concepts and objects Consistency, consistency, consistency Good numerics Write tests based on mathematical properties When a user finds an instability, write a new test G Varoquaux 30
  • 70. Quality: what & how Great documentation Simplify, but don’t dumb down Focus on what the user is trying to solve Great APIs Example-based development If something is hard to explain, rethink the concepts Limit the number of different concepts and objects Consistency, consistency, consistency Good numerics Write tests based on mathematical properties When a user finds an instability, write a new test Quality enables reuse Beyond mere reproducibility G Varoquaux 30
  • 72. Be productive “If you spend too much time thinking about a thing, you’ll never get it done.” — Bruce Lee G Varoquaux 31
  • 73. Limited resources Limited resources are good Need success in the short term, not the long term The startup culture: fail fast Quickly identify non-viable projects The simplest solution that works is the best G Varoquaux 32
  • 74. Short cycles, limited ambitions Keep coming back to your users Release early, release often G Varoquaux 33
  • 75. Simplicity Complexity increases superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity → 100% increase in code complexity G Varoquaux 34
  • 76. Simplicity Complexity increases superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity → 100% increase in code complexity The 80/20 rule 80% of the use cases can be solved with 20% of the lines of code Avoid feature creep G Varoquaux 34
  • 77. Simplicity Complexity increases superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity → 100% increase in code complexity The 80/20 rule 80% of the use cases can be solved with 20% of the lines of code Avoid feature creep Use objects sparingly Don’t use classes for the sake of it G Varoquaux 34
  • 79. Software engineering good practices Version control Use git + github Unit testing If it’s not tested, it’s broken or soon will be. Make a package, with controlled dependencies and compilation ... G Varoquaux 36
  • 80. Research ≠ production Need to adapt software-engineering principles ↓ Good naming is free Use functions, not scripts Version control is very cheap Tests are more expensive... Consider them once goals stabilize Build chains are hard Go down the chain as your research progresses You can think of shipping software only if it was viable to go completely down the chain G Varoquaux 37
  • 81. Things we did right (maybe) G Varoquaux 38
  • 82. Mayavi: 3D visualization in Python Success factors Building upon VTK Great power Component model (UI) Internals open to the world → from interaction to scripting Limiting factors Building upon VTK A lot of complexity Codebase too complex and object-oriented (bound to VTK) Users of GUIs do not turn into developers Composition is an API killer G Varoquaux 39
  • 84. joblib: computational workflow patterns Parallel for loop >>> from math import sqrt >>> from joblib import Parallel, delayed >>> Parallel(n_jobs=2)(delayed(sqrt)(i**2) ... for i in range(8)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0] On-demand dispatch to ease memory consumption Threading and processes backends G Varoquaux 40
  • 85. joblib: computational workflow patterns Parallel for loop >>> from math import sqrt >>> from joblib import Parallel, delayed >>> Parallel(n_jobs=2)(delayed(sqrt)(i**2) ... for i in range(8)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0] Memoize pattern mem = joblib.Memory(cachedir='.') g = mem.cache(f) b = g(a) # computes f(a) c = g(a) # retrieves results from store G Varoquaux 40
  • 86. joblib: computational workflow patterns Success factors Simplicity of use Patterns we really, really need (pull not push) G Varoquaux 41
  • 87. joblib: computational workflow patterns Success factors Simplicity of use Patterns we really, really need (pull not push) Limiting factor Vision of the project unclear Positioning with regards to landscape unclear (IPython, where are you headed?) Tricky code inside G Varoquaux 41
  • 88. scikit-learn: machine learning in Python Success factors Right project vision Machine learning without learning the machinery Black box that can be opened Right trade-off between “just works” and versatility (think Apple vs Linux) We’re not going to solve all the problems for you I don’t solve hard problems Feature-engineering, domain-specific cases... Python is a programming language. Use it. Cover all the 80% use cases in one package G Varoquaux 42
  • 89. scikit-learn: machine learning in Python Success factors Right project vision High-level programming - Optimize algorithms, not for loops - Know Numpy and scipy perfectly All significant data should be in arrays Avoid memory copies, rely on blas/lapack - Use Cython, not C/C++ G Varoquaux 42
  • 90. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design - separate data from operations [figure: data shown as a grid of digits] G Varoquaux 42
  • 91. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design - separate data from operations - Object API exposes a data-processing language fit, predict, transform, score, partial_fit - Instantiated without data but with all parameters G Varoquaux 42
  • 92. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design Great community - Github + code review G Varoquaux 42
  • 93. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design Great community Great documentation G Varoquaux 42
  • 94. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design Great community Great documentation Limiting factors Tricky numerical code Our own success → huge volume G Varoquaux 42
  • 95. @GaelVaroquaux Succeeding in academia despite doing good software 1 Game the system It’s about convincing a tenure committee Code must contribute to a scientific problem
  • 96. @GaelVaroquaux Succeeding in academia despite doing good software 1 Game the system 2 Not all battles can be fought Make sure that there is a market Don’t solve hard problems Problems that matter for science and industry
  • 97. @GaelVaroquaux Succeeding in academia despite doing good software 1 Game the system 2 Not all battles can be fought 3 Make good software That actually answers scientists’ needs With quality, software engineering Relying on a community Usability matters
  • 98. @GaelVaroquaux Succeeding in academia despite doing good software 1 Game the system 2 Not all battles can be fought 3 Make good software Now go out, and code!
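
The "Good numerics" advice on slides 69 and 70 (write tests based on mathematical properties, and add a test whenever a user finds an instability) could look roughly like the sketch below. The function `logsumexp` and the two checks are illustrative choices made for this example, not code from the talk or from any of the projects it mentions.

```python
import math


def logsumexp(values):
    """Compute log(sum(exp(v))) stably by factoring out the largest value."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def check_shift_property(values, shift, tol=1e-9):
    """Mathematical property: logsumexp(v + c) == logsumexp(v) + c."""
    shifted = [v + shift for v in values]
    return abs(logsumexp(shifted) - (logsumexp(values) + shift)) < tol


def check_bounds(values):
    """Mathematical property: max(v) <= logsumexp(v) <= max(v) + log(n)."""
    result = logsumexp(values)
    peak = max(values)
    return peak <= result <= peak + math.log(len(values)) + 1e-12


# Property-based checks on ordinary inputs.
shift_ok = check_shift_property([0.1, 2.0, -3.0], 5.0)
bounds_ok = check_bounds([0.1, 2.0, -3.0])

# Regression check for an instability: a naive exp() would overflow here.
large_ok = check_bounds([1000.0, 1000.0])
```

Neither check needs a reference value computed by hand, which is what makes property-based tests cheap to write for numerical code.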
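
The two joblib patterns on slides 84 and 85 can be tried with the short sketch below. It assumes a recent joblib release, where the cache directory is passed as `location` (the slides use the older `cachedir` spelling); `slow_square` and the `calls` list are hypothetical helpers added only to make the caching visible.

```python
import math
import tempfile

from joblib import Memory, Parallel, delayed

# Parallel for loop: each delayed(...) call becomes one task.
roots = Parallel(n_jobs=2)(delayed(math.sqrt)(i ** 2) for i in range(8))

# Memoize pattern: results are stored on disk, keyed by the arguments.
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

calls = []  # records every real execution of slow_square


def slow_square(x):
    """Stand-in for an expensive computation."""
    calls.append(x)
    return x * x


cached_square = memory.cache(slow_square)
b = cached_square(3)  # runs slow_square and stores the result
c = cached_square(3)  # loads the stored result, slow_square is not run again
```

After the second call, `calls` holds a single entry, which shows that the result came from the store and not from a recomputation.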
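
The API conventions listed on slide 91 (objects instantiated with parameters but no data, then driven through `fit`, `predict` and `score`) can be illustrated with a toy estimator. `NearestNeighbors1D` below is a hypothetical pure-Python class written for this example; it only mimics the calling convention and is not scikit-learn code.

```python
class NearestNeighbors1D:
    """Toy k-nearest-neighbors classifier on scalar features."""

    def __init__(self, n_neighbors=1):
        # Only parameters are set here: no data reaches the constructor.
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # Learned state gets a trailing underscore to separate it from parameters.
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for x in X:
            # Indices of training points sorted by distance to x.
            order = sorted(range(len(self.X_)), key=lambda i: abs(self.X_[i] - x))
            votes = [self.y_[i] for i in order[: self.n_neighbors]]
            # Majority vote among the closest neighbors.
            predictions.append(max(set(votes), key=votes.count))
        return predictions

    def score(self, X, y):
        # Fraction of correct predictions (accuracy).
        hits = sum(p == t for p, t in zip(self.predict(X), y))
        return hits / len(y)


X_train = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
y_train = ["low", "low", "low", "high", "high", "high"]

model = NearestNeighbors1D(n_neighbors=3).fit(X_train, y_train)
predicted = model.predict([1.5, 10.5])
accuracy = model.score([0.5, 11.5], ["low", "high"])
```

Keeping data out of the constructor means the same configured object can be fitted on different datasets, which is the separation of data from operations that slide 90 points to.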