Simple APIs and innovative documentation
Reasons for the success of Scientific Python
Emmanuelle Gouillart
joint Unit CNRS/Saint-Gobain SVI
and the scikit-image team
@EGouillart
Outline
Ecosystem
People and teams
Simple APIs
Hacking the documentation
Challenges for the future
Python: a huge success in education
Engineering departments are still lagging behind.
Adoption by traditional institutions
Data science
The Scientific Python ecosystem
Integrated distributions
Interpreters and IDEs
Visualization
Signal processing
Specialized modules
NumPy: Python objects for numerical arrays
Multi-dimensional numerical data container (based on compiled code)
+ utility functions to create/manipulate them
>>> import numpy as np
>>> a = np.random.randint(0, 2, (2, 2, 2))
>>> a
array([[[0, 1],
        [1, 0]],

       [[0, 0],
        [0, 1]]])
>>> a.shape, a.dtype
((2, 2, 2), dtype('int64'))
Efficient and versatile data access
Indexing and slicing, fancy indexing
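The three access styles named on this slide can be sketched on a small array. This is an illustrative example on made-up data, not code from the talk.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns, values 0..11

# Basic indexing: one element, addressed by (row, column)
element = a[1, 2]

# Slicing: every row, every second column. The result is a view,
# so it shares memory with `a` instead of copying the data.
every_other_column = a[:, ::2]

# Fancy indexing with an integer list: pick rows 0 and 2 (this makes a copy)
picked_rows = a[[0, 2]]

# Fancy indexing with a boolean mask: keep only the even values
even_values = a[a % 2 == 0]
```

Slices are cheap because they are views; fancy indexing copies, which matters for large images.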
What is scikit-image?
An open-source (BSD)
generic image processing library
for the Python language
(and NumPy data arrays)
for 2D & 3D images
simple API & gentle learning curve
A flood of images
Several 10^8 images uploaded on Facebook each day
A flood of images
Hundreds of terabytes of scientific data for scientific experiments
http://sdo.gsfc.nasa.gov/
Datasheet
Package statistics
http://scikit-image.org/
Release 0.12 (1-2 releases per year)
Among the 1000 best-ranked packages on PyPI
20000 unique visitors / month
The people
A quite healthy curve... we can do better!
Fernando Perez & Aaron Meurer, Gist 5843625
The people
Origin & diversity
Different fields of application
10 largest contributors: 4 continents
and 7 countries of origin
Where we could do better:
Academic / business / industry
Gender balance
Africa, South America, ...
We code when(ever) we can
[Figure: commit activity histograms. Panels: "Coding hours" (time of day, 00:00 to 24:00) and "Number of commits per day" (Mon to Sun)]
Development model
Mature algorithms
Only Python + Cython code for
easier maintainability
Focus on good practices: testing,
documentation, version control
Hosted on GitHub: thorough code
review + continuous integration
Core team of 5-10 people
(close to applications)
Who is your typical user?
Windows: 54%
Linux: 26%
OS X: 20%
Not a lot of hardcore geeks
Not a lot of time on her hands
Learning / finding information is hard
Manipulating images as numerical (NumPy) arrays
Pixels are array elements
import numpy as np
image = np.ones((5, 5))
image[0, 0] = 0
image[2, :] = 0
>>> from skimage import data
>>> coffee = data.coffee()
>>> coffee.shape
(400, 600, 3)
>>> red_channel = coffee[..., 0]
>>> image_3d = np.ones((100, 100, 100))
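Because pixels are array elements, channel extraction, cropping and masking are plain NumPy operations. A minimal sketch on a small synthetic RGB array (a stand-in for a real photograph, so that it runs with NumPy alone):

```python
import numpy as np

# Synthetic stand-in for an RGB photograph: (rows, columns, channels)
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

red_channel = rgb[..., 0]          # 2D array of shape (4, 6)
cropped = rgb[1:3, 2:5]            # sub-image, still RGB: shape (2, 3, 3)

# Boolean mask: set every pixel whose red value exceeds 128 to black
mask = red_channel > 128
blackened = rgb.copy()
blackened[mask] = 0                # a 2D mask selects all three channels
```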
NumPy-native: images as NumPy arrays
NumPy arrays as arguments and outputs
>>> from skimage import io, filters
>>> camera_array = io.imread('camera_image.png')
>>> type(camera_array)
<class 'numpy.ndarray'>
>>> camera_array.dtype
dtype('uint8')
>>> filtered_array = filters.gaussian(camera_array, sigma=5)
>>> type(filtered_array)
<class 'numpy.ndarray'>
>>> import matplotlib.pyplot as plt
>>> plt.imshow(filtered_array, cmap='gray')
How we simplified the API
Before 2013
>>> from skimage import io, filters
>>> camera_array = io.imread('camera_image.png')
>>> type(camera_array)
Image ...
>>> camera_array.max()
Image(255, dtype=uint8)
Versatile use for 2D, 2D-RGB, 3D...
>>> from skimage import measure
>>> labels_2d = measure.label(image_2d)
>>> labels_3d = measure.label(image_3d)
Versatile use for 2D, 2D-RGB, 3D...
def quickshift(image, ratio=1.0, kernel_size=5, max_dist=10,
               sigma=0, random_seed=42):
    """Segments image using quickshift clustering in Color-(x,y) space.
    ...
    """
    image = img_as_float(np.atleast_3d(image))
    ...
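The `np.atleast_3d` call in the excerpt above is what lets one code path serve both 2D grayscale and 2D-RGB input. A minimal sketch of that idea: `to_float_3d` is a hypothetical helper, and the division by 255 is a simplified stand-in for `img_as_float`, which handles more dtypes.

```python
import numpy as np

def to_float_3d(image):
    """Hypothetical helper: return `image` as a float array with at least 3 dims."""
    image = np.atleast_3d(image)      # (M, N) -> (M, N, 1); 3D input is unchanged
    if image.dtype == np.uint8:
        return image / 255.0          # rescale 8-bit intensities to [0, 1]
    return image.astype(float)

gray = np.zeros((4, 5), dtype=np.uint8)           # 2D grayscale
color = np.full((4, 5, 3), 255, dtype=np.uint8)   # 2D RGB

gray_3d = to_float_3d(gray)
color_3d = to_float_3d(color)
```

After this normalization, the rest of a function can always index a trailing channel axis, whatever the caller passed in.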
An API relying mostly on functions
skimage.filters.gaussian(image, sigma, output=None,
                         mode='nearest', cval=0, multichannel=None)
Multi-dimensional Gaussian filter
Parameters
----------
image : array-like
    Input image (grayscale or color) to filter.
sigma : scalar or sequence of scalars
    Standard deviation for Gaussian kernel. The standard deviations of the
    Gaussian filter are given for each axis as a sequence, or as a single
    number, in which case it is equal for all axes.
output : array, optional
    The ``output`` parameter passes an array in which to store the
    filter output.
mode : {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional
One filter = one function
Use keyword argument for parameter tuning
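A minimal sketch of the "one filter = one function" pattern with keyword arguments for tuning. `mean_filter` is a hypothetical box filter written for illustration, not a scikit-image function; its `mode` values are those accepted by `np.pad`.

```python
import numpy as np

def mean_filter(image, size=3, mode="edge"):
    """Hypothetical box filter: average over a size x size window (2D, odd size)."""
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode=mode)
    out = np.zeros(image.shape, dtype=float)
    # Sum the size*size shifted copies of the padded image
    for dr in range(size):
        for dc in range(size):
            out += padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
    return out / size**2

image = np.zeros((5, 5))
image[2, 2] = 9.0

smoothed = mean_filter(image)                      # sensible defaults
smoother = mean_filter(image, size=5)              # tune one parameter by keyword
zero_padded = mean_filter(image, mode="constant")  # change boundary handling
```

The array goes in as the first positional argument, an array comes out, and everything else has a default that can be overridden by name.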
API of scikit-learn
Filtering: transforming image data
skimage.filters, skimage.exposure,
skimage.restoration
In situ study of phase separation
Denoising tomography images
In-situ imaging of phase separation
in silicate melts
From basic (generic) to advanced
(specific) filters
Denoising tomography images
Histogram of pixel values
bilateral = restoration.denoise_bilateral(dat)
bilateral = restoration.denoise_bilateral(dat, sigma_range=2.5, sigma_spatial=2)
tv = restoration.denoise_tv_chambolle(dat, weight=0.5)
Converging to a coherent API
Segmentation: labelling regions
skimage.segmentation
Example: segmentation of low-contrast regions
In-situ imaging of glass batch reactive melting
Non-local means denoising
to preserve texture
Histogram-based markers
extraction
Random walker
segmentation
Non-local means: average similar patches
Random walker: anisotropic diffusion from markers
Random walker is less sensitive to noise than watershed, but slower
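The histogram-based marker extraction step can be sketched with NumPy alone. The synthetic image and the two thresholds below are assumptions for illustration; in practice the thresholds would be read off the histogram of the denoised data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-contrast image: two phases (0.4 and 0.6) plus Gaussian noise
image = np.full((64, 64), 0.4)
image[:, 32:] = 0.6
image += rng.normal(scale=0.1, size=image.shape)

# Markers: 0 = unlabeled, 1 = surely dark phase, 2 = surely bright phase
low, high = 0.2, 0.8        # hypothetical thresholds taken from the histogram tails
markers = np.zeros(image.shape, dtype=np.uint8)
markers[image < low] = 1
markers[image > high] = 2
```

Only the unambiguous pixels are labeled. An array like `markers` is the kind of input that `skimage.segmentation.random_walker(image, markers)` diffuses from to label the remaining pixels.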
Visualizing the geometry of reactions
Quantifying the reacted parts of the grain
Extracting features
skimage.feature, skimage.filters
Feature extraction followed by classification
Combining scikit-image and scikit-learn
Extract features (skimage.feature)
Pixel intensity values (R, G, B)
Local gradients
More advanced descriptors: HOGs, Gabor, ...
Train classifier with known regions
here, random forest classifier
Classify pixels
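A sketch of the feature-extraction step, assuming the simplest features listed above (R, G, B values and a local gradient). The image is synthetic and the choice of gradient magnitude as the fourth feature is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((32, 32, 3))             # synthetic RGB image, values in [0, 1)

gray = rgb.mean(axis=-1)
grad_rows, grad_cols = np.gradient(gray)  # local gradients along each axis
grad_magnitude = np.hypot(grad_rows, grad_cols)

# One row per pixel, one column per feature: R, G, B, gradient magnitude
features = np.column_stack([rgb.reshape(-1, 3), grad_magnitude.ravel()])
```

The `(n_pixels, n_features)` layout is the one scikit-learn estimators expect: a classifier such as a random forest would be fitted on the rows belonging to known regions, then asked to predict a label for every row.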
API of scikit-image
[Diagram: module (skimage) > submodule (filters, restoration, segmentation, ...) > function (e.g. denoise_bilateral); variables: input array + optional parameters, output (array)]
Documentation and teaching
What is good documentation?
"Documenting code is like writing 'Tasty!' on the side of a coffee
cup. If the code isn’t readable on a grey Monday morning before
coffee, chuck it out and start again. What you document are APIs
(...). That is fine. Explaining what this funky loop does is not
fine." Pieter Hintjens
Docstrings now and then
docstring in 2008
Definition: np.diff(a, n=1, axis=-1)
Docstring:
Calculate the n-th order discrete difference along given axis.
Docstrings now and then
Definition: np.diff(a, n=1, axis=-1)
Docstring:
Calculate the n-th order discrete difference along given axis.
The first order difference is given by ``out[n] = a[n+1] - a[n]`` along
the given axis, higher order differences are calculated by using `diff`
recursively.
Parameters
----------
a : array_like
    Input array
n : int, optional
    The number of times values are differenced.
axis : int, optional
    The axis along which the difference is taken, default is the last
    axis.
Returns
-------
diff : ndarray
    The `n` order differences. The shape of the output is the same as `a`
    except along `axis` where the dimension is smaller by `n`.
See Also
--------
gradient, ediff1d, cumsum
Examples
--------
>>> x = np.array([1, 2, 4, 7, 0])
>>> np.diff(x)
array([ 1,  2,  3, -7])
>>> np.diff(x, n=2)
array([  1,   1, -10])
much better now!
Parameters and their type
Suggestion of other functions
Simple example
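The two statements in the docstring (the first-order formula and the recursive definition of higher orders) can be checked directly on the docstring's own example array:

```python
import numpy as np

x = np.array([1, 2, 4, 7, 0])

first = x[1:] - x[:-1]          # out[i] = x[i+1] - x[i]
second = np.diff(np.diff(x))    # applying diff twice gives the n=2 result

same_first = np.array_equal(first, np.diff(x))
same_second = np.array_equal(second, np.diff(x, n=2))
```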
pydocweb and NumPy documentation Marathon
Tools by Pauli Virtanen, with enthusiastic cheering from my side
Documentation effort led by Stéfan van der Walt
Easy as Wikipedia
A wiki to improve the docs
We didn’t have GitHub!
NumPy documentation standard
https://github.com/numpy/numpy/blob/master/doc/example.py
def foo(var1, var2, long_var_name='hi'):
    r"""A one-line summary that does not use variable names or the
    function name.

    Several sentences providing an extended description. Refer to
    variables using back-ticks, e.g. `var`.

    Parameters
    ----------
    var1 : array_like
        Array_like means all those objects -- lists, nested lists, etc. --
        that can be converted to an array. We can also refer to
        variables like `var1`.
    var2 : int
        The type above can either refer to an actual Python type
        (e.g. ``int``), or describe the type of the variable in more
        detail, e.g. ``(N,) ndarray`` or ``array_like``.
    long_var_name : {'hi', 'ho'}, optional
        Choices in brackets, default first when optional.

    Returns
    -------
    type
        Explanation of anonymous return value of type ``type``.
    describe : type
        Explanation of return value named `describe`.
    out : type
        Explanation of `out`.

    Other Parameters
    ----------------
    only_seldom_used_keywords : type
        Explanation
    common_parameters_listed_above : type
        Explanation
    """
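A small worked instance of the same docstring layout, applied to a real (if tiny) function. `rescale` is a hypothetical function written for this sketch; only the section structure follows the standard shown above.

```python
import numpy as np

def rescale(image, out_min=0.0, out_max=1.0):
    """Linearly rescale intensities to a given range.

    Parameters
    ----------
    image : array_like
        Input image; must not be constant.
    out_min, out_max : float, optional
        Bounds of the output range.

    Returns
    -------
    rescaled : ndarray
        Float array with the same shape as `image`.

    Examples
    --------
    >>> rescale([0, 5, 10]).tolist()
    [0.0, 0.5, 1.0]
    """
    image = np.asarray(image, dtype=float)
    lo, hi = image.min(), image.max()
    return out_min + (image - lo) * (out_max - out_min) / (hi - lo)
```

Because the Examples section uses interpreter-style prompts, the standard library's `doctest` module can run it as a test, which keeps the documented example from drifting away from the code.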
Outcome and impact of documentation marathon
Number of words in the NumPy reference:
8,600 → 140,000
New contributors: 250 accounts
Lower entry barrier to contribute
Increased the standard for other
packages
Made people proud about docs
From Jake VanderPlas’ blog
https://jakevdp.github.io/blog/2012/09/20/why-python-is-the-last/
Documentation at a glance: galleries of examples
Getting started: finding documentation
Umbrella project: sphinx-gallery
Auto documenting your API with links to examples
My first experience of programming...
>>> cd new_experiment
>>> acquire_temperature()
>>> name_exp = 'convection'
>>> control_parameter()
>>> ... and other magical spells
EuroSciPy conferences
Every August: Leipzig, Paris, Brussels, Cambridge
2016: Erlangen
2 days of tutorials, beginners and advanced
2 days of conference
Help from volunteers always welcome!
SciPy lecture notes
Train a lot of people: need tools that scale
Several weeks of tutorials!
Beginners: the core of Scientific Python
Advanced: learn more tricks
Packages: specific applications and packages
Developed and used for EuroSciPy conferences
Curated and enriched over the years
Towards a more interactive documentation?
Learning by yourself
Auto threshold of ImageJ
Challenges for the future
Achieving a sustainable growth
Balance users’ and contributors’ goals:
robustness and smooth learning curve
vs cool factor and bleeding-edge tools
Feature development should not be
faster than quality improvement
Documentation and training for users
Low entry barriers for contributors
Massive data processing and parallelization
Competitive environment: some other tools use
GPUs, Spark, etc. scikit-image uses NumPy!
I/O: large images might not fit into memory
use memory mapping of different file formats (raw
binary with NumPy, HDF5 with PyTables).
Divide into blocks: use util.view_as_blocks to
iterate conveniently over blocks
Parallel processing: use joblib or dask
Better integration desirable
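A sketch of the first two points (memory mapping a raw binary file, then processing it block by block), using NumPy only. The file name, image size and block size are assumptions for illustration, and plain slicing stands in for `util.view_as_blocks`.

```python
import os
import tempfile
import numpy as np

shape = (256, 256)
path = os.path.join(tempfile.mkdtemp(), "large_image.raw")  # hypothetical file

# Write a raw binary file, standing in for an image too large for memory
np.arange(shape[0] * shape[1], dtype=np.uint16).reshape(shape).tofile(path)

# Memory-map it: data are read from disk only when a block is accessed
image = np.memmap(path, dtype=np.uint16, mode="r", shape=shape)

block = 64
block_means = np.empty((shape[0] // block, shape[1] // block))
for i in range(0, shape[0], block):
    for j in range(0, shape[1], block):
        # Only this 64 x 64 tile needs to be resident in memory
        block_means[i // block, j // block] = image[i:i + block, j:j + block].mean()
```

Raw binary files carry no metadata, so the dtype and shape must be known in advance.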
joblib: simple parallel computing + lazy re-evaluation
>>> from skimage import data, util
>>> hubble = data.hubble_deep_field()
>>> width = 10
>>> pics = util.view_as_windows(hubble, (width, hubble.shape[1], hubble.shape[2]), step=width)
>>> from joblib import Parallel, delayed
>>> # task is an image processing function
>>> Parallel(n_jobs=4)(delayed(task)(pic) for pic in pics)
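The same split-then-map pattern can be sketched with the standard library only. Here `concurrent.futures.ThreadPoolExecutor` is swapped in for joblib so the example has no third-party dependency beyond NumPy; `task` is a hypothetical placeholder for a real image processing function.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def task(strip):
    """Hypothetical per-strip processing: here, just the mean intensity."""
    return float(strip.mean())

image = np.arange(40 * 8, dtype=float).reshape(40, 8)
width = 10
# Non-overlapping horizontal strips of `width` rows each
strips = [image[i:i + width] for i in range(0, image.shape[0], width)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, strips))   # results keep the order of `strips`
```

Threads help when the per-strip work releases the GIL, as much compiled NumPy code does. joblib's `Parallel` and `delayed` fill the same role and can also run tasks in separate processes.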
A platform to build an ecosystem upon
Tool for users, platform for other tools
$ apt-cache rdepends python-matplotlib
... 96 Python packages & applications
Specific applications that could build on
scikit-image
Imaging techniques: microscopy,
tomography, ...
Fields: cell biology, astronomy, ...
Requirements: stable API, good docs
No need to be a programming genius to contribute to OSS
Social and pedagogical skills useful and welcome
You will learn a lot and make friends. P. Hintjens
Try it out! http://scikit-image.org/
Feedback welcome
github.com/scikit-image/scikit-image
Please cite the paper
Let’s talk about scikit-image @EGouillart

Simple APIs and innovative documentation

  • 1. Simple APIs and innovative documentation Reasons for the success of Scientific Python Emmanuelle Gouillart joint Unit CNRS/Saint-Gobain SVI and the scikit-image team @EGouillart
  • 2. Outline Ecosystem People and teams Simple APIs Hacking the documentation Challenges for the future
  • 3. Python: a huge success in education Engineering departments are still lagging behind.
  • 4. Adoption by traditional institutions
  • 6. The Scientific Python ecosystem Integrated distributions Signal processing Specialized modules Visualization Interpreters and IDEs
  • 7. NumpPy: Python objects for numerical arrays Multi-dimensional numerical data container (based on compiled code) + utility functions to create/manipulate them >>> a = np.random. r a n d o m i n t e g e r s (0, 1, (2, 2, 2)) >>> a a r r a y ([[[0 , 1], [1, 0]], [[0, 0], [0, 1]]]) >>> a. shape , a. dtype ((2, 2, 2), dtype ( ’ int64 ’ )) x
  • 8. NumpPy: Python objects for numerical arrays Multi-dimensional numerical data container (based on compiled code) + utility functions to create/manipulate them >>> a = np.random. r a n d o m i n t e g e r s (0, 1, (2, 2, 2)) >>> a a r r a y ([[[0 , 1], [1, 0]], [[0, 0], [0, 1]]]) >>> a. shape , a. dtype ((2, 2, 2), dtype ( ’ int64 ’ )) x Efficient and versatile data access indexing and slicing fancy indexing
  • 9.
  • 10. What is scikit-image? An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays)
  • 11. What is scikit-image? An open-source (BSD) generic image processing library for the Python language (and NumPy data arrays) for 2D & 3D images simple API & gentle learning curve
  • 12. A flood of images several 108 images uploaded on Facebook each day
  • 13. A flood of images hundreds of terabytes of scientific data for scientific experiment http://sdo.gsfc.nasa.gov/
  • 14. Datasheet Package statistics http://scikit-image.org/ Release 0.12 (1-2 releases per year) Among the 1000 best-ranked packages on PyPI 20,000 unique visitors / month
  • 15. The people A quite healthy curve... we can do better! Fernando Perez & Aaron Meurer, Gist 5843625
  • 16. The people Origin & diversity Different fields of application 10 largest contributors: 4 continents and 7 countries of origin Where we could do better: Academic / business / industry Gender balance Africa, South America, ...
  • 18. We code when(ever) we can [Figure: coding hours (00:00 to 24:00) and number of commits per day of the week (Mon to Sun)]
  • 19. Development model Mature algorithms Only Python + Cython code for easier maintainability Focus on good practices: testing, documentation, version control Hosted on GitHub: thorough code review + continuous integration Core team of 5-10 people (close to applications)
  • 20. Who is your typical user?
  • 22. Who is your typical user? Windows 54% Linux 26% OS X 20% Not a lot of hardcore geeks Not a lot of time on her plate Learning / finding information is hard
  • 23. Manipulating images as numerical (numpy) arrays Pixels are array elements import numpy as np image = np.ones((5, 5)) image[0, 0] = 0 image[2, :] = 0
  • 24. Manipulating images as numerical (numpy) arrays (continued) >>> coffee.shape (400, 600, 3) >>> red_channel = coffee[..., 0] >>> image_3d = np.ones((100, 100, 100))
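Slides 23 and 24 treat colour images and volumes as plain arrays. Below is a minimal sketch of the same idea using a synthetic array instead of the `coffee` photo, so it runs with NumPy alone. The array contents and the names `rgb`, `gray`, `volume` and `middle_slice` are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in for an RGB photo: (rows, columns, channels).
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = 200          # fill the red channel
rgb[1:3, 2:4, 1] = 255     # add a green patch in the middle

# The ellipsis selects all rows and columns; 0 selects the red channel.
red_channel = rgb[..., 0]  # 2-D view, shape (4, 6)

# Naive grayscale conversion: average over the channel axis.
gray = rgb.mean(axis=-1)

# A 3-D volume (e.g. a tomography stack) is just one more axis.
volume = np.ones((10, 10, 10))
middle_slice = volume[5]   # 2-D image, shape (10, 10)
```

Because the channel or depth axis is an ordinary array axis, the same slicing syntax works for 2D, 2D-RGB and 3D data.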
  • 25. NumPy-native: images as NumPy arrays NumPy arrays as arguments and outputs >>> from skimage import io, filters >>> camera_array = io.imread('camera_image.png') >>> type(camera_array) <type 'numpy.ndarray'> >>> camera_array.dtype dtype('uint8') >>> filtered_array = filters.gaussian(camera_array, sigma=5) >>> type(filtered_array) <type 'numpy.ndarray'> >>> import matplotlib.pyplot as plt >>> plt.imshow(filtered_array, cmap='gray')
  • 26. How we simplified the API Before 2013 >>> from skimage import io, filters >>> camera_array = io.imread('camera_image.png') >>> type(camera_array) Image ... >>> camera.max() Image(255, dtype=uint8)
  • 27. Versatile use for 2D, 2D-RGB, 3D... >>> from skimage import measure >>> labels_2d = measure.label(image_2d) >>> labels_3d = measure.label(image_3d)
  • 28. Versatile use for 2D, 2D-RGB, 3D... def quickshift(image, ratio=1.0, kernel_size=5, max_dist=10, sigma=0, random_seed=42): """Segments image using quickshift clustering in Color-(x,y) space. ... """ image = img_as_float(np.atleast_3d(image)) ...
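The `np.atleast_3d` call on slide 28 is what lets one function accept both grayscale and colour input. A minimal sketch of that pattern follows; `channel_means` is a hypothetical helper written for this illustration and is not part of scikit-image.

```python
import numpy as np

def channel_means(image):
    """Return the mean of each channel of a 2-D or 2-D RGB image.

    Hypothetical helper, used only to illustrate the pattern.
    """
    # atleast_3d turns (rows, cols) into (rows, cols, 1) and leaves
    # (rows, cols, channels) untouched, so the code below sees one layout.
    image = np.atleast_3d(np.asarray(image, dtype=float))
    return image.mean(axis=(0, 1))

# The same function handles a grayscale and a colour image.
gray = np.full((4, 5), 2.0)
rgb = np.stack([np.full((4, 5), v) for v in (1.0, 2.0, 3.0)], axis=-1)

print(channel_means(gray))  # one value
print(channel_means(rgb))   # three values, one per channel
```

Normalising the input shape once at the top keeps the rest of the function free of special cases.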
  • 29. An API relying mostly on functions skimage.filters.gaussian(image, sigma, output=None, mode='nearest', cval=0, multichannel=None) Multi-dimensional Gaussian filter Parameters ---------- image : array-like input image (grayscale or color) to filter. sigma : scalar or sequence of scalars standard deviation for Gaussian kernel. The standard deviations of the Gaussian filter are given for each axis as a sequence, or as a single number, in which case it is equal for all axes. output : array, optional The ``output`` parameter passes an array in which to store the filter output. mode : {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional One filter = one function Use keyword arguments for parameter tuning
  • 32. Filtering: transforming image data skimage.filters, skimage.exposure, skimage.restoration
  • 33. In situ study of phase separation
  • 35. Denoising tomography images In-situ imaging of phase separation in silicate melts From basic (generic) to advanced (specific) filters
  • 36. Denoising tomography images Histogram of pixel values From basic (generic) to advanced (specific) filters bilateral = restoration.denoise_bilateral(dat) bilateral = restoration.denoise_bilateral(dat, sigma_range=2.5, sigma_spatial=2) tv = restoration.denoise_tv_chambolle(dat, weight=0.5)
  • 37. Converging to a coherent API
  • 39. Example: segmentation of low-contrast regions In-situ imaging of glass batch reactive melting Non-local means denoising to preserve texture Histogram-based markers extraction Random walker segmentation Non-local means: average similar patches Random walker: anisotropic diffusion from markers Random walker less sensitive to noise than watershed, but slower
  • 40. Visualizing the geometry of reactions Quantifying the reacted parts of the grain
  • 42. Feature extraction followed by classification Combining scikit-image and scikit-learn Extract features (skimage.feature): pixel intensity values (R, G, B), local gradients, more advanced descriptors (HOGs, Gabor, ...) Train classifier with known regions (here, a random forest classifier) Classify pixels
  • 43. API of scikit-image [Diagram: module (skimage) > submodule (filters, restoration, segmentation, ...) > function (denoise_bilateral) > variables (input array + optional parameters, output array)]
  • 45. What is good documentation? "Documenting code is like writing 'Tasty!' on the side of a coffee cup. If the code isn't readable on a grey Monday morning before coffee, chuck it out and start again. What you document are APIs (...). That is fine. Explaining what this funky loop does is not fine." Pieter Hintjens
  • 46. Docstrings now and then docstring in 2008 Definition: np.diff(a, n=1, axis=-1) Docstring: Calculate the n-th order discrete difference along given axis.
  • 47. Docstrings now and then Definition: np.diff(a, n=1, axis=-1) Docstring: Calculate the n-th order discrete difference along given axis. The first order difference is given by ``out[n] = a[n+1] - a[n]`` along the given axis, higher order differences are calculated by using `diff` recursively. Parameters ---------- a : array_like Input array n : int, optional The number of times values are differenced. axis : int, optional The axis along which the difference is taken, default is the last axis. Returns ------- diff : ndarray The `n` order differences. The shape of the output is the same as `a` except along `axis` where the dimension is smaller by `n`. See Also -------- gradient, ediff1d, cumsum Examples -------- >>> x = np.array([1, 2, 4, 7, 0]) >>> np.diff(x) array([ 1, 2, 3, -7]) >>> np.diff(x, n=2) array([ 1, 1, -10]) Much better now! Parameters and their type Suggestion of other functions Simple example
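The layout of the `np.diff` docstring on slide 47 (summary, Parameters, Returns, See Also, Examples) can be reused for one's own functions. The sketch below applies it to `rescale`, a hypothetical function invented for this illustration; only the section layout follows the slide.

```python
import numpy as np

def rescale(image, low=0.0, high=1.0):
    """Linearly rescale the values of an array to a given range.

    Parameters
    ----------
    image : array_like
        Input array. It must contain at least two distinct values.
    low : float, optional
        Value assigned to the minimum of `image`.
    high : float, optional
        Value assigned to the maximum of `image`.

    Returns
    -------
    out : ndarray
        Array of floats with the same shape as `image`.

    See Also
    --------
    numpy.clip

    Examples
    --------
    >>> rescale([0, 5, 10]).tolist()
    [0.0, 0.5, 1.0]
    """
    image = np.asarray(image, dtype=float)
    span = image.max() - image.min()
    return low + (image - image.min()) * (high - low) / span
```

Saved in a file, the example in the docstring doubles as a test: the standard-library `doctest` module can execute it with `python -m doctest yourfile.py`.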
  • 48. pydocweb and NumPy documentation Marathon Tools by Pauli Virtanen, with enthusiastic cheering from my side Documentation effort led by Stéfan van der Walt Easy as Wikipedia A wiki to improve the docs We didn't have GitHub!
  • 49. NumPy documentation standard https://github.com/numpy/numpy/blob/master/doc/example.py def foo(var1, var2, long_var_name='hi'): r"""A one-line summary that does not use variable names or the function name. Several sentences providing an extended description. Refer to variables using back-ticks, e.g. `var`. Parameters ---------- var1 : array_like Array_like means all those objects -- lists, nested lists, etc. -- that can be converted to an array. We can also refer to variables like `var1`. var2 : int The type above can either refer to an actual Python type (e.g. ``int``), or describe the type of the variable in more detail, e.g. ``(N,) ndarray`` or ``array_like``. long_var_name : {'hi', 'ho'}, optional Choices in brackets, default first when optional. Returns ------- type Explanation of anonymous return value of type ``type``. describe : type Explanation of return value named `describe`. out : type Explanation of `out`. Other Parameters ---------------- only_seldom_used_keywords : type Explanation common_parameters_listed_above : type Explanation
  • 50. Outcome and impact of documentation marathon Number of words in the NumPy reference: 8,600 → 140,000 New contributors: 250 accounts Lower entry barrier to contribute Raised the standard for other packages Made people proud of the docs
  • 52. From Jake VanderPlas’ blog https://jakevdp.github.io/blog/2012/09/20/why-python-is-the-last/
  • 53. Documentation at a glance: galleries of examples
  • 56. Getting started: finding documentation
  • 58. Auto documenting your API with links to examples
  • 60. My first experience of programming...
  • 61. My first experience of programming... >>> cd new_experiment >>> acquire_temperature() >>> name_exp = 'convection' >>> control_parameter() >>> ... and other magical spells
  • 63. Euroscipy conferences Every August: Leipzig, Paris, Brussels, Cambridge 2016 : Erlangen 2 days of tutorials, beginners and advanced 2 days of conference Help from volunteers always welcome!
  • 64. Scipy lecture notes Train a lot of people: need tools that scale Several weeks of tutorials! Beginners: the core of Scientific Python Advanced: learn more tricks Packages: specific applications and packages Developed and used for Euroscipy conferences Curated and enriched over the years
  • 65. Towards a more interactive documentation?
  • 66. Learning by yourself Auto threshold of ImageJ
  • 68. Achieving sustainable growth Balance users' and contributors' goals: robustness and smooth learning curve vs cool factor and bleeding-edge tools Feature development should not be faster than quality improvement Documentation and training for users Low entry barriers for contributors
  • 69. Massive data processing and parallelization Competitive environment: some other tools use GPUs, Spark, etc. scikit-image uses NumPy! I/O: large images might not fit into memory; use memory mapping of different file formats (raw binary with NumPy, HDF5 with PyTables). Divide into blocks: use util.view_as_blocks to iterate conveniently over blocks Parallel processing: use joblib or dask Better integration desirable
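The memory-mapping and block-wise strategy of slide 69 can be sketched with NumPy alone. Note the substitution: plain slicing stands in for scikit-image's block utility so that the example has no dependency beyond NumPy. The file name, image size and block size are assumptions chosen to keep the example small.

```python
import os
import tempfile

import numpy as np

# Write a raw binary image to disk (a tiny stand-in for a huge file).
shape = (8, 8)
path = os.path.join(tempfile.mkdtemp(), "image.raw")
np.arange(64, dtype=np.float32).reshape(shape).tofile(path)

# Memory-map the file: data is read from disk only when it is accessed.
image = np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# Iterate over non-overlapping 4x4 blocks and reduce each one to its mean,
# so that only one block needs to be in memory at a time.
block = 4
block_means = np.empty((shape[0] // block, shape[1] // block))
for i in range(0, shape[0], block):
    for j in range(0, shape[1], block):
        tile = image[i:i + block, j:j + block]
        block_means[i // block, j // block] = tile.mean()

print(block_means)
```

The loop assumes the image dimensions are multiples of the block size; real data would need padding or a separate treatment of the edges.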
  • 71. joblib: easy simple parallel computing + lazy re-evaluation >>> from skimage import data, util >>> hubble = data.hubble_deep_field() >>> width = 10 >>> pics = util.view_as_windows(hubble, (width, hubble.shape[1], hubble.shape[2]), step=width) >>> from joblib import Parallel, delayed >>> # task is an image processing function >>> Parallel(n_jobs=4)(delayed(task)(pic) for pic in pics)
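The split-then-map pattern of slide 71 can also be tried without installing joblib. The sketch below swaps in the standard library's `concurrent.futures.ThreadPoolExecutor` for joblib's `Parallel`, and a synthetic array for the Hubble image. The `task` function is a hypothetical placeholder for any image processing step.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def task(strip):
    """Hypothetical per-strip processing step: here, just the mean."""
    return float(strip.mean())

# Synthetic image standing in for a large acquisition.
image = np.arange(40 * 6, dtype=float).reshape(40, 6)

# Cut the image into horizontal strips of `width` rows each.
width = 10
strips = [image[i:i + width] for i in range(0, image.shape[0], width)]

# Process the strips in a pool of 4 worker threads; map keeps input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, strips))

print(results)
```

Threads help only when the work releases Python's global interpreter lock, as many NumPy operations do; for pure-Python tasks a process-based pool, which is what joblib uses by default, is the usual choice.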
  • 72. A platform to build an ecosystem upon Tool for users, platform for other tools $ apt-cache rdepends python-matplotlib ... 96 Python packages & applications Specific applications that could build on scikit-image Imaging techniques; microscopy, tomography, ... Fields: cell biology, astronomy, ... Requirements: stable API, good docs
  • 74. No need to be a programming genius to contribute to OSS Social and pedagogical skills useful and welcome You will learn a lot and make friends. P. Hintjens
  • 75. Try it out! http://scikit-image.org/ Feedback welcome: github.com/scikit-image/scikit-image Please cite the paper Let's talk about scikit-image: @EGouillart