Simple APIs and innovative documentation
1. Simple APIs and innovative documentation
Reasons for the success of Scientific Python
Emmanuelle Gouillart
joint Unit CNRS/Saint-Gobain SVI
and the scikit-image team
@EGouillart
6. The Scientific Python ecosystem
Integrated distributions
Signal processing
Specialized modules
Visualization
Interpreters and IDEs
7. NumPy: Python objects for numerical arrays
Multi-dimensional numerical data container (based on compiled code)
+ utility functions to create/manipulate them
>>> import numpy as np
>>> a = np.random.randint(0, 2, (2, 2, 2))
>>> a
array([[[0, 1],
        [1, 0]],

       [[0, 0],
        [0, 1]]])
>>> a.shape, a.dtype
((2, 2, 2), dtype('int64'))
Efficient and versatile data access
indexing and slicing, fancy indexing
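A minimal sketch of the access styles named on this slide, on a small made-up array (the array and the variable names are illustrative, not taken from the talk):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns, values 0..11

element = a[1, 2]                 # plain indexing: row 1, column 2
row = a[1, :]                     # slicing: the whole of row 1 (a view, no copy)
sub = a[::2, 1:3]                 # slicing with a step: rows 0 and 2, columns 1 and 2
picked = a[[0, 2], [1, 3]]        # fancy indexing: elements (0, 1) and (2, 3)
large = a[a > 8]                  # boolean mask: every value greater than 8
```

Slices return views on the original data, while fancy indexing and boolean masks return copies.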
10. What is scikit-image?
An open-source (BSD)
generic image processing library
for the Python language
(and NumPy data arrays)
for 2D & 3D images
simple API & gentle learning curve
12. A flood of images
several 10^8 (hundreds of millions of) images uploaded to Facebook each day
13. A flood of images
hundreds of terabytes of data from scientific experiments
http://sdo.gsfc.nasa.gov/
15. The people
A quite healthy curve... we can do better!
Fernando Perez & Aaron Meurer, Gist 5843625
16. The people
Origin & diversity
Different fields of application
10 largest contributors: 4 continents
and 7 countries of origin
Where we could do better:
Academic / business / industry
Gender balance
Africa, South America, ...
18. We code when(ever) we can
[Figure: histograms of commits by hour of day ("Coding hours", 00:00 to 24:00) and by day of the week ("Number of commits per day", Mon to Sun)]
19. Development model
Mature algorithms
Only Python + Cython code for
easier maintainability
Focus on good practices: testing,
documentation, version control
Hosted on GitHub: thorough code
review + continuous integration
Core team of 5-10 people
(close to applications)
22. Who is your typical user?
Windows: 54%
Linux: 26%
OS X: 20%
Not a lot of hardcore geeks
Not a lot of time on her hands
Learning / finding information is hard
23. Manipulating images as numerical (numpy) arrays
Pixels are array elements
import numpy as np
image = np.ones((5, 5))
image[0, 0] = 0
image[2, :] = 0
>>> from skimage import data
>>> coffee = data.coffee()
>>> coffee.shape
(400, 600, 3)
>>> red_channel = coffee[..., 0]
>>> image_3d = np.ones((100, 100, 100))
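Because pixels are plain array elements, channels and masks come for free. A small sketch on a synthetic RGB array (a stand-in for a real photo; the sizes and values are made up for illustration):

```python
import numpy as np

# Synthetic stand-in for an RGB photo: height 4, width 6, 3 colour channels.
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = 200                 # fill the red channel
rgb[1:3, 2:4, 1] = 255            # a green patch in the middle

red_channel = rgb[..., 0]         # 2D array of shape (4, 6)
bright_green = rgb[..., 1] > 128  # boolean mask with the same 2D shape
rgb[bright_green] = 0             # blank out the masked pixels in all channels
```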
25. NumPy-native: images as NumPy arrays
NumPy arrays as arguments and outputs
>>> from skimage import io, filters
>>> camera_array = io.imread('camera_image.png')
>>> type(camera_array)
<class 'numpy.ndarray'>
>>> camera_array.dtype
dtype('uint8')
>>> filtered_array = filters.gaussian(camera_array, sigma=5)
>>> type(filtered_array)
<class 'numpy.ndarray'>
>>> import matplotlib.pyplot as plt
>>> plt.imshow(filtered_array, cmap='gray')
26. How we simplified the API
Before 2013
>>> from skimage import io, filters
>>> camera_array = io.imread('camera_image.png')
>>> type(camera_array)
Image ...
>>> camera_array.max()
Image(255, dtype=uint8)
27. Versatile use for 2D, 2D-RGB, 3D...
>>> from skimage import measure
>>> labels_2d = measure.label(image_2d)
>>> labels_3d = measure.label(image_3d)
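To make the point concrete, here is a small sketch with synthetic inputs (the arrays are invented for illustration): the same `measure.label` call labels connected regions in a 2D image and in a 3D volume.

```python
import numpy as np
from skimage import measure

# Two separate bright squares in a 2D image.
image_2d = np.zeros((8, 8), dtype=bool)
image_2d[1:3, 1:3] = True
image_2d[5:7, 5:7] = True

# Two separate bright cubes in a 3D volume.
image_3d = np.zeros((8, 8, 8), dtype=bool)
image_3d[1:3, 1:3, 1:3] = True
image_3d[5:7, 5:7, 5:7] = True

labels_2d = measure.label(image_2d)   # same call ...
labels_3d = measure.label(image_3d)   # ... whatever the dimension
```

Each output has the shape of its input, with 0 for the background and one integer per connected region.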
28. Versatile use for 2D, 2D-RGB, 3D...
def quickshift(image, ratio=1.0, kernel_size=5, max_dist=10,
               sigma=0, random_seed=42):
    """Segments image using quickshift clustering in Color-(x,y) space.
    ...
    """
    image = img_as_float(np.atleast_3d(image))
    ...
29. An API relying mostly on functions
skimage.filters.gaussian(image, sigma, output=None, mode='nearest',
                         cval=0, multichannel=None)
Multi-dimensional Gaussian filter
Parameters
----------
image : array-like
    input image (grayscale or color) to filter.
sigma : scalar or sequence of scalars
    standard deviation for Gaussian kernel. The standard deviations of the
    Gaussian filter are given for each axis as a sequence, or as a single
    number, in which case it is equal for all axes.
output : array, optional
    The ``output`` parameter passes an array in which to store the
    filter output.
mode : {'reflect', 'constant', 'nearest', 'mirror', 'wrap'}, optional
One filter = one function
Use keyword arguments for parameter tuning
39. Example: segmentation of low-contrast regions
In-situ imaging of glass batch reactive melting
Non-local means denoising to preserve texture
Histogram-based marker extraction
Random walker segmentation
Non-local means: average similar patches
Random walker: anisotropic diffusion from markers
Random walker is less sensitive to noise than watershed, but slower
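The three steps could be chained roughly as follows. This is only a sketch on a synthetic two-region image: the image, the noise level, the hand-picked thresholds (a crude stand-in for histogram-based marker extraction) and all parameter values are assumptions for illustration, not the settings used for the glass-melting data.

```python
import numpy as np
from skimage.restoration import denoise_nl_means
from skimage.segmentation import random_walker

rng = np.random.default_rng(0)

# Synthetic image: two regions plus Gaussian noise (illustrative values).
image = np.full((40, 40), 0.3)
image[:, 20:] = 0.7
noisy = image + 0.1 * rng.standard_normal(image.shape)

# 1. Non-local means denoising (averages similar patches).
denoised = denoise_nl_means(noisy, patch_size=5, patch_distance=6, h=0.1)

# 2. Markers from intensity thresholds: 0 = unknown, 1 and 2 = seeds.
markers = np.zeros(image.shape, dtype=int)
markers[denoised < 0.35] = 1
markers[denoised > 0.65] = 2

# 3. Random walker diffuses the seed labels into the unknown pixels.
segmentation = random_walker(denoised, markers, beta=130)
```

Only pixels whose intensity is unambiguous are used as seeds; the random walker decides the rest.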
42. Feature extraction followed by classification
Combining scikit-image and scikit-learn
Extract features (skimage.feature)
Pixel intensity values (R, G, B)
Local gradients
More advanced descriptors: HOGs, Gabor, ...
Train classifier with known regions
here, random forest classifier
Classify pixels
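A rough sketch of this workflow on a synthetic image. The features here are computed with plain NumPy rather than `skimage.feature`, and the image, the annotated regions and the classifier settings are all invented for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic RGB image: left half reddish, right half bluish, plus noise.
image = np.zeros((20, 20, 3))
image[:, :10, 0] = 1.0
image[:, 10:, 2] = 1.0
image += 0.1 * rng.standard_normal(image.shape)

# Features per pixel: R, G, B values and the local gradient magnitude.
gray = image.mean(axis=2)
grad_rows, grad_cols = np.gradient(gray)
gradient = np.hypot(grad_rows, grad_cols)
features = np.dstack([image, gradient]).reshape(-1, 4)

# Known regions: a few annotated pixels (1 = left class, 2 = right class).
known = np.zeros((20, 20), dtype=int)
known[2:6, 2:6] = 1
known[2:6, 14:18] = 2
train = known.ravel() > 0

# Train the classifier on the annotated pixels only.
clf = RandomForestClassifier(n_estimators=20, random_state=0)
clf.fit(features[train], known.ravel()[train])

# Classify every pixel and reshape back to the image grid.
predicted = clf.predict(features).reshape(20, 20)
```

The feature table has one row per pixel, which is the layout scikit-learn estimators expect.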
43. API of scikit-image
[Diagram: structure of the API]
module: skimage
submodules: filters, restoration, segmentation, ...
function: denoise_bilateral
variables: input array + optional parameters → output (array)
45. What is good documentation?
"Documenting code is like writing 'Tasty!' on the side of a coffee
cup. If the code isn't readable on a grey Monday morning before
coffee, chuck it out and start again. What you document are APIs
(...). That is fine. Explaining what this funky loop does is not
fine." Pieter Hintjens
46. Docstrings now and then
docstring in 2008
Definition: np.diff(a, n=1, axis=-1)
Docstring:
Calculate the n-th order discrete difference along given axis.
47. Docstrings now and then
Definition: np.diff(a, n=1, axis=-1)
Docstring:
Calculate the n-th order discrete difference along given axis.
The first order difference is given by ``out[n] = a[n+1] - a[n]`` along
the given axis, higher order differences are calculated by using `diff`
recursively.
Parameters
----------
a : array_like
    Input array
n : int, optional
    The number of times values are differenced.
axis : int, optional
    The axis along which the difference is taken, default is the last
    axis.
Returns
-------
diff : ndarray
    The `n` order differences. The shape of the output is the same as `a`
    except along `axis` where the dimension is smaller by `n`.
See Also
--------
gradient, ediff1d, cumsum
Examples
--------
>>> x = np.array([1, 2, 4, 7, 0])
>>> np.diff(x)
array([ 1,  2,  3, -7])
>>> np.diff(x, n=2)
array([  1,   1, -10])
much better now!
Parameters and their type
Suggestion of other functions
Simple example
48. pydocweb and NumPy documentation Marathon
Tools by Pauli Virtanen, with enthusiastic cheering from my side
Documentation effort led by Stéfan van der Walt
Easy as Wikipedia
A wiki to improve the docs
We didn't have GitHub!
49. NumPy documentation standard
https://github.com/numpy/numpy/blob/master/doc/example.py
def foo(var1, var2, long_var_name='hi'):
    r"""A one-line summary that does not use variable names or the
    function name.

    Several sentences providing an extended description. Refer to
    variables using back-ticks, e.g. `var`.

    Parameters
    ----------
    var1 : array_like
        Array_like means all those objects -- lists, nested lists, etc. --
        that can be converted to an array. We can also refer to
        variables like `var1`.
    var2 : int
        The type above can either refer to an actual Python type
        (e.g. ``int``), or describe the type of the variable in more
        detail, e.g. ``(N,) ndarray`` or ``array_like``.
    long_var_name : {'hi', 'ho'}, optional
        Choices in brackets, default first when optional.

    Returns
    -------
    type
        Explanation of anonymous return value of type ``type``.
    describe : type
        Explanation of return value named `describe`.
    out : type
        Explanation of `out`.

    Other Parameters
    ----------------
    only_seldom_used_keywords : type
        Explanation
    common_parameters_listed_above : type
        Explanation
    ...
    """
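As an illustration of the standard applied to a real (if tiny) function, here is a sketch. The function `rescale` and its parameters are hypothetical, written only for this example; it is not part of NumPy or scikit-image.

```python
import numpy as np


def rescale(image, low=0.0, high=1.0):
    """Rescale intensities linearly to a target range.

    Parameters
    ----------
    image : array_like
        Input image, of any dimension.
    low, high : float, optional
        Bounds of the output range.

    Returns
    -------
    out : ndarray
        Array of floats with the same shape as `image`.

    See Also
    --------
    numpy.clip

    Examples
    --------
    >>> rescale([0, 5, 10]).tolist()
    [0.0, 0.5, 1.0]
    """
    image = np.asarray(image, dtype=float)
    span = image.max() - image.min()
    if span == 0:
        # Constant image: map everything to the lower bound.
        return np.full(image.shape, low)
    return low + (high - low) * (image - image.min()) / span
```

The Examples section doubles as a test: `python -m doctest` runs it and compares the output.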
50. Outcome and impact of documentation marathon
Number of words in the NumPy reference:
8,600 → 140,000
New contributors: 250 accounts
Lower entry barrier to contribute
Increased the standard for other
packages
Made people proud about docs
52. From Jake VanderPlas’ blog
https://jakevdp.github.io/blog/2012/09/20/why-python-is-the-last/
61. My first experience of programming...
>>> cd new_experiment
>>> acquire_temperature()
>>> name_exp = 'convection'
>>> control_parameter()
>>> ... and other magical spells
63. EuroSciPy conferences
Every August: Leipzig, Paris, Brussels, Cambridge
2016: Erlangen
2 days of tutorials, beginners and advanced
2 days of conference
Help from volunteers always welcome!
64. SciPy lecture notes
Train a lot of people: need tools that scale
Several weeks of tutorials!
Beginners: the core of Scientific Python
Advanced: learn more tricks
Packages: specific applications and packages
Developed and used for EuroSciPy conferences
Curated and enriched over the years
68. Achieving sustainable growth
Balance users’ and contributors’ goals:
robustness and smooth learning curve
vs cool factor and bleeding-edge tools
Feature development should not be
faster than quality improvement
Documentation and training for users
Low entry barriers for contributors
69. Massive data processing and parallelization
Competitive environment: some other tools use
GPUs, Spark, etc. scikit-image uses NumPy!
I/O: large images might not fit into memory
use memory mapping of different file formats (raw
binary with NumPy, HDF5 with PyTables).
Divide into blocks: use util.view_as_blocks to
iterate conveniently over blocks
Parallel processing: use joblib or dask
Better integration desirable
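A minimal sketch of the memory-mapping and block-wise ideas, using NumPy only. The file name, image size and block size are made up, and the explicit loop over slices is a plain-NumPy stand-in for iterating over the result of `util.view_as_blocks`:

```python
import os
import tempfile
import numpy as np

shape, block = (8, 8), (4, 4)
path = os.path.join(tempfile.mkdtemp(), "large_image.raw")

# Write a raw binary file; in practice this would already exist on disk.
np.arange(64, dtype=np.float32).reshape(shape).tofile(path)

# Memory-map the file: data is read from disk only when it is accessed.
image = np.memmap(path, dtype=np.float32, mode="r", shape=shape)

# Process the image block by block, touching one block at a time.
block_means = np.empty((shape[0] // block[0], shape[1] // block[1]))
for i in range(block_means.shape[0]):
    for j in range(block_means.shape[1]):
        rows = slice(i * block[0], (i + 1) * block[0])
        cols = slice(j * block[1], (j + 1) * block[1])
        block_means[i, j] = image[rows, cols].mean()
```

Each iteration is independent of the others, so the loop body is the kind of task that could be handed to joblib or dask.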
71. joblib: easy, simple parallel computing + lazy re-evaluation
>>> from skimage import data, util
>>> hubble = data.hubble_deep_field()
>>> width = 10
>>> pics = util.view_as_windows(hubble, (width, hubble.shape[1],
...                             hubble.shape[2]), step=width)
>>> from joblib import Parallel, delayed
>>> # task is an image processing function
>>> Parallel(n_jobs=4)(delayed(task)(pic) for pic in pics)
72. A platform to build an ecosystem upon
Tool for users, platform for other tools
$ apt-cache rdepends python-matplotlib
... 96 Python packages & applications
Specific applications that could build on
scikit-image
Imaging techniques: microscopy,
tomography, ...
Fields: cell biology, astronomy, ...
Requirements: stable API, good docs
74. No need to be a programming genius to contribute to OSS
Social and pedagogical skills useful and welcome
You will learn a lot and make friends. P. Hintjens
Try it out! http://scikit-image.org/
Feedback welcome
github.com/scikit-image/scikit-image
Please cite the paper
Let’s talk about scikit-image @EGouillart