2. JobLib
Set of tools to provide lightweight pipelining in Python
http://packages.python.org/joblib/
easy_install joblib
3. JobLib
Avoid computing the same thing twice
>>> from joblib import Memory
>>> mem = Memory(cachedir='/tmp/joblib')
>>> import numpy as np
>>> a = np.vander(np.arange(3))
>>> square = mem.cache(np.square)
>>> b = square(a)
________________________________________________________________________________
[Memory] Calling square...
square(array([[0, 0, 1],
       [1, 1, 1],
       [4, 2, 1]]))
___________________________________________________________square - 0.0s, 0.0min
>>> c = square(a)
>>> # The above call did not trigger an evaluation
Memoize pattern with fast disk-caching
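To make the memoize pattern concrete, here is a minimal sketch that counts how often the wrapped function really runs. The names `slow_square`, `cached_square` and `calls` are hypothetical, the cache lives in a throwaway temporary directory, and the sketch assumes a recent joblib release where `Memory` takes `location` (the slide above uses the older `cachedir` keyword).

```python
import tempfile
from joblib import Memory

# Assumption: a throwaway temporary directory serves as the cache location.
# Recent joblib releases use `location`; older ones used `cachedir`.
mem = Memory(location=tempfile.mkdtemp(), verbose=0)

calls = []  # records every real evaluation of the function


def slow_square(x):
    """Stand-in for an expensive computation (hypothetical)."""
    calls.append(x)
    return x * x


cached_square = mem.cache(slow_square)

first = cached_square(4)   # evaluated, result written to disk
second = cached_square(4)  # same argument: read back from the cache
third = cached_square(5)   # new argument: evaluated again
```

After these three calls, `calls` holds only the two arguments that triggered a real evaluation.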
4. JobLib
Use cases
>>> import numpy as np
>>> from joblib import Memory
>>> memory = Memory(cachedir='/tmp/joblib')
>>> @memory.cache
... def g(x):
...     print('A long-running calculation, with parameter %s' % x)
...     return np.hamming(x)
>>> @memory.cache
... def h(x):
...     print('A second long-running calculation, using g(x)')
...     return np.vander(x)
>>> a = g(3)
A long-running calculation, with parameter 3
>>> a
array([ 0.08, 1. , 0.08])
>>> g(3)
array([ 0.08, 1. , 0.08])
>>> b = h(a)
A second long-running calculation, using g(x)
>>> b2 = h(a)
>>> b2
array([[ 0.0064,  0.08  ,  1.    ],
       [ 1.    ,  1.    ,  1.    ],
       [ 0.0064,  0.08  ,  1.    ]])
NumPy array support!
>>> np.allclose(b, b2)
True
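The array support can be illustrated with a small sketch: as far as joblib's hashing goes, array arguments are identified by their content rather than by object identity, so an equal copy should hit the cache. The function `column_sums` and the list `evaluations` are hypothetical names, and the cache directory is a temporary one chosen for illustration.

```python
import tempfile

import numpy as np
from joblib import Memory

# Assumption: throwaway cache directory; `location` is the keyword in recent joblib.
mem = Memory(location=tempfile.mkdtemp(), verbose=0)

evaluations = []  # one entry per real evaluation


@mem.cache
def column_sums(matrix):
    """Hypothetical stand-in for an expensive array computation."""
    evaluations.append(matrix.shape)
    return matrix.sum(axis=0)


a = np.arange(6).reshape(2, 3)
r1 = column_sums(a)          # evaluated and stored
r2 = column_sums(a.copy())   # different object, same content: served from cache
r3 = column_sums(a + 1)      # different content: evaluated again
```

Only the first and third calls should run the function body; the second returns the stored array.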
5. JobLib
Benchmarks - Fibonacci
In [3]: timeit normal_fib(30)
100 loops, best of 3: 576 ms per loop
After caching ...
In [9]: timeit fib(30)
1000 loops, best of 3: 262 us per loop
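The slides do not show how `normal_fib` and `fib` were defined. One plausible reconstruction, offered only as an assumption, is a naive recursive function and the same function wrapped with `Memory.cache`; the temporary cache directory is also an illustrative choice.

```python
import tempfile
from joblib import Memory

# Assumption: throwaway cache directory; `location` is the keyword in recent joblib.
mem = Memory(location=tempfile.mkdtemp(), verbose=0)


def normal_fib(n):
    """Naive recursion: recomputes the same subproblems many times."""
    return n if n < 2 else normal_fib(n - 1) + normal_fib(n - 2)


@mem.cache
def fib(n):
    """Same recursion, but every fib(k) is evaluated once and then read from disk."""
    # The recursive calls resolve to the cached wrapper, not the raw function.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


plain = normal_fib(20)
cached = fib(20)        # fills the cache for arguments 0..20
cached_again = fib(20)  # single cache lookup, no recursion
```

With this definition, a repeated `fib(n)` costs one disk lookup, which is consistent with the kind of speed-up the timings above report.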
6. JobLib
Transparent parallelization using the multiprocessing module
Before
>>> from math import sqrt
>>> [sqrt(i**2) for i in range(10)]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
After
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2, verbose=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
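To show what the "After" version is doing, the sketch below separates the two steps: `delayed` only records the function and its arguments, and `Parallel` then dispatches those recorded calls to workers and returns the results in input order. The variable names `task` and `results` are hypothetical.

```python
from math import sqrt

from joblib import Parallel, delayed

# delayed(f)(args) does not call f; it captures the call as (function, args, kwargs).
task = delayed(sqrt)(16)

# Parallel consumes an iterable of such captured calls and runs them on 2 workers.
results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
```

Because the calls are captured first, the generator expression reads almost like the original list comprehension.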
7. JobLib
Running Python functions as pipeline jobs
http://packages.python.org/joblib/index.html
Marcel Caraciolo, @marcelcaraciolo