3. The problem
Take advantage of more than one core or more than one CPU by default.
Especially for problems that can be solved or optimized by parallel computing:
Matrix multiplication.
A web server answering many small requests (like static files): have a worker process each request.
A web crawler following all links on a website: spin off a thread for each link.
Batch processing of images.
4. For starters
Embarrassingly parallel problem: sum all of the primes in a range of integers starting from 100,000 and going to 5,000,000.
http://pt.wikipedia.org/wiki/Crivo_de_Erat%C3%B3stenes
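As a concrete baseline, a minimal single-threaded sketch (our own illustration, using the sieve from the link above):

def sum_primes(start, stop):
    # Sieve of Eratosthenes over [0, stop], then sum the primes in [start, stop]
    sieve = bytearray([1]) * (stop + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(stop ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, stop + 1, p)))
    return sum(n for n in range(start, stop + 1) if sieve[n])

print(sum_primes(100000, 5000000))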
5. For beginners
The threading module in Python.
Easy to start, native to Python, generally the most common choice.
Threads share the memory and state of the parent process.
Lightweight.
Each thread gets its own stack.
No inter-process communication needed.
Good for: adding throughput and reducing latency. A minimal sketch follows.
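A sketch of the worker pattern with threads (handle is a hypothetical stand-in for real I/O-bound work):

import threading
import queue

def handle(path):
    # placeholder for real I/O-bound work (serving a file, following a link, ...)
    print("processing", path)

def worker(tasks):
    # each thread drains work items from the shared queue
    while True:
        try:
            path = tasks.get_nowait()
        except queue.Empty:
            return
        handle(path)

tasks = queue.Queue()
for path in ["a.png", "b.png", "c.png"]:
    tasks.put(path)

threads = [threading.Thread(target=worker, args=(tasks,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()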
7. For beginners
But: Python only allows a single thread to be executing within the interpreter at once. This restriction is enforced by the GIL.
To put it into a real-world analogy: imagine 100 developers working at a company with only a single coffee mug. Most of the developers would spend their time waiting for coffee instead of coding.
GIL: “Global Interpreter Lock” - a lock which must be acquired for a thread to enter the interpreter’s space. Only one thread may be executing within the Python interpreter at once.
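A quick way to see the GIL in action: time a CPU-bound loop sequentially and then across two threads (a sketch; exact timings vary by machine):

import threading
import time

def countdown(n):
    # pure-Python CPU-bound loop: only the thread holding the GIL makes progress
    while n > 0:
        n -= 1

N = 10000000

start = time.time()
countdown(N)
countdown(N)
print("sequential: %.1fs" % (time.time() - start))

start = time.time()
t1 = threading.Thread(target=countdown, args=(N,))
t2 = threading.Thread(target=countdown, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print("two threads: %.1fs" % (time.time() - start))
# expect the threaded run to take about as long as (often longer than) the sequential one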
8. For advanced
The multiprocessing module in Python.
Not well known by developers, especially because process creation can be sluggish: create the workers up front.
Follows the threading API closely, but uses processes and inter-process communication under the hood.
Also offers distributed-computing facilities.
Allows side-stepping the GIL for CPU-bound applications.
Allows for data/memory sharing.
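A minimal sketch of the prime-sum problem with multiprocessing.Pool (is_prime and the chunk size are our own illustrative choices):

from multiprocessing import Pool

def is_prime(n):
    # simple trial division, enough for a demo
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def sum_primes_in(bounds):
    lo, hi = bounds
    return sum(n for n in range(lo, hi) if is_prime(n))

if __name__ == "__main__":
    # workers are created up front; each chunk runs in its own interpreter,
    # so the GIL no longer serializes the CPU-bound work
    step = 612500
    chunks = [(lo, lo + step) for lo in range(100000, 5000000, step)]
    with Pool(processes=8) as pool:
        print(sum(pool.map(sum_primes_in, chunks)))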
10. For advanced
This gets around the GIL limitation, but obviously has more overhead.
In addition, communicating between processes is not as easy as reading
and writing shared memory.
!
Python multiprocessing, on the other hand, uses multiple system level
processes, that is, it starts up multiple instances of the Python interpreter.
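A sketch of explicit inter-process communication with multiprocessing.Queue; values travel through a pipe, pickled on one side and unpickled on the other:

from multiprocessing import Process, Queue

def produce(q):
    # values are pickled and sent over a pipe -- nothing is read from shared memory
    for i in range(5):
        q.put(i * i)
    q.put(None)  # sentinel to tell the consumer we are done

if __name__ == "__main__":
    q = Queue()
    p = Process(target=produce, args=(q,))
    p.start()
    while True:
        item = q.get()
        if item is None:
            break
        print(item)
    p.join()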
11. Benchmarks
All results are in wall-clock time.
• Single threaded: 41 minutes, 57 seconds
• Multi-threaded (8 threads): 106 minutes, 29 seconds
• Multiprocessing (8 processes): 6 minutes, 22 seconds
Note that 8 threads were about two and a half times slower than a single thread: the workload is CPU-bound, so the GIL serializes execution and the extra threads only add contention.
http://nathangrigg.net/2015/04/python-threading-vs-processes/
12. For the lazier
joblib, a simple third-party package for writing parallel loops using multiprocessing.
Easy syntax, optimized to be fast and robust, in particular on large data, with specific optimizations for numpy arrays.
Transparent and fast disk-caching of output values (function memoization); a sketch follows this list.
Embarrassingly parallel helper.
Logging/tracing.
Fast compressed persistence (pickle-based dump and load of data).
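A minimal sketch of the disk-caching feature (assuming a recent joblib, where Memory takes the cache directory as its first argument; the directory is our choice):

from joblib import Memory

memory = Memory("/tmp/joblib_cache", verbose=0)

@memory.cache
def square(x):
    print("computing", x)  # printed only on a cache miss
    return x ** 2

square(4)  # computed and written to disk
square(4)  # loaded from the on-disk cache, no recomputation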
15. For the lazier
>>> from math import sqrt
>>> [sqrt(i ** 2) for i in range(10)]
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
Parallel(n_jobs=2) creates a multiprocessing pool that forks the Python interpreter into multiple processes to execute each of the items of the list.
The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with function-call syntax.
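To make that concrete: a delayed function captures the call instead of executing it (output formatting may vary by joblib version):
>>> from math import sqrt
>>> from joblib import delayed
>>> delayed(sqrt)(16)
(<built-in function sqrt>, (16,), {})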
16. Examples
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> from math import modf
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=1)(delayed(modf)(i/2.) for i in range(10))
>>> res, i = zip(*r)
>>> res
(0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5)
>>> i
(0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0)
17. Examples
>>> from time import sleep
>>> from joblib import Parallel, delayed
>>> r = Parallel(n_jobs=2, verbose=5)(delayed(sleep)(.1) for _ in range(10))
[Parallel(n_jobs=2)]: Done 1 out of 10 | elapsed: 0.1s remaining: 0.9s
[Parallel(n_jobs=2)]: Done 3 out of 10 | elapsed: 0.2s remaining: 0.5s
[Parallel(n_jobs=2)]: Done 6 out of 10 | elapsed: 0.3s remaining: 0.2s
[Parallel(n_jobs=2)]: Done 9 out of 10 | elapsed: 0.5s remaining: 0.1s
[Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 0.5s finished
>>> from heapq import nlargest
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(nlargest)(2, n) for n in (range(4), 'abcde', 3))
#... (the last input, 3, is not iterable, so the worker raises and joblib reports the sub-process traceback)
---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
TypeError Mon Nov 12 11:37:46 2012
PID: 12934 Python 2.7.3: /usr/bin/python
...........................................................................
/usr/lib/python2.7/heapq.pyc in nlargest(n=2, iterable=3, key=None)
419 if n >= size:
420 return sorted(iterable, key=key, reverse=True)[:n]
421
422 # When key is none, use simpler decoration
423 if key is None:
--> 424 it = izip(iterable, count(0,-1)) # decorate
425 result = _nlargest(n, it)
426 return map(itemgetter(0), result) # undecorate
427
428 # General case, slowest method
TypeError: izip argument #1 must support iteration
23. joblib benefits
It helped us put our pipeline into production within hours, running in parallel and taking maximum advantage of our cores.
Easy to read and to debug.
But with multiple tasks and steps it requires expertise in allocating CPUs, to avoid memory oversubscription; a rough sketch follows.
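A rough sketch of that kind of allocation (the cap of 4 workers is purely illustrative):

import os
from joblib import Parallel, delayed

# every worker process holds its own copy of the data, so memory,
# not core count, can be the real limit; cap n_jobs accordingly
n_jobs = min(os.cpu_count() or 1, 4)
results = Parallel(n_jobs=n_jobs)(delayed(pow)(i, 2) for i in range(100))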