SlideShare una empresa de Scribd logo
1 de 24
django-celery
a distributed task queue




                           alex@eftimie.ro
                               2013/04/11
problem
fast sites > slow sites

caching, load balancing, CDN, nosql, ajax, ...
special problems
send an email to 10.000 users

add watermark to an uploaded video

generate a big PDF
oldschool solution: crontab
the good:
   - simple
   - easy to setup
the bad:
   - too simple
   - hard to scale
alternatives
scheduled (crontab-ish):
  - APScheduler

parallel work (celery-ish):
  - gearman
  - Huey
  - django-ztask
                              not a fan :-)
introducing Celery
introducing Celery
asynchronous job queue/task queue
based on distributed message passing
focused on real time operation
works for scheduled tasks too!



open source and integrated with Django
how it works?

                                        worker 1

website                                    .
  (view)        celery             MB
                                           .
                                           .



                                        worker n


                         results
task
@celery.task
def add(x, y):
    return x + y
...
add.delay(2, 2) #somewhere in a view

atomic
ideally idempotent
same environment as the website
real-time or scheduled
task states
PENDING
STARTED
SUCCESS
FAILURE
RETRY
REVOKED
few tips about tasks
granularity
data locality
state


   “asserting the world is the responsibility of the task”
subtasks
tasks spawned from within another task

calling:
add.subtask(1, 1).delay()
add.s(1, 1).delay() # this is a shortcut


the primitives:
chains, groups, chords, maps, chunks
subtasks - partials
add.s(1, 1).delay()
1 + 1

partial:
partial = add.s(1) # incomplete definition
partial.delay(2)
1 + 2        # same as: add.s(1, 2).delay()
partial.delay(0)
1 + 0
subtasks: chains
subtasks running one after another
chain(add.s(4, 4), mul.s(8), mul.s(10))

or
add.s(4, 4) | mul.s(8) | mul.s(10)
((4 + 4) * 8) * 10)
subtasks: groups
independent tasks running in parallel
g = group(add.s(2, 2), add.s(4, 4))
res = g()
res.get()
[4, 8]
subtasks: chords
same as groups, but apply callback on results
@task
def ts(numbers):
    return sum(numbers)

chord(add.s(i,i) for i in range(3))(ts.s())

6 # sum([0, 2, 4])


chord(headers)(callback)
subtasks: chunks

split a long list of arguments into parts
res = add.chunks(zip(range(100), range(100)), 10)()
>>> res.get()
[[0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
 [20, 22, 24, 26, 28, 30, 32, 34, 36, 38],
 [40, 42, 44, 46, 48, 50, 52, 54, 56, 58],
 [60, 62, 64, 66, 68, 70, 72, 74, 76, 78],
 [80, 82, 84, 86, 88, 90, 92, 94, 96, 98],
 [100, 102, 104, 106, 108, 110, 112, 114, 116, 118],
 [120, 122, 124, 126, 128, 130, 132, 134, 136, 138],
 [140, 142, 144, 146, 148, 150, 152, 154, 156, 158],
 [160, 162, 164, 166, 168, 170, 172, 174, 176, 178],
 [180, 182, 184, 186, 188, 190, 192, 194, 196, 198]]

                          (10 tasks of 10 adds each)
integrate with django project
tasks module inside django app
configuration setup
   - message broker
   - result backend
start a worker
$ python manage.py celery worker




  (start another, tune concurrency, monitor, ...)
hidden ad




            supervisord.org
case study: korect
quiz exam management and automatic paper
processing using OMR


existing solution: desktop app, ~2 tests/minute
using ReportLab, PyPDF, OpenCV
django + celery version
~ 20 tests/minute
same machine, 4 worker processes
parallelized parts:
   - print file generation
   - paper scanning
   - correcting and grading
   - question usage report
example - generate print file
add a Download object to db

delay the task (not the real task):
chord([test_pdf for t in tests])(merge_pdf)

update page with dw status
...
at the end of merge_pdf:
      update status, flash user
last slide
high availability, high performance solution
easy to set up
fun to use


                 celeryproject.org
?

Más contenido relacionado

La actualidad más candente

Controlling The Cloud With Python
Controlling The Cloud With PythonControlling The Cloud With Python
Controlling The Cloud With Python
Luca Mearelli
 
PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2
Graham Dumpleton
 

La actualidad más candente (20)

Introduction to Celery
Introduction to CeleryIntroduction to Celery
Introduction to Celery
 
Advanced task management with Celery
Advanced task management with CeleryAdvanced task management with Celery
Advanced task management with Celery
 
Scaling up task processing with Celery
Scaling up task processing with CeleryScaling up task processing with Celery
Scaling up task processing with Celery
 
Hacking ansible
Hacking ansibleHacking ansible
Hacking ansible
 
Containers & Dependency in Ember.js
Containers & Dependency in Ember.jsContainers & Dependency in Ember.js
Containers & Dependency in Ember.js
 
V2 and beyond
V2 and beyondV2 and beyond
V2 and beyond
 
More tips n tricks
More tips n tricksMore tips n tricks
More tips n tricks
 
Celery
CeleryCelery
Celery
 
AnsibleFest 2014 - Role Tips and Tricks
AnsibleFest 2014 - Role Tips and TricksAnsibleFest 2014 - Role Tips and Tricks
AnsibleFest 2014 - Role Tips and Tricks
 
To Batch Or Not To Batch
To Batch Or Not To BatchTo Batch Or Not To Batch
To Batch Or Not To Batch
 
Building Distributed System with Celery on Docker Swarm
Building Distributed System with Celery on Docker SwarmBuilding Distributed System with Celery on Docker Swarm
Building Distributed System with Celery on Docker Swarm
 
Celery: The Distributed Task Queue
Celery: The Distributed Task QueueCelery: The Distributed Task Queue
Celery: The Distributed Task Queue
 
A Little Backbone For Your App
A Little Backbone For Your AppA Little Backbone For Your App
A Little Backbone For Your App
 
Controlling The Cloud With Python
Controlling The Cloud With PythonControlling The Cloud With Python
Controlling The Cloud With Python
 
Asynchronous programming done right - Node.js
Asynchronous programming done right - Node.jsAsynchronous programming done right - Node.js
Asynchronous programming done right - Node.js
 
Threads, Queues, and More: Async Programming in iOS
Threads, Queues, and More: Async Programming in iOSThreads, Queues, and More: Async Programming in iOS
Threads, Queues, and More: Async Programming in iOS
 
Avoiding callback hell in Node js using promises
Avoiding callback hell in Node js using promisesAvoiding callback hell in Node js using promises
Avoiding callback hell in Node js using promises
 
PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2PyCon US 2012 - State of WSGI 2
PyCon US 2012 - State of WSGI 2
 
Celery with python
Celery with pythonCelery with python
Celery with python
 
Testing Javascript with Jasmine
Testing Javascript with JasmineTesting Javascript with Jasmine
Testing Javascript with Jasmine
 

Similar a Django Celery - A distributed task queue

Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...
Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...
Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...
bacongobbler
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repository
nobby
 
Deployment with Fabric
Deployment with FabricDeployment with Fabric
Deployment with Fabric
andymccurdy
 

Similar a Django Celery - A distributed task queue (20)

Things about Functional JavaScript
Things about Functional JavaScriptThings about Functional JavaScript
Things about Functional JavaScript
 
Exploring Clojurescript
Exploring ClojurescriptExploring Clojurescript
Exploring Clojurescript
 
Deixa para depois, Procrastinando com Celery em Python
Deixa para depois, Procrastinando com Celery em PythonDeixa para depois, Procrastinando com Celery em Python
Deixa para depois, Procrastinando com Celery em Python
 
sbt core concepts (ScalaMatsuri 2019)
sbt core concepts (ScalaMatsuri 2019)sbt core concepts (ScalaMatsuri 2019)
sbt core concepts (ScalaMatsuri 2019)
 
sbt: core concepts and updates (Scala Love 2020)
sbt: core concepts and updates (Scala Love 2020)sbt: core concepts and updates (Scala Love 2020)
sbt: core concepts and updates (Scala Love 2020)
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...
Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...
Docker-Vancouver Meetup - March 18, 2014 - Contain(erize) the tests - Mark Ei...
 
Gpars workshop
Gpars workshopGpars workshop
Gpars workshop
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
SCALA - Functional domain
SCALA -  Functional domainSCALA -  Functional domain
SCALA - Functional domain
 
The Java Content Repository
The Java Content RepositoryThe Java Content Repository
The Java Content Repository
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Deployment with Fabric
Deployment with FabricDeployment with Fabric
Deployment with Fabric
 
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitterApache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
Apache Zeppelin on Kubernetes with Spark and Kafka - meetup @twitter
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browser
 
Angular Optimization Web Performance Meetup
Angular Optimization Web Performance MeetupAngular Optimization Web Performance Meetup
Angular Optimization Web Performance Meetup
 
Tasks: you gotta know how to run them
Tasks: you gotta know how to run themTasks: you gotta know how to run them
Tasks: you gotta know how to run them
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
Going Reactive with Relational Databases
Going Reactive with Relational DatabasesGoing Reactive with Relational Databases
Going Reactive with Relational Databases
 

Más de Alex Eftimie (8)

Introducere in web
Introducere in webIntroducere in web
Introducere in web
 
ROSEdu - Starea comunității
ROSEdu - Starea comunității ROSEdu - Starea comunității
ROSEdu - Starea comunității
 
Introducere în Linux
Introducere în LinuxIntroducere în Linux
Introducere în Linux
 
Flask intro - ROSEdu web workshops
Flask intro - ROSEdu web workshopsFlask intro - ROSEdu web workshops
Flask intro - ROSEdu web workshops
 
Prezentare raport de cercetare
Prezentare raport de cercetarePrezentare raport de cercetare
Prezentare raport de cercetare
 
LabRemote - Web in Progress
LabRemote - Web in ProgressLabRemote - Web in Progress
LabRemote - Web in Progress
 
Diploma Presentation
Diploma PresentationDiploma Presentation
Diploma Presentation
 
Lucruri noi in MPI-2
Lucruri noi in MPI-2Lucruri noi in MPI-2
Lucruri noi in MPI-2
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Django Celery - A distributed task queue

  • 1. django-celery a distributed task queue alex@eftimie.ro 2013/04/11
  • 2. problem fast sites > slow sites caching, load balancing, CDN, nosql, ajax, ...
  • 3. special problems send an email to 10.000 users add watermark to an uploaded video generate a big PDF
  • 4. oldschool solution: crontab the good: - simple - easy to setup the bad: - too simple - hard to scale
  • 5. alternatives scheduled (crontab-ish): - APScheduler parallel work (celery-ish): - gearman - Huey - django-ztask not a fan :-)
  • 7. introducing Celery asynchronous job queue/task queue based on distributed message passing focused on real time operation works for scheduled tasks too! open source and integrated with Django
  • 8. how it works? worker 1 website . (view) celery MB . . worker n results
  • 9. task @celery.task def add(x, y): return x + y ... add.delay(2, 2) #somewhere in a view atomic ideally idempotent same environment as the website real-time or scheduled
  • 11. few tips about tasks granularity data locality state “asserting the world is the responsibility of the task”
  • 12. subtasks tasks spawned from within another task calling: add.subtask(1, 1).delay() add.s(1, 1).delay() # this is a shortcut the primitives: chains, groups, chords, maps, chunks
  • 13. subtasks - partials add.s(1, 1).delay() 1 + 1 partial: partial = add.s(1) # incomplete definition partial.delay(2) 1 + 2 # same as: add.s(1, 2).delay() partial.delay(0) 1 + 0
  • 14. subtasks: chains subtasks running one after another chain(add.s(4, 4), mul.s(8), mul.s(10)) or add.s(4, 4) | mul.s(8) | mul.s(10) ((4 + 4) * 8) * 10)
  • 15. subtasks: groups independent tasks running in parallel g = group(add.s(2, 2), add.s(4, 4)) res = g() res.get() [4, 8]
  • 16. subtasks: chords same as groups, but apply callback on results @task def ts(numbers): return sum(numbers) chord(add.s(i,i) for i in range(3))(ts.s()) 6 # sum([0, 2, 4]) chord(headers)(callback)
  • 17. subtasks: chunks split a long list of arguments into parts res = add.chunks(zip(range(100), range(100)), 10)() >>> res.get() [[0, 2, 4, 6, 8, 10, 12, 14, 16, 18], [20, 22, 24, 26, 28, 30, 32, 34, 36, 38], [40, 42, 44, 46, 48, 50, 52, 54, 56, 58], [60, 62, 64, 66, 68, 70, 72, 74, 76, 78], [80, 82, 84, 86, 88, 90, 92, 94, 96, 98], [100, 102, 104, 106, 108, 110, 112, 114, 116, 118], [120, 122, 124, 126, 128, 130, 132, 134, 136, 138], [140, 142, 144, 146, 148, 150, 152, 154, 156, 158], [160, 162, 164, 166, 168, 170, 172, 174, 176, 178], [180, 182, 184, 186, 188, 190, 192, 194, 196, 198]] (10 tasks of 10 adds each)
  • 18. integrate with django project tasks module inside django app configuration setup - message broker - result backend start a worker $ python manage.py celery worker (start another, tune concurrency, monitor, ...)
  • 19. hidden ad supervisord.org
  • 20. case study: korect quiz exam management and automatic paper processing using OMR existing solution: desktop app, ~2 tests/minute using ReportLab, PyPDF, OpenCV
  • 21. django + celery version ~ 20 tests/minute same machine, 4 worker processes parallelized parts: - print file generation - paper scanning - correcting and grading - question usage report
  • 22. example - generate print file add a Download object to db delay the task (not the real task): chord([test_pdf for t in tests])(merge_pdf) update page with dw status ... at the end of merge_pdf: update status, flash user
  • 23. last slide high availability, high performance solution easy to set up fun to use celeryproject.org
  • 24. ?