6. Gearmand
Daemon that manages the work.
Does not do any work.
Accepts a job id and a binary payload from
clients.
Workers keep connections open at all
times.
http://www.flickr.com/photos/andrefromont/4896802557
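A minimal sketch of what "a job id and a binary payload" looks like on the wire, assuming the binary packet layout and packet type numbers given in the Gearman protocol document (verify them against your gearmand version). The function name `resize_image` and the id `img-42` are made up for illustration.

```python
import struct

# Packet type numbers as listed in the Gearman protocol document
# (an assumption worth checking against your gearmand version).
SUBMIT_JOB = 7      # foreground: the client waits for the result
SUBMIT_JOB_BG = 18  # background: fire and forget


def build_request(packet_type, *args):
    """Encode a request: magic bytes, type, body size, NUL-separated args."""
    body = b"\x00".join(args)
    header = b"\x00REQ" + struct.pack(">II", packet_type, len(body))
    return header + body


def build_submit_job(function, unique_id, payload, background=False):
    """Build a job submission: function name, unique job id, binary payload."""
    packet_type = SUBMIT_JOB_BG if background else SUBMIT_JOB
    return build_request(
        packet_type, function.encode(), unique_id.encode(), payload)


# Hypothetical job: names and payload are illustrative only.
packet = build_submit_job("resize_image", "img-42", b"\x89PNG...", background=True)
print(packet[:12].hex())
```

The payload goes last, so it can hold arbitrary bytes; Gearmand never interprets it.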
8. Client
Clients connect to Gearmand and ask for
work to be done.
The client can fire and forget or wait on a
response.
Multiple jobs can be done asynchronously
by workers for one client.
http://www.flickr.com/photos/pitadel/4951801589
10. Workers
Daemonized code
A single worker can do just one job or can
do many jobs.
Does not have to be written in the
same language as the client.
http://www.flickr.com/photos/nathaninsandiego/5972599772
11. Key Features
• Background jobs
• De-duplication of jobs
• Multiple jobs per client
• High, normal and low priority
• Work will be resubmitted if not completed
12. Background Jobs
• Clients can fire and forget work to be done
• Well suited for data marshalling
• Minimal ability to track the status
13. De-duplication
• Clients provide a unique job id
• If more than one client provides the same
job id, work is done once
• Not a cache: once the job is done, the id is
gone. A later job with the same id is done again.
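The de-duplication rules can be sketched with a toy in-memory queue. This is not Gearmand's code, only a model of the behavior described on this slide; `DedupQueue`, `render`, and the ids are hypothetical names.

```python
class DedupQueue:
    """Toy job server that coalesces submissions sharing a unique job id."""

    def __init__(self):
        self.pending = {}  # job id -> (payload, list of waiting clients)

    def submit(self, job_id, payload, client):
        """Queue a job; return False if the id was already pending."""
        if job_id in self.pending:
            # Same id already queued: attach the client, do not queue again.
            self.pending[job_id][1].append(client)
            return False
        self.pending[job_id] = (payload, [client])
        return True

    def run_next(self, work):
        """Run the oldest pending job once and give every waiter the result."""
        job_id = next(iter(self.pending))
        payload, clients = self.pending.pop(job_id)  # the id is forgotten here
        result = work(payload)
        return {client: result for client in clients}


calls = []


def render(payload):
    calls.append(payload)  # record how many times work is really done
    return payload.upper()


q = DedupQueue()
first = q.submit("front-page", "html", "client-a")
second = q.submit("front-page", "html", "client-b")
results = q.run_next(render)
again = q.submit("front-page", "html", "client-c")  # accepted: not a cache
print(first, second, results, again, len(calls))
```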
14. Priority
• High, Normal and Low priority options.
• New jobs are inserted at the end of the
queue for their priority level
• Priority is per job type, not global
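A small model of "per job type" priority, assuming each job type keeps its own high, normal, and low FIFO queues. `PriorityQueues` and the job type names are illustrative, not part of Gearman.

```python
from collections import deque

PRIORITIES = ("high", "normal", "low")


class PriorityQueues:
    """One set of high/normal/low FIFO queues per job type."""

    def __init__(self):
        self.queues = {}  # job type -> {priority: deque of payloads}

    def submit(self, function, payload, priority="normal"):
        per_function = self.queues.setdefault(
            function, {p: deque() for p in PRIORITIES})
        per_function[priority].append(payload)  # end of that priority's queue

    def grab(self, function):
        """Return the next payload for one job type, highest priority first."""
        for priority in PRIORITIES:
            queue = self.queues.get(function, {}).get(priority)
            if queue:
                return queue.popleft()
        return None


pq = PriorityQueues()
pq.submit("send_email", "a")
pq.submit("send_email", "b", priority="high")
pq.submit("send_email", "c")
pq.submit("write_log", "x", priority="low")

email_order = [pq.grab("send_email") for _ in range(3)]
log_job = pq.grab("write_log")
print(email_order, log_job)
```

The high-priority email jumps ahead of other emails only; it has no effect on when `write_log` jobs are handed out.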
15. Worker Selection
• Uses the “game show method”
• Workers that do multiple jobs will more
likely get jobs “higher” in their list
• Can appear to clear out one queue before
another, but this is a side effect, not a design choice
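The slide does not spell out the "game show method"; one reading is that the daemon wakes idle workers and the first one to answer gets the job. The sketch below models only the list-order effect, under the assumption that a worker's job types are scanned in the order it registered them. All names are hypothetical.

```python
from collections import deque


def grab_job(queues, worker_functions):
    """Return (job type, payload) from the first non-empty queue, scanning
    the worker's job types in the order it registered them."""
    for function in worker_functions:
        queue = queues.get(function)
        if queue:
            return function, queue.popleft()
    return None


queues = {
    "send_email": deque(["e1", "e2"]),
    "write_log": deque(["l1", "l2"]),
}
worker = ["send_email", "write_log"]  # registration order

order = []
while (job := grab_job(queues, worker)) is not None:
    order.append(job)
print(order)
```

With a single multi-purpose worker, the first queue in its list drains before the second is touched, which is the effect the last bullet describes.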
16. Operational Visibility
• Gearmand can report status about jobs and
workers
• It is only a view of current status, not
historical
• Use outside tools to graph what work was
done when
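One way to feed an outside graphing tool is to poll Gearmand's text admin interface. This sketch assumes the `status` command returns tab-separated lines (function name, total jobs, running jobs, available workers) ending with a lone dot, and that the daemon listens on the default port 4730. The sample data is made up.

```python
import socket


def parse_status(text):
    """Parse the reply to the admin "status" command into a dict."""
    status = {}
    for line in text.splitlines():
        if line == ".":  # a lone dot ends the reply
            break
        name, total, running, workers = line.split("\t")
        status[name] = {
            "total": int(total),
            "running": int(running),
            "workers": int(workers),
        }
    return status


def fetch_status(host="127.0.0.1", port=4730):
    """Ask a running gearmand for its current (not historical) status."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(b"status\n")
        data = b""
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return parse_status(data.decode())


# Made-up sample reply; call fetch_status() against a live daemon instead.
sample = "send_email\t12\t3\t4\nwrite_log\t0\t0\t2\n.\n"
print(parse_status(sample))
```

Storing each poll with a timestamp gives the history that Gearmand itself does not keep.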
18. Memcached
[Diagram: Web Servers, Memcached, Main Database]
19. Memcached
[Diagram: Web Servers, Memcached, Main Database]
This is so 2005!
20. [Diagram: Web Servers, Main Database, Optimized Database; further labels truncated: "CRO", "or In", "Process"]
21. [Diagram: Web Servers, Main Database, Optimized Database; further labels truncated: "CRO", "or In", "Process"]
This is so 2009!
22. [Diagram: Web Servers, Main Database, Optimized Database, Backend Events, Gearmand, Gearman Workers]
23. Why Gearman
• Rid us of database spikes
• Changes “feel” realtime
• In the case of an issue, changes can queue
up and happen when things are stable
• Changes can happen asynchronously
24. SMTP Replacement
• Large daily newsletter at 3PM
• Email alerts go out on demand to
thousands of readers as deals are published
• Bottleneck was from double queuing in the
mail queue
• SMTP Server was a single point of failure
25. [Diagram: Web Servers, Cron Jobs, Backend Events, Gearmand, Gearman Workers, SMTP Servers]
27. Logging Options
• Disk - reliable unless load is high. Can’t be
queried easily in real time.
• MySQL - Can make complex queries
against it. Under high load, data can be lost
• Other - (Spread, Scribe, etc.) New daemons
to manage, learn, scale, etc.
28. Logging via Gearman
• Frontend can fire and forget log data,
returning immediately to the application
• Log data is queued
• Workers can process the logs in any
number of ways
• Log data can be stored any number of ways
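The flow above can be sketched in plain Python, with a `queue.Queue` standing in for Gearmand and a thread standing in for a worker. This is an analogy, not Gearman client code; `log_event`, `log_worker`, and the `stored` list (a stand-in for a real store such as MySQL) are hypothetical.

```python
import json
import queue
import threading

log_queue = queue.Queue()  # stands in for Gearmand's queue
stored = []                # stands in for the final store


def log_event(**fields):
    """Frontend side: enqueue and return immediately (fire and forget)."""
    log_queue.put(json.dumps(fields))


def log_worker():
    """Worker side: drain the queue and store each record."""
    while True:
        record = log_queue.get()
        if record is None:  # sentinel: shut the worker down
            break
        stored.append(json.loads(record))


worker = threading.Thread(target=log_worker)
worker.start()

log_event(page="/", status=200)
log_event(page="/deals", status=500)

log_queue.put(None)  # tell the worker to stop once the queue is drained
worker.join()
print(stored)
```

Only the worker needs to change if the log data should go to a different store or be processed another way.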
29. Writing Log Data
[Diagram: Web Servers, Gearmand, Gearman Workers, MySQL Servers]
30. Querying Log Data (Map Reduce “ish”)
[Diagram: Web Servers, Backend App, Gearmand, Gearman Workers, MySQL Servers]
34. Normalizing URIs
• Define what parameters a request needs
• sort
• view
• region
• date
• start
• Throw out the rest
• Sort what you need
• Build the real URL
35. Normalizing URIs
• http://dealnews.com/
• http://dealnews.com/?sort=category
• http://dealnews.com/?ref=foobar
• http://dealnews.com/?region=nyc
All become:
http://dealnews.com/?sort=category&view=large&region=nyc
(assuming the user is in New York)
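A minimal sketch of the normalization steps using Python's standard `urllib.parse`. The default values (`sort=category`, `view=large`) are inferred from the example above, the region is assumed to come from the user's location, and a fixed parameter order stands in for "sort what you need".

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters the request needs, in the fixed order they are emitted.
ALLOWED = ("sort", "view", "region", "date", "start")


def normalize(uri, defaults):
    """Keep only allowed parameters, fill defaults, emit in a fixed order."""
    parts = urlsplit(uri)
    given = dict(parse_qsl(parts.query))
    wanted = {k: v for k, v in given.items() if k in ALLOWED}  # drop the rest
    merged = {**defaults, **wanted}
    query = urlencode([(k, merged[k]) for k in ALLOWED if k in merged])
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", query, ""))


# Assumed defaults; region would come from the user's location.
defaults = {"sort": "category", "view": "large", "region": "nyc"}

uris = [
    "http://dealnews.com/",
    "http://dealnews.com/?sort=category",
    "http://dealnews.com/?ref=foobar",
    "http://dealnews.com/?region=nyc",
]
normalized = {normalize(uri, defaults) for uri in uris}
print(normalized)
```

All four inputs collapse to a single URI, which can then serve as both the cache key and the unique job id.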
36. Why normalize/funnel?
• We can now cache the data for this request and
know it is the same data even if the original URI is
different. (cache reuse)
• We can fetch the content only once for all
requests coming in for the content via request
funneling.
37. Why normalize/funnel?
• 72 unique URIs for the front page in a 3-minute spike.
There were only 6 possible real versions. (normalizing)
• Thousands of syndication requests hit the app servers
between 10:43 and 10:45. There were only 86 unique
URIs. (funneling)
38. Request Funneling
[Diagram: Proxy Server (five Apache Child processes), http://dealnews.com/?sort=category&view=large&region=nyc, Gearmand, Gearman Worker, Web Server]
39. What does a worker do?
• Builds a new URI from the input data
• Makes an HTTP request to an app server
• If cacheable, stores the data in the cache
(important!)
• Returns the data (page) to the proxy (via
Gearmand)
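The worker's steps can be sketched as a single function. The `fetch` callable, the dict used as a cache, and the `(body, cacheable)` return shape are assumptions made for this example; a real worker would make an HTTP request and write to a shared cache.

```python
from urllib.parse import urlencode


def handle_request(params, fetch, cache, base="http://dealnews.com/"):
    """Build the URI, fetch it from an app server, cache it if allowed."""
    uri = base + "?" + urlencode(params)  # build a new URI from the input data
    body, cacheable = fetch(uri)          # HTTP request to an app server
    if cacheable:
        cache[uri] = body                 # store before returning (important!)
    return body                           # goes back to the proxy via Gearmand


def fake_fetch(uri):
    """Stand-in for the real HTTP call: returns (body, cacheable)."""
    return "<html>front page</html>", True


cache = {}
params = {"sort": "category", "view": "large", "region": "nyc"}
page = handle_request(params, fake_fetch, cache)
print(page, list(cache))
```

Storing the page in the cache inside the worker means later requests for the same URI can be served without another job.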
40. The Magical
World of Gearman
Brian Moon
dealnews.com
http://brian.moonspot.net/
@brianlmoon
More Information:
http://gearman.org/
Need to run PHP workers?
https://github.com/brianlmoon/GearmanManager