Slides from the RailsConf 2009 session
Discover how to use parallel execution to batch-process large amounts of data, learn how to use queues to distribute workload and coordinate processes, and increase throughput on systems with high latency. Have fun with EventMachine, AMQP, and RabbitMQ, and get rid of that every-5-minutes cron job.
6. Map Reduce
“Programming model for
processing and generating large
data sets”
(Google)
7. Map Reduce "Map" step
the master node takes the
input, chops it up into smaller
sub-problems, and distributes
those to worker nodes.
(Wikipedia)
9. Is it as simple as...
clients.map do |client|
  client.invoice
end
10. No!
Because the process is:
• distributed
• concurrent
11. Problems:
• How many nodes?
• How many workers?
• Distribution mechanism to
feed the workers?
12. What about queuing?
• the master node takes the input, chops
it up into smaller sub-problems, and
publishes them in a queue
• workers independently consume the
content of the queue
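The two bullets above can be sketched in plain Ruby, with an in-process `Queue` and threads standing in for a real message broker and separate worker processes. This is only an illustration of the idea, not the setup used in the talk: the `Client` struct, the `invoice_all` helper and the `:done` marker are all hypothetical names introduced here.

```ruby
# Hypothetical stand-in for a client record; `invoice` simulates slow work.
Client = Struct.new(:name) do
  def invoice
    sleep 0.01 # pretend this is a high-latency call
    "invoice for #{name}"
  end
end

# Master: publishes one job per client, then one :done marker per worker.
# Workers: consume jobs independently until they pop a :done marker.
def invoice_all(clients, worker_count: 3)
  jobs    = Queue.new
  results = Queue.new

  workers = Array.new(worker_count) do
    Thread.new do
      while (client = jobs.pop) != :done
        results << client.invoice
      end
    end
  end

  clients.each { |client| jobs << client }
  worker_count.times { jobs << :done }
  workers.each(&:join)

  # Drain the results queue into a plain array (order is not guaranteed).
  Array.new(results.size) { results.pop }
end

clients = %w[acme globex initech].map { |name| Client.new(name) }
puts invoice_all(clients).sort
```

The master never needs to know which worker handles which client; changing `worker_count` changes the parallelism without touching the publishing code.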
13. Here comes RabbitMQ
• RabbitMQ is an implementation of AMQP,
the emerging standard for high
performance enterprise messaging
• It’s opensource
• Can be used to manage queues
• Written in Erlang
14. Erlang?
• Erlang is a general-purpose concurrent
programming language designed by Ericsson
• distributed
• fault tolerant
• soft real time
• high availability
23. EventMachine
Is an implementation of the Reactor pattern
• Non blocking IO and lightweight
concurrency
• eliminates the complexities of high-performance
threaded network programming
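To make the Reactor pattern concrete, here is a minimal sketch in plain Ruby built on `IO.select`. It is not how EventMachine is implemented; `TinyReactor` and its `watch`/`run` methods are hypothetical names used only to show the shape of the pattern: a single loop waits for I/O readiness and dispatches to registered handlers, with no threads involved.

```ruby
# A toy reactor: one loop, many IO objects, one handler per IO.
class TinyReactor
  def initialize
    @handlers = {}
  end

  # Register a block to be called with each chunk read from `io`.
  def watch(io, &on_data)
    @handlers[io] = on_data
  end

  # Wait until some IO is readable, dispatch to its handler, and repeat
  # until every watched IO has reached end of file.
  def run
    until @handlers.empty?
      readable, = IO.select(@handlers.keys)
      readable.each do |io|
        begin
          @handlers[io].call(io.read_nonblock(1024))
        rescue IO::WaitReadable
          next # spurious wakeup, try again on the next loop
        rescue EOFError
          @handlers.delete(io)
          io.close
        end
      end
    end
  end
end

reader_a, writer_a = IO.pipe
reader_b, writer_b = IO.pipe
received = []

reactor = TinyReactor.new
reactor.watch(reader_a) { |data| received << [:a, data] }
reactor.watch(reader_b) { |data| received << [:b, data] }

writer_a.write("hello")
writer_a.close
writer_b.write("world")
writer_b.close

reactor.run
p received.sort
```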
27. EM - Deferrables
“The Deferrable pattern allows you to specify
any number of Ruby code blocks that will be
executed at some future time when the status
of the Deferrable object changes”
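The quote above can be illustrated with a small plain-Ruby sketch. The method names `callback`, `errback`, `succeed` and `fail` mirror those of `EM::Deferrable`, but `TinyDeferrable` itself is a hypothetical stand-in written for this example, and it simplifies the behaviour by letting only the first status change win.

```ruby
# A toy deferrable: blocks are stored now and run when the status changes.
module TinyDeferrable
  def callback(&block)
    register(:succeeded, block)
  end

  def errback(&block)
    register(:failed, block)
  end

  def succeed(*args)
    settle(:succeeded, args)
  end

  def fail(*args)
    settle(:failed, args)
  end

  private

  # Run the block right away if the status is already set to `status`,
  # otherwise keep it until the status changes.
  def register(status, block)
    if @status == status
      block.call(*@args)
    elsif @status.nil?
      pending[status] << block
    end
    self
  end

  # Simplification for this sketch: the first status change wins.
  def settle(status, args)
    return self if @status
    @status = status
    @args = args
    pending[status].each { |block| block.call(*args) }
    @pending = nil
    self
  end

  def pending
    @pending ||= { succeeded: [], failed: [] }
  end
end

# Hypothetical job class used only for the demonstration.
class InvoiceJob
  include TinyDeferrable
end

job = InvoiceJob.new
log = []
job.callback { |total| log << "invoiced #{total}" }
job.errback  { |reason| log << "failed: #{reason}" }
job.succeed(120)
puts log
```

The caller attaches its blocks without knowing when, or whether, the work will finish; the object that does the work decides later by calling `succeed` or `fail`.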
31. Achieved so far
• Easy distribution of tasks
• Architecture that supports arbitrary
number of workers (and masters)
• Concurrency within the single worker
36. Multicasting
Publisher → Exchange
• msg A → Queue1 → Cons1
• msg A → Queue2 → Cons2
• msg A → Queue3 → Cons3
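The multicasting layout above can be sketched in plain Ruby: an exchange keeps a list of bound queues and copies every published message into each of them. `FanoutExchange` is a hypothetical in-process class written for illustration; it is not the RabbitMQ or AMQP client API.

```ruby
# A toy fanout exchange: every bound queue receives its own copy of a message.
class FanoutExchange
  def initialize
    @queues = []
  end

  # Bind a queue to the exchange and return it for convenience.
  def bind(queue)
    @queues << queue
    queue
  end

  # Deliver a copy of the message to every bound queue.
  def publish(message)
    @queues.each { |queue| queue << message.dup }
  end
end

exchange = FanoutExchange.new
queues = Array.new(3) { exchange.bind(Queue.new) }

exchange.publish("msg A")

# Each consumer reads from its own queue, independently of the others.
queues.each_with_index do |queue, index|
  puts "Cons#{index + 1} got #{queue.pop}"
end
```

The publisher talks only to the exchange, so consumers can be added or removed by binding or unbinding queues without changing the publishing side.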
37. Not only queues then
Use message distribution to build the
nervous system of your app
• communication across hosts,
heterogeneous systems
• low latency
• clustering
38. Where to start?
crontab -l
5 * * * * bin/do_the_quick_thing.rb
0 2 * * * bin/do_the_scary_thing.rb