Most developers write them and every company has them – a vast library of small and large scripts that are designed to run on a scheduled basis. These background angels help keep the lights on and the doors open. They’ve been built up over time and are forgotten little heroes that are only remembered when the machines they live on fail. They are scattered throughout a company’s IT infrastructure and do important things.
In this session, we will explain how to use Ruby on Simple Workflow to quickly build a system that schedules scripts, runs them on time, retries them if they fail, and stores the history of their execution. You will walk away from this session with an understanding of how Simple Workflow brings resiliency, concurrency, and tracking to your applications.
Using cron in the cloud with Amazon Simple Workflow
1. Give your little scripts big wings: Using cron in the cloud with Amazon Simple Workflow
Asad Jawahar, Senior Technical Program Manager
2. For applications with multiple connected steps…
• Amazon Simple Workflow provides the building blocks and a controller to
reduce the complexity of infrastructure programming and state machinery.
This helps customers focus on...
• Algorithms and logic that make their organizations unique
4. Cron
• Scheduled tasks
• Why use SWF for Cron?
– Failure handling
– Scale
– Lost tasks (office move, machine failure)
– OS provided cron not an option (shared hosting)
5. Roadmap
• c : code – the implementation of your control logic and steps
• d : deployment – worker processes, machines, SWF processing engine
• l : logical – process model, flow control, retry logic, etc.
6. Key Simple Workflow concepts
Activities (and workers)
• Discrete steps in your application, processors
Workflows (and workers)
• Logical grouping for multiple activities, processors
Deciders
• Control logic for workflows: execution order, retry policies, timer logic, etc. – Decision tasks
Retained workflows
• Execution history for workflows: auditable, re-playable
Timers, signals
• Scheduled actions, workflow "interrupt requests"
8. Responsibilities
You
• Step sequence logic ("secret sauce")
• Discrete step logic
• Workflow workers
• Activity workers
• I/O data
AWS (SWF)
• State of execution
• Pending tasks
• Execution history
• Searching histories
• Control engine
9. Cron in SWF
class CronWorkflow
  extend Decider
  entry_point :start
  activity_client :activityClient do |options|
    options.prefix_name = "CronActivities"
  end
  def start(arg)
    while true do
      # Wait one day (86,400 seconds), then run the backup activity
      create_timer(86400) { activityClient.backup "myDB" }
    end
  end
end
class CronActivities
  extend Activity
  activity :backup
  def backup(arg)
    # backup script
  end
end
10. Daily Build Process
Ordinary cron, single point of failure
Download files from S3 → Run build tools → Upload build artifacts to S3 → Delete local files → Send email
13. SWF: Resilient Cron
Distributed asynchronous interactions
Requirements:
• Coordinate scheduled tasks across distributed hosts
• Reliable messaging
SWF provides: coordination engine in the cloud
• Stores and dispatches tasks
• Delivery guarantees
Scalability
Requirements:
• Add workers at any time
• Need stateless workers
• State provided by another system
SWF provides: scalability
• Repository of distributed state
• Workers poll for work
• Exactly once delivery
Failure handling
Requirements:
• Many types of failures
• Different mitigations
• Must react quickly
SWF provides: fault tolerance
• No loss of distributed state
• Supports timeouts on work
• Explicit failure modeling
Auditability
Requirements:
• Need visibility into process execution
• Retain records for investigation
SWF provides: audit history
• Index of executions
• Play-by-play history of execution
• Retention of completed executions
Latency
Requirements:
• Get work quickly
• Route tasks to specific workers
SWF provides: low latency
• Long polling for work
• Task routing through task lists
14. Introducing AWS Flow Framework (Ruby)
• Ease of use library for Amazon SWF APIs
• Uses standard Ruby constructs
• Open source
• Packaged as a gem
• In Private Beta (stay tuned for release)
15. Benefits of AWS Flow Framework
• Run and load balance work on many machines with minimal changes
• Simplifies remote, long running, non-blocking steps
• Uses Amazon SWF to:
– Keep track of progress
– Trigger your code to execute next steps
• Failure handling built in
• Easily evolve your logic without adding complexity
16. Hourly Build – Logical Control Flow
wait (1 hour)
copy files
run build task
upload files
delete local files
send email
if (failed)
  retry up to 3 times
repeat
Download files from S3 → Run build tools → Upload build artifacts to S3 → Delete local files → Send email
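The control flow above can be sketched in plain Ruby, without SWF, to make the retry rule concrete. This is only an illustration under assumptions: the slide does not say whether a single step or the whole sequence is retried, so the sketch retries the whole sequence, and the names run_with_retries, cron_loop, and the sleeper hook are hypothetical.

```ruby
# Plain-Ruby sketch of "run the steps, retry up to 3 times, repeat".
# Nothing here talks to SWF; all names are illustrative assumptions.

# Runs the callables in `steps` in order. If any step raises, the whole
# sequence is started again, up to `max_attempts` attempts in total.
# Returns the number of attempts used; re-raises the last error otherwise.
def run_with_retries(steps, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    steps.each(&:call)
    attempts
  rescue StandardError
    retry if attempts < max_attempts
    raise
  end
end

# Waits, runs the steps, and repeats. `cycles` bounds the loop so the sketch
# terminates; `sleeper` is injectable so callers can avoid real waiting.
def cron_loop(steps, interval_seconds:, cycles:, sleeper: ->(seconds) { sleep(seconds) })
  cycles.times do
    sleeper.call(interval_seconds)
    run_with_retries(steps)
  end
end
```

On a single machine this loop is exactly the single point of failure described earlier: if the host dies, the schedule, the retry count, and the progress of the current run are all lost.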
17. Build Cron Workflow – Execution Flow
Start Workflow Execution
• Your app, the SWF console, or the CLI starts an execution of the Cron Workflow
Amazon SWF
• Holds the decision tasks and the execution history
• Execution history: input data, download complete, build complete, upload complete, delete local file, email sent
Decider
• Makes decisions on what tasks to schedule, when, and in what order
• Long polls SWF, gets a decision task, returns decisions
• Decisions:
  1. Schedule download [shared]
  2. Schedule build [worker1]
  3. Schedule upload [worker1]
  4. Schedule delete files [worker1]
  5. Schedule email [worker2]
  6. Execution complete
Workers (Worker 1, Worker 2)
• Long poll SWF and get tasks from the shared task list or a worker-specific task list
• Results returned:
  1. /tmp, worker1
  2. Built
  3. Uploaded
  4. Deleted
  5. Email sent
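The decider's role in this flow can be pictured as a function from the execution history to the next decision. The sketch below is a hypothetical in-memory model, not the SWF wire format or the Flow Framework API: the PLAN table, the event symbols, and the decide method are assumptions chosen to mirror the labels on the slide, and the task lists are fixed here even though on the slide the host-specific list comes back from the download step.

```ruby
# Hypothetical model of a decider: look at what has completed so far and
# return the next decision. Names and structures are illustrative only.

PLAN = [
  { activity: :download,   done: :download_complete, task_list: "shared"  },
  { activity: :build,      done: :build_complete,    task_list: "worker1" },
  { activity: :upload,     done: :upload_complete,   task_list: "worker1" },
  { activity: :delete,     done: :delete_complete,   task_list: "worker1" },
  { activity: :send_email, done: :email_sent,        task_list: "worker2" }
].freeze

# `history` is an array of completion events (symbols). The decider keeps no
# state of its own, so it can be restarted at any time and still decide
# correctly from the stored history.
def decide(history)
  next_step = PLAN.find { |step| !history.include?(step[:done]) }
  return { decision: :complete_workflow_execution } if next_step.nil?

  {
    decision: :schedule_activity,
    activity: next_step[:activity],
    task_list: next_step[:task_list]
  }
end
```

For example, decide([]) schedules the download on the shared list, and decide with all five completion events returns the "execution complete" decision.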
18. Hourly Build – Decider
class BuildWorkflow
  extend Decider
  entry_point :start
  activity_client :client do |options|
    options.prefix_name = "BuildActivities"
  end
  def start(source_bucket, target_bucket)
    while true do
      # Timer delay is in seconds; run one build each time it fires
      create_timer(25) { start_build(source_bucket, target_bucket) }
    end
  end
  def start_build(source, target)
    begin
      dir = client.download(source)
      client.build(dir)
      client.upload(dir, target)
    ensure
      # Clean up and notify whether or not the build succeeded
      client.delete(dir)
      client.send_email(dir)
    end
  end
end
19. Task Routing
def start_build(source_bucket, target_bucket)
  activity_client :client do |options|
    options.prefix_name = "BuildActivities"
  end
  # download returns the task list of the host that fetched the files
  host_specific_task_list, dir = client.download(source_bucket)
  # Route the remaining steps to that same host
  client.build(dir) do |options|
    options.default_task_list = host_specific_task_list
  end
  client.upload(dir, target_bucket) do |options|
    options.default_task_list = host_specific_task_list
  end
  client.delete(dir) do |options|
    options.default_task_list = host_specific_task_list
  end
end
end # closes the enclosing workflow class
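To see why the download step hands back a host-specific task list, here is a small in-memory sketch of task lists as named queues. It does not use SWF or the Flow Framework; TaskLists, Worker, poll_shared, poll_own, and the "/tmp/build" path are all hypothetical names introduced for illustration.

```ruby
# In-memory stand-in for SWF task lists: one queue per task-list name.
class TaskLists
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = Queue.new }
  end

  def schedule(task_list, task)
    @queues[task_list] << task
  end

  # Non-blocking poll; returns nil when the list is empty.
  def poll(task_list)
    @queues[task_list].pop(true)
  rescue ThreadError
    nil
  end
end

# A worker listens on the shared list and on a list named after its host.
class Worker
  attr_reader :host, :handled

  def initialize(host, task_lists)
    @host = host
    @task_lists = task_lists
    @handled = []
  end

  # Takes a task from the shared list and answers with its own task list and
  # the local directory, so follow-up steps can be routed to this host.
  def poll_shared
    task = @task_lists.poll("shared")
    return nil unless task

    @handled << task
    [host, "/tmp/build"]
  end

  # Takes a task that was routed specifically to this host.
  def poll_own
    task = @task_lists.poll(host)
    @handled << task if task
    task
  end
end
```

Whichever worker happens to pick up the download from the shared list, every later step scheduled on the returned list can only be taken by that same worker, which is where the downloaded files live.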
20. Exponential Retry
def start_build(source_bucket, target_bucket)
  activity_client :client do |options|
    options.prefix_name = "BuildActivities"
  end
  # Retry the S3 steps and the cleanup with exponential backoff, up to 3 attempts each
  dir = client.exponential_retry(:download, source_bucket) do |options|
    options.maximum_attempts = 3
  end
  client.build(dir)
  client.exponential_retry(:upload, dir, target_bucket) do |options|
    options.maximum_attempts = 3
  end
  client.exponential_retry(:delete, dir) do |options|
    options.maximum_attempts = 3
  end
end
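As a rough picture of what an exponential retry policy does, the following plain-Ruby helper retries a block with delays that double after each failure. It is a generic sketch, not the Flow Framework implementation: with_exponential_retry, initial_interval, backoff, and sleeper are assumed names, and the real framework schedules its waits through SWF timers rather than sleeping in the process.

```ruby
# Retries the given block until it succeeds or `maximum_attempts` is reached.
# The wait before attempt n+1 is initial_interval * backoff**(n - 1) seconds.
# `sleeper` is injectable so the helper can be exercised without waiting.
def with_exponential_retry(maximum_attempts: 3, initial_interval: 1, backoff: 2,
                           sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield(attempt)
  rescue StandardError
    # Give up and surface the error once the attempt budget is spent
    raise if attempt >= maximum_attempts

    sleeper.call(initial_interval * backoff**(attempt - 1))
    retry
  end
end
```

With the defaults, a block that fails twice and then succeeds waits 1 second and then 2 seconds before the third, successful attempt.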
21. Activities
class BuildActivities
  extend Activity
  # Register the activities with shared defaults (timeouts are in seconds)
  activity :download, :build, :upload, :delete do |options|
    options.default_task_list = "list1"
    options.version = "1"
    options.default_task_heartbeat_timeout = "3600"
    options.default_task_schedule_to_close_timeout = "30"
    options.default_task_schedule_to_start_timeout = "30"
    options.default_task_start_to_close_timeout = "30"
  end
  # Placeholder implementations that only print their input
  def download(bucket)
    puts bucket
  end
  def build(dir)
    puts dir
  end
  def upload(dir, bucket)
    puts bucket
  end
  def delete(dir)
    puts dir
  end
end
22. Multiple builds in parallel
• Parent Cron workflow kicks off child Build Workflows
• Child workflow
– A workflow started from another workflow
– Runs independently with its own history
– Invocation similar to activities
– Factors functionality into reusable components
• Flow and SWF can run Child workflows and activities in parallel
23. Multiple builds in parallel
Cron Workflow (parent)
• Build Workflow (OS A): Download files from S3 → Run build tools → Upload build artifacts to S3 → Delete local files → Send email
• Build Workflow (OS B): Download files from S3 → Run build tools → Upload build artifacts to S3 → Delete local files → Send email
24. Multiple builds in parallel
class CronWorkflow
  extend Decider
  entry_point :start
  def start(w_source_bucket, w_target_bucket, l_source_bucket, l_target_bucket)
    while true do
      # One child workflow client per build platform, each with its own task list
      workflow_client :w_client do |options|
        options.name = "BuildWorkflow"
        options.task_list = "win"
      end
      workflow_client :l_client do |options|
        options.name = "BuildWorkflow"
        options.task_list = "linux"
      end
      # arg is the timer delay in seconds (not defined on the slide)
      create_timer(arg) do
        # Start both child builds without blocking, then wait for both
        result1 = w_client.send_async :start, w_source_bucket, w_target_bucket
        result2 = l_client.send_async :start, l_source_bucket, l_target_bucket
        wait_for_all(result1, result2)
      end
      continue_as_new
    end
  end
end
25. Concurrency in Ruby Flow
• Decider is single threaded
• Blocking semantics by default
• send_async for asynchronous execution
– Returns a Future
– Cedes control when waited on
– Uses fibers (requires Ruby 1.9 or later)
– Code looks similar to synchronous code
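The idea of a Future that cedes control when waited on can be sketched with plain Ruby fibers. This is a toy model under assumptions, not the Flow Framework's implementation: the Future and Scheduler classes, and the block-taking send_async shown here, are hypothetical and only borrow the names from the slide.

```ruby
# A value that may not be available yet.
class Future
  def initialize
    @set = false
    @value = nil
  end

  def set(value)
    @value = value
    @set = true
  end

  def set?
    @set
  end

  # Inside a scheduled fiber, yields to the scheduler until a value arrives.
  # Once the value is set, returns immediately from any context.
  def get
    Fiber.yield until @set
    @value
  end
end

# Single-threaded, cooperative scheduler: each task runs in its own fiber.
class Scheduler
  def initialize
    @pending = []
  end

  # Wraps the block in a fiber and returns a Future for its result.
  def send_async(&block)
    future = Future.new
    @pending << [Fiber.new { future.set(block.call) }, future]
    future
  end

  # Resumes fibers round-robin until every task has produced its result.
  def run
    until @pending.empty?
      fiber, future = @pending.shift
      fiber.resume
      @pending << [fiber, future] unless future.set?
    end
  end
end
```

A task that calls get on another task's Future reads like synchronous code, yet it gives up the single thread until the value is ready, which is the behaviour the bullets above describe.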
26. Continuous workflows
• SWF allows a workflow to stay open up to 1 year
• Workflow history grows over time as events get added
• Large history => latency
• Create new runs to keep history size in check
class BuildWorkflow
  extend Decider
  entry_point :start
  def start(source_bucket, target_bucket)
    while true do
      create_timer(3600) { start_build(source_bucket, target_bucket) }
      # Close this run and start a fresh one with an empty history
      continue_as_new(source_bucket, target_bucket)
    end
  end
end
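A back-of-the-envelope model shows why starting a new run keeps history size in check. The numbers are assumptions for illustration only: EVENTS_PER_CYCLE is a made-up figure, not a measured SWF value, and largest_history is a hypothetical helper.

```ruby
# Assumed number of history events recorded per build cycle (illustrative).
EVENTS_PER_CYCLE = 12

# Returns the largest history any single run accumulates over `cycles` cycles.
# With continue_as_new, each cycle starts a new run with an empty history.
def largest_history(cycles, continue_as_new:)
  largest = 0
  current = 0
  cycles.times do
    current += EVENTS_PER_CYCLE
    largest = [largest, current].max
    current = 0 if continue_as_new
  end
  largest
end
```

Under these assumptions, 24 hourly cycles in one run accumulate 288 events, while continuing as new after every cycle keeps each run at 12.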
27. Activity Worker
• Hosts activity implementation
• Polls SWF for tasks and dispatches to your code
• Uses two thread pools
– Polling
– Running activity tasks
• Activity implementation must be thread safe
activity_worker = ActivityWorker.new(swf.client, domain, task_list)
activity_worker.add_activities_implementation(BuildActivities)
activity_worker.start
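The two-pool layout can be sketched with plain Ruby threads and queues. This is not how the framework's ActivityWorker is implemented internally, which the slides do not show: SketchActivityWorker, its STOP marker, and the task format of [method_name, *args] are assumptions made for the illustration, and a Queue stands in for long polling SWF.

```ruby
# Toy worker with one pool of poller threads and one pool of runner threads.
class SketchActivityWorker
  STOP = :stop

  # task_source: a Queue standing in for SWF; it must receive one STOP marker
  # per poller thread, otherwise `run` blocks forever.
  def initialize(task_source, implementation, poller_count: 1, runner_count: 2)
    @task_source = task_source
    @implementation = implementation
    @poller_count = poller_count
    @runner_count = runner_count
    @handoff = Queue.new
    @results = Queue.new
  end

  # Processes tasks until every poller has seen STOP, then returns all results.
  def run
    pollers = Array.new(@poller_count) do
      Thread.new do
        while (task = @task_source.pop) != STOP
          @handoff << task
        end
      end
    end
    runners = Array.new(@runner_count) do
      Thread.new do
        while (task = @handoff.pop) != STOP
          name, *args = task
          # Several runner threads call into the same object at once,
          # which is why the implementation must be thread safe.
          @results << @implementation.public_send(name, *args)
        end
      end
    end
    pollers.each(&:join)
    @runner_count.times { @handoff << STOP }
    runners.each(&:join)
    Array.new(@results.size) { @results.pop }
  end
end
```

Separating polling from execution means a slow activity ties up a runner thread but does not stop the worker from fetching the next task.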
28. Workflow Worker
• Hosts workflow implementation
• Polls SWF for tasks and dispatches to your code
• Uses a single thread pool for polling and running tasks
– Your logic should be lightweight and deterministic
• Delegate heavy lifting to activities
worker = WorkflowWorker.new(swf.client, domain, task_list)
worker.add_workflow_implementation_type(CronWorkflow)
worker.add_workflow_implementation_type(BuildWorkflow)
worker.start