6. Housekeeping
This is going to be a deeply technical talk
If reality seems to be imploding...
Feel free to zone out for a bit
Ask questions
This presentation abuses UML
This is being recorded
Shout questions as you think of them
Greg Logan February 15, 2018 2 / 30
7. Opencast Job Dispatching
Overview
Quick review: Services, and how they are registered
Anatomy of a job
How is a job created?
How does a job get dispatched?
What is a workflow? How does it differ from a job?
How is a workflow created?
(Relatively) complete workflow in steps, a descent into madness
8. Quick Review: Service and Service Registration
Opencast services register themselves with the service registry
This registry is local
The database synchronizes the registrations through the cluster
Local services talk directly to the local service registry
Remote services talk to their remote, which talks to its local registry
The architecture behind all of this was explained in the previous talk
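The registration model above can be sketched roughly as follows. This is a toy model, not Opencast code: `SharedDatabase`, `LocalServiceRegistry`, and the host and service names are all hypothetical, and the "database" is just an in-memory set standing in for the one the cluster shares.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Registration:
    service_type: str  # illustrative, e.g. "composer"
    host: str


@dataclass
class SharedDatabase:
    """Stands in for the database that every node in the cluster uses."""
    registrations: set = field(default_factory=set)


class LocalServiceRegistry:
    """One registry per node; all of them read and write the same database."""

    def __init__(self, host, db):
        self.host = host
        self.db = db

    def register(self, service_type):
        # A service on this node registers with its local registry.
        reg = Registration(service_type, self.host)
        self.db.registrations.add(reg)
        return reg

    def services_of_type(self, service_type):
        # Registrations made on other nodes are visible through the database.
        return sorted(r.host for r in self.db.registrations
                      if r.service_type == service_type)
```

A registration made through one node's registry is then visible from every other node's registry, which is the property the dispatcher relies on later.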
13. Anatomy of a Job
What is an Opencast Job?
Database object
A representation of a unit of work within Opencast
A way to asynchronously keep track of your operations!
Contains the data for a full operation (e.g., the encode of a stream)
19 fields, including:
Status
Creating Service Type
Operation
Dispatchable
Job Load
Blocking Job
Blocked By
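As a rough picture of the fields listed above, a job could be modelled like this. Only the field names come from the slide; the types, defaults, the `FINISHED` status, and the direction of the two blocking fields are assumptions made for illustration, and the other twelve fields are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    FAILED = "FAILED"
    FINISHED = "FINISHED"  # assumed; not named in the slides


@dataclass
class Job:
    id: int
    creating_service_type: str   # which kind of service asked for the work
    operation: str               # what that service should do
    status: Status = Status.QUEUED
    dispatchable: bool = True    # False: handled by the host that created it
    job_load: float = 1.0        # a counter, not a hardware measurement
    # The two fields below record blocking relationships between jobs.
    # Which direction each one points is not spelled out in the slides,
    # so these types are guesses.
    blocking_job: Optional[int] = None
    blocked_by: List[int] = field(default_factory=list)
```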
20. Job Creation
How is a job created?
A job is created by the service registry (SR) when an operation is started
Each encode generates a job, as does each publish
These jobs may spawn subjobs
An encode nearly always spawns an inspect job
Jobs can block waiting for their children
Jobs can block waiting for resources(*)
An undispatchable job is handled by the host which created it
Example: ingest
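The parent/child relationship described above might look something like this. It is a minimal sketch under stated assumptions: this `ServiceRegistry` is a stand-in written for the example, jobs are plain dictionaries, and the status strings are illustrative.

```python
import itertools


class ServiceRegistry:
    """Toy registry: hands out job ids and remembers who spawned whom."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.jobs = {}

    def create_job(self, service_type, operation, parent_id=None,
                   dispatchable=True):
        job = {
            "id": next(self._ids),
            "type": service_type,
            "operation": operation,
            "parent": parent_id,
            "dispatchable": dispatchable,
            "status": "QUEUED",
        }
        self.jobs[job["id"]] = job
        return job

    def children_of(self, job_id):
        return [j for j in self.jobs.values() if j["parent"] == job_id]

    def is_blocked(self, job_id):
        # A parent stays blocked while any of its children is unfinished.
        return any(c["status"] != "FINISHED"
                   for c in self.children_of(job_id))
```

An encode job that spawns an inspect job would be blocked until the inspect job reaches its finished state.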
21. Job Dispatching: The basics
Job dispatching
This is where the sausage gets made
This is very simplified from the actual code
22. Job Dispatching: The (initial) sausage factory
function dispatchJobs(List jobs)
    for all job in jobs do
        serviceType ← job.serviceType
        candidateServices ← getServicesOfType(serviceType)
        serviceId ← dispatchJob(job, candidateServices)
function dispatchJob(Job job, List services)
    for all service in services do
        accepter ← HTTP.POST(job, service)
        if accepter ≠ null then return accepter.id
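The same loop as a runnable sketch. The HTTP POST is replaced by a `post` callable passed in by the caller, and jobs and services are plain dictionaries; both are simplifications for the example rather than how the real dispatcher is written.

```python
def dispatch_job(job, services, post):
    """Offer the job to each candidate in turn.

    Returns the id of the first service that accepts, or None if none do.
    """
    for service in services:
        accepter = post(job, service)  # stands in for the HTTP POST
        if accepter is not None:
            return accepter["id"]
    return None


def dispatch_jobs(jobs, services_of_type, post):
    """Dispatch every job; returns a mapping of job id to accepting service id."""
    assignments = {}
    for job in jobs:
        candidates = services_of_type(job["service_type"])
        assignments[job["id"]] = dispatch_job(job, candidates, post)
    return assignments
```

Note that the first candidate in the list always gets the first offer, which is one source of the fairness problem raised on the next slide.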
23. Job Dispatching: Weak Sausages
There are a number of issues here
Service fairness
Service load
Job load
Priority/Failed jobs
26. Job Dispatching: Service and Job Load
Job Load values
... are not the actual hardware cost to run a job
... are completely arbitrary
... should be thought of as a counter, rather than a load average
28. Job Dispatching: Service and Job Load
Service Load values
... are the sum of the Jobs currently in the RUNNING state
... do not represent the real load on the system
35. Job Dispatching: Service and Job Load
So what’s the point of the load value?
Each node/host defines a maximum load for itself
Typically this is equal to the number of processor cores
The node will be assigned at most that much load
Σ(jobs.load) <= node.maxload
If node.maxload = 8
job.load = 2 → 4 jobs
job.load = 4 → 2 jobs
job.load > 4 → 1 job
Job load can be fractional!
Job load can be negative!
Don’t do this...
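The arithmetic above can be checked with a couple of small helpers. The function names are made up for this example; the rule they encode is only the one on the slide, that the loads of the jobs on a node must sum to at most the node's maximum load.

```python
import math


def max_concurrent_jobs(job_load, node_maxload):
    """How many identical jobs of this load fit on one node at once."""
    if job_load <= 0:
        # Zero or negative loads make the counter meaningless.
        raise ValueError("job load must be positive")
    return math.floor(node_maxload / job_load)


def fits(running_loads, new_load, node_maxload):
    """True if a new job can be added without exceeding the node's maximum."""
    return sum(running_loads) + new_load <= node_maxload
```

With a maximum load of 8, `max_concurrent_jobs(2, 8)` gives 4 and `max_concurrent_jobs(0.5, 8)` gives 16, which is why fractional loads are useful for cheap jobs.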
43. Job Dispatching: Service and Job Load
Aside: Neat Tricks
Specialist nodes
Really really good at one thing
On the specialist node, set that job’s cost very low (zero?)
On every other node, set that job’s cost above node.maxload
On the specialist node, set all other job costs above node.maxload
That job will only run on that hardware
This can block processing!
Current bug: Cheaper encoding not prioritized (MH-12493)
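A small model of the specialist-node trick. The node names, job types, and cost values are invented for the example, and the eligibility rule is the simple "current load plus job cost must not exceed the maximum" check from earlier in the talk.

```python
NODE_MAXLOAD = 8.0  # illustrative; assumed to be the same on both nodes

# Cost of each job type as configured on each node (all values hypothetical).
costs = {
    "gpu-node": {"encode": 0.1, "inspect": 9.0, "publish": 9.0},
    "worker-1": {"encode": 9.0, "inspect": 1.0, "publish": 1.0},
}


def eligible_nodes(job_type, costs, maxload, current_load=None):
    """Nodes that could take one more job of this type right now."""
    current_load = current_load or {}
    return sorted(
        node for node, table in costs.items()
        if current_load.get(node, 0.0) + table[job_type] <= maxload
    )
```

Here encodes can only land on `gpu-node` and everything else only on `worker-1`. It also shows the risk: if the specialist is full or offline, the list of eligible nodes for its job type is empty and that work waits.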
49. Job Dispatching: Service and Job Load
Taking the safeties off
Each node/host defines a maximum load for itself
If the cost for a job exceeds maxload on every node, the job is never processed
org.opencastproject.job.load.acceptexceeding
This is true by default
Setting this to false is safe
Set this to false prior to changing job loads
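Before changing job loads, it may be worth checking for jobs that no node could ever accept. The helper below is a hypothetical sanity check, not part of Opencast, and it assumes the strict rule from the slide: a job whose load exceeds a node's maximum load is refused by that node.

```python
def never_runnable(job_loads, node_maxloads):
    """Return the job types whose load exceeds the maximum load of every node."""
    biggest = max(node_maxloads.values())
    return sorted(name for name, load in job_loads.items() if load > biggest)
```

An empty result means every job type fits on at least one node; anything listed would sit in the queue indefinitely under the strict rule.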
50. Job Dispatching: Accounting for Load
function mainDispatch( )
    repeat
        jobs ← getAllJobs( )
        dispatchJobs(jobs)
    until shutdown
function dispatchJobs(List jobs)
    for all job in jobs do
        serviceType ← job.serviceType
        candidateServices ← getServicesOfType(serviceType)
        candidateServices ← filterServicesByLoad(candidateServices, job.load)
        serviceId ← dispatchJob(job, candidateServices)
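The load filter itself could be as simple as the following. The dictionary keys `current_load` and `max_load` are names chosen for this sketch; the real service objects will look different.

```python
def filter_services_by_load(services, job_load):
    """Keep only the services whose host still has room for this job."""
    return [
        s for s in services
        if s["current_load"] + job_load <= s["max_load"]
    ]
```

Only the candidates that survive this filter are offered the job, so a node that is already at its maximum never receives the POST at all.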
57. Job Dispatching: Priority
One thing people always want:
How can I make this recording process in front of that one?
This isn’t that
MH-6850
This is for handling undispatchable, failed, and queued jobs
Undispatchable: No service accepted them
Failed: Did not complete successfully
Queued: New jobs
58. Job Dispatching: Accounting for Priority
function mainDispatch( )
    repeat
        jobs ← getPriorityJobs( )
        dispatchJobs(jobs)
        jobs ← getRestartJobs( )
        dispatchJobs(jobs)
        jobs ← getQueuedJobs( )
        dispatchJobs(jobs)
        jobs ← getAllJobs( )
        dispatchJobs(jobs)
    until shutdown
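One pass of that loop, written so the ordering is explicit. The fetch functions are passed in as a list; their names and the idea of skipping empty batches are assumptions of this sketch.

```python
def dispatch_pass(fetchers, dispatch_jobs):
    """Run one dispatch pass.

    `fetchers` is an ordered list of zero-argument callables, for example
    priority jobs first, then restarts, then queued jobs, then everything.
    Earlier categories get first pick of the available services.
    """
    for fetch in fetchers:
        jobs = fetch()
        if jobs:
            dispatch_jobs(jobs)
```

The ordering is about job categories, not recordings: nothing here lets one recording jump ahead of another.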
62. On to workflows
What is a workflow?
It’s a recording?
It’s a processing run for a recording?
It’s a collection of jobs
It’s a job with some metadata
67. The Workflow Service
The Workflow Service
Keeps track of all workflows
Organizes the creation of jobs
Organizes the sequence of jobs
Note that this is creation, not execution
The origin point of all work in the system
68. So how does this work?
Who calls the workflow service?
You do
Created via the admin UI
Created via ingest
You get a WorkflowInstance
Updating the workflow service takes the job ID!
69. What does this look like?
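Sequence diagram. Participants: User, AdminUI, WorkflowService, ServiceRegistry
User → AdminUI: Start
AdminUI → WorkflowService: .Start()
WorkflowService → ServiceRegistry: .createJob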
72. Wait, what?
Some of you might have noticed that the previous sequence has problems
It just creates a job, then it stops
It does not actually do any processing
That’s because your workflow is a job
Job type: workflow
Job operation: START_WORKFLOW
This gets dispatched just like any other job
73. What does this look like?
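Sequence diagram. Participants: User, AdminUI, WorkflowService, ServiceRegistry
User → AdminUI: Start
AdminUI → WorkflowService: .Start()
WorkflowService → ServiceRegistry: .createJob(START_WORKFLOW)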
75. What does this look like?
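Sequence diagram. Participants: ServiceRegistry, WorkflowService
WorkflowService → ServiceRegistry: .createJob(START_WORKFLOW)
ServiceRegistry → WorkflowService: .process()
WorkflowService → ServiceRegistry: .createJob(START_OPERATION)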
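The exchange in this diagram can be sketched as follows. Both classes are toys written for the example, jobs are dictionaries, and creating a START_OPERATION job for only the first step is a simplification; the method names mirror the diagram rather than the real Opencast API.

```python
class ServiceRegistry:
    """Toy registry that only records the jobs it is asked to create."""

    def __init__(self):
        self.jobs = []

    def create_job(self, job_type, operation, payload=None):
        job = {"id": len(self.jobs) + 1, "type": job_type,
               "operation": operation, "payload": payload}
        self.jobs.append(job)
        return job


class WorkflowService:
    JOB_TYPE = "workflow"

    def __init__(self, registry):
        self.registry = registry

    def start(self, steps):
        # Starting a workflow only creates a job; no processing happens here.
        return self.registry.create_job(self.JOB_TYPE, "START_WORKFLOW",
                                        payload=list(steps))

    def process(self, job):
        # Called once the dispatcher hands the workflow job back to us.
        if job["operation"] != "START_WORKFLOW":
            raise ValueError("unsupported operation: " + job["operation"])
        first_step = job["payload"][0]
        return self.registry.create_job(self.JOB_TYPE, "START_OPERATION",
                                        payload=first_step)
```

The point to notice is that `start` returns before any work has been done, and that processing the workflow job produces yet another workflow job.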
77. But wait, there’s more!
It begins
Everything is a job
It’s jobs all the way down
What is START_OPERATION?
It is a Workflow Job
79. We need to go deeper...
Sequence diagram. Participants: ServiceRegistry, WorkflowService, SomeService
Messages, in order: .createJob(START_WORKFLOW), .process(), .createJob(START_OPERATION), process
Loop: for each workflow step
80. And deeper...
Sequence diagram. Participants: ServiceRegistry, WorkflowService, SomeWOH, SomeService
Messages, in order: .process(), .createJob(), process, .start(), .foo()
Loop: for each workflow step
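One plausible reading of this last diagram, as a sketch: the workflow service processes a START_OPERATION job by calling the step's workflow operation handler (WOH), the handler calls a service, and the service asks the registry for yet another job. The call order, the `step` key, and every class body below are assumptions; only the participant and method names come from the slide.

```python
class Registry:
    """Toy registry that records the jobs it is asked to create."""

    def __init__(self):
        self.created = []

    def create_job(self, job_type, operation):
        job = {"id": len(self.created) + 1, "type": job_type,
               "operation": operation}
        self.created.append(job)
        return job


class SomeService:
    def __init__(self, registry):
        self.registry = registry

    def foo(self):
        # The service does not do the work inline; it creates a job for it.
        return self.registry.create_job("some-service", "foo")


class SomeWOH:
    """Workflow operation handler: glue between a workflow step and a service."""

    def __init__(self, service):
        self.service = service

    def start(self):
        return self.service.foo()


class WorkflowService:
    def __init__(self, handlers):
        self.handlers = handlers  # maps step name to its handler

    def process(self, job):
        # `job` is a START_OPERATION workflow job naming a single step.
        return self.handlers[job["step"]].start()
```

Each workflow step repeats this chain, so a single recording produces a START_WORKFLOW job, one START_OPERATION job per step, and at least one service job under each of those.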
81. Wrapup
This was a long, complex talk
I hope I was clear
Please ask any questions you might have
This was actually simplified; at least two layers are missing
Bonus points if you can guess what they are!