This document discusses problems with client-side machine learning automation and proposes solutions using server-side workflows defined as RESTful resources and a domain-specific language (DSL). The DSL allows defining reusable ML workflows, executing workflows on a server, and easily parallelizing workflows for multiple resources through syntactic abstraction and language interoperability features.
3. Client-side Machine Learning Automation
Problems of client-side solutions
Complex: Too fine-grained, leaky abstractions
Cumbersome: Error handling, network issues
Hard to reuse: Tied to a single programming language
Hard to scale: Parallelization again a problem
Hard to generalize: Declarative client tools hide complexity at the cost of flexibility
Hard to combine: Black-box tools cannot be easily integrated as parts of bigger client-side workflows
Hard to audit: Client-side development environments are complex and very hard to sandbox
Not enough automation
Not enough abstraction
Algorithmic complexity and computing-resource management: problems that had mostly been washed away are back!
11. In a Nutshell
1. Workflows reified as server-side, RESTful resources
2. Domain-specific language for ML workflow automation
12. Workflows as RESTful Resources
Library: A reusable building block, i.e. a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
• Imports: List of libraries with code used by the script.
• Inputs: List of input values that parameterize the workflow.
• Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
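The relationship between the three resource types can be sketched in a few lines of Python. This is only an illustrative model, not the BigML API: the class names, the `execute` function, and the `stats` library are all hypothetical, chosen to mirror the definitions above (a script imports libraries, declares inputs and outputs, and can run only once every input is supplied).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Library:
    """Reusable building block: named definitions that scripts can import."""
    name: str
    definitions: Dict[str, Callable[..., Any]]


@dataclass
class Script:
    """Executable workflow: imports, declared inputs, declared outputs."""
    imports: List[Library]
    inputs: List[str]
    outputs: List[str]
    # body(env, inputs) returns a dict of computed values (hypothetical shape).
    body: Callable[[Dict[str, Callable[..., Any]], Dict[str, Any]], Dict[str, Any]]


def execute(script: Script, inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run a script once a complete set of inputs is supplied."""
    missing = [name for name in script.inputs if name not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    # Merge the definitions of every imported library into one namespace.
    env: Dict[str, Callable[..., Any]] = {}
    for library in script.imports:
        env.update(library.definitions)
    results = script.body(env, inputs)
    # Only the declared outputs are returned to the caller.
    return {name: results[name] for name in script.outputs}


# Hypothetical library and script, for illustration only.
stats = Library("stats", {"mean": lambda xs: sum(xs) / len(xs)})
average_script = Script(
    imports=[stats],
    inputs=["values"],
    outputs=["average"],
    body=lambda env, args: {"average": env["mean"](args["values"]), "scratch": None},
)
```

Calling `execute(average_script, {"values": [1, 2, 3]})` returns only the declared `average` output, while an incomplete input set is rejected before any work starts.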
13. Ways to create WhizzML Scripts and Libraries
GitHub
Script editor
Gallery
Other scripts
Scriptify
14. Syntactic Abstraction in WhizzML: Simple workflow
;; ML artifacts are first-class citizens,
;; we only need to talk about our domain
(let ([train-id test-id] (create-dataset-split id 0.8)
      model-id (create-model train-id))
  (create-evaluation test-id
                     model-id
                     {"name" "Evaluation 80/20"
                      "missing_strategy" 0}))
15. Language Interoperability in WhizzML
from bigml.api import BigML
api = BigML()
# choose workflow
script = 'script/567b4b5be3f2a123a690ff56'
# define parameters
inputs = {'source': 'source/5643d345f43a234ff2310a3e'}
# execute
api.ok(api.create_execution(script, inputs))
16. Metaprogramming in reflective DSLs: Scriptify
Resources that create
resources that create
resources that create
resources that create
resources that create
resources that create
. . .
18. Domain Specificity and Scalability: Trivial parallelization
;; Workflow for 1 resource
(let ([train-id test-id] (create-dataset-split id 0.8)
      model-id (create-model train-id))
  (create-evaluation test-id model-id))
19. Domain Specificity and Scalability: Trivial parallelization
;; Workflow for arbitrary number of resources
(let (splits (for (id input-datasets)
               (create-dataset-split id 0.8)))
  (for (s splits)
    (create-evaluation (s 1) (create-model (s 0)))))
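For contrast, here is a rough sketch of what the same fan-out might look like when driven from a client in Python. The three `create_*` functions are local stand-ins that only build strings (they are not BigML bindings calls), and `evaluate_one` / `evaluate_many` are hypothetical names. The point is that the client has to choose and manage its own worker pool, whereas in the WhizzML version above the `for` form simply expresses the iteration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def create_dataset_split(dataset_id: str, rate: float) -> Tuple[str, str]:
    """Stand-in for a remote call: returns (train id, test id)."""
    return (f"{dataset_id}/train-{rate}", f"{dataset_id}/test-{rate}")


def create_model(train_id: str) -> str:
    """Stand-in for a remote call that trains a model."""
    return f"model({train_id})"


def create_evaluation(test_id: str, model_id: str) -> str:
    """Stand-in for a remote call; argument order follows the slides."""
    return f"evaluation({test_id}, {model_id})"


def evaluate_one(dataset_id: str) -> str:
    """Workflow for one resource: split, train, evaluate."""
    train_id, test_id = create_dataset_split(dataset_id, 0.8)
    return create_evaluation(test_id, create_model(train_id))


def evaluate_many(dataset_ids: List[str], max_workers: int = 4) -> List[str]:
    """Workflow for an arbitrary number of resources."""
    # The client picks the pool size and owns the pool's lifetime;
    # Executor.map yields results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate_one, dataset_ids))
```

With real network calls, each worker would also need its own error handling and retries, which is the kind of client-side burden listed at the start of the deck.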