Valencian Summer School in Machine Learning 2017 - Day 2
Lecture 7: REST API, Bindings, and Basic Workflows. By jao - Jose A. Ortega - (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017
1. Automating Machine Learning
API, bindings, BigMLer and Basic Workflows
#VSSML17
September 2017
#VSSML17 Automating Machine Learning September 2017 1 / 56
2. Outline
1 Machine Learning workflows
2 ML as a RESTful Cloudy Service
3 Client-side workflows: REST API and bindings
4 Client-side workflows: BigMLer
5 Server-side workflows: WhizzML
6 Example Workflow: Model or Ensemble?
7 Case study: Using Flatline in WhizzML
4. Machine Learning as a System Service
The goal
Machine Learning as a system-level service
The means
• APIs: ML building blocks
• Abstraction layer over feature engineering
• Abstraction layer over algorithms
• Automation
10. RESTful-ish ML Services
• Excellent abstraction layer
• Transparent data model
• Immutable resources and UUIDs: traceability
• Simple yet effective interaction model
• Easy access from any language (API bindings)
Problems of algorithmic complexity and computing resource management are mostly washed away
11. RESTful done right: Whitebox resources
• Your data, your model
• Model reverse engineering becomes moot
• Maximizes reach (Web, CLI, desktop, IoT)
12. Example workflow: Batch Centroid
Objective: Label each row in a Dataset with its associated centroid.
We need to...
• Create Dataset
• Create Cluster
• Create BatchCentroid from Cluster and Dataset
• Save BatchCentroid as new Dataset
13. Example workflow: building blocks
curl -X POST "https://bigml.io/dataset?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"source": "source/56fbbfea200d5a3403000db7"}'
curl -X POST "https://bigml.io/cluster?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65"}'
curl -X POST "https://bigml.io/batchcentroid?$AUTH" \
     -H 'content-type: application/json' \
     -d '{"dataset": "dataset/43ffe231a34fff333000b65",
          "cluster": "cluster/33e2e231a34fff333000b65"}'
curl -X GET "https://bigml.io/dataset/1234ff45eab8c0034334?$AUTH"
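Each creation step in this workflow is a POST of a small JSON body to an endpoint named after the resource type. A minimal Python sketch of that shape follows; it only builds the requests and sends nothing. The helper name build_request and the credentials string are assumptions made for illustration, while the resource ids are the ones shown on the slide.

```python
import json

BASE_URL = "https://bigml.io"


def build_request(resource_type, body, auth):
    """Return (method, url, headers, payload) for a resource-creation call.

    `auth` is the query-string fragment the slide keeps in $AUTH
    (a placeholder here, not real credentials).
    """
    url = f"{BASE_URL}/{resource_type}?{auth}"
    headers = {"content-type": "application/json"}
    return "POST", url, headers, json.dumps(body)


auth = "username=alice;api_key=PLACEHOLDER"  # hypothetical credentials
steps = [
    ("dataset", {"source": "source/56fbbfea200d5a3403000db7"}),
    ("cluster", {"dataset": "dataset/43ffe231a34fff333000b65"}),
    ("batchcentroid", {"dataset": "dataset/43ffe231a34fff333000b65",
                       "cluster": "cluster/33e2e231a34fff333000b65"}),
]
requests_to_send = [build_request(kind, body, auth) for kind, body in steps]
for method, url, _headers, payload in requests_to_send:
    print(method, url, payload)
```

Keeping request construction separate from sending makes the sequence easy to inspect before any resource is created.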
14. Example workflow: Web UI
19. Example workflow: Python bindings
from bigml.api import BigML

# credentials are read from the environment when none are passed
api = BigML()
source = 'source/5643d345f43a234ff2310a3e'
# create dataset and cluster, waiting for each to finish
dataset = api.create_dataset(source)
api.ok(dataset)
cluster = api.create_cluster(dataset)
api.ok(cluster)
# create a batch centroid that also outputs a new dataset
# labeling each row with its centroid
batch_centroid = api.create_batch_centroid(cluster, dataset,
                                           {'output_dataset': True,
                                            'all_fields': True})
# wait again, via polling, until the job is finished
api.ok(batch_centroid)
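The waiting step in the bindings example is a polling loop: fetch the resource, check its status, sleep, repeat. The sketch below shows one way such a loop could look. It is not the bindings' implementation; the function name wait_until_finished, the fake_fetch stand-in for an HTTP GET, and the status codes 5 (finished) and -1 (faulty) are assumptions used for illustration.

```python
import time

FINISHED = 5   # assumed status code for a completed resource
FAULTY = -1    # assumed status code for a failed resource


def wait_until_finished(fetch, resource_id, interval=1.0, max_polls=60,
                        sleep=time.sleep):
    """Poll `fetch(resource_id)` until the resource finishes or fails.

    `fetch` is any callable returning a dict with a "status" entry
    shaped like {"code": int}; it stands in for an HTTP GET.
    """
    for _ in range(max_polls):
        resource = fetch(resource_id)
        code = resource["status"]["code"]
        if code == FINISHED:
            return resource
        if code == FAULTY:
            raise RuntimeError(f"{resource_id} failed")
        sleep(interval)
    raise TimeoutError(f"{resource_id} not finished after {max_polls} polls")


# Demo with a fake server that needs three polls to finish.
_codes = iter([1, 3, 5])


def fake_fetch(resource_id):
    return {"resource": resource_id, "status": {"code": next(_codes)}}


done = wait_until_finished(fake_fetch, "dataset/demo", sleep=lambda s: None)
print(done["status"]["code"])
```

Passing the sleep function as a parameter keeps the loop testable without real delays.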
20. Client-side automation via bindings
Strengths of bindings-based solutions
Versatility: Maximum flexibility and possibility of encapsulation (via proper engineering)
Native: Easy to support any programming language
Offline: Whitebox models allow local use of resources (e.g., real-time predictions)
21. Client-side automation via bindings
Strengths of bindings-based solutions
from bigml.model import Model
model_id = 'model/5643d345f43a234ff2310a3e'
# Download of (whitebox) resource
local_model = Model(model_id)
# Purely local calculations
local_model.predict({'plasma glucose': 132})
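To make "purely local calculations" concrete: once a whitebox tree has been downloaded, predicting is just a walk from the root to a leaf. The sketch below uses a toy tree whose structure, threshold, and class labels are invented for illustration; it does not reflect the actual JSON layout of a downloaded model or the internals of the bindings.

```python
# Toy tree: structure, threshold, and labels are invented for illustration.
toy_tree = {
    "field": "plasma glucose",
    "threshold": 125,
    "left": {"prediction": "class-a"},    # value <= threshold
    "right": {"prediction": "class-b"},   # value > threshold
}


def predict(node, inputs):
    """Walk the tree until a leaf is reached; no network access needed."""
    while "prediction" not in node:
        value = inputs[node["field"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["prediction"]


print(predict(toy_tree, {"plasma glucose": 132}))
```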
22. Client-side automation via bindings
Problems of bindings-based solutions
Complexity: Lots of details outside the problem domain
Reuse: No inter-language compatibility
Scalability: Client-side workflows are hard to optimize
Not enough abstraction
30. Client-side Machine Learning Automation
Problems of client-side solutions
Complex: Too fine-grained, leaky abstractions
Cumbersome: Error handling, network issues
Hard to reuse: Tied to a single programming language
Hard to scale: Parallelization is again a problem
Hard to generalize: CLI tools like BigMLer hide complexity at the cost of flexibility
The algorithmic complexity and computing resource management problems that had been mostly washed away are back!
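As an illustration of the "cumbersome" point, every remote call in a client-side workflow tends to need its own guard against network failures. The sketch below shows a retry wrapper with exponential backoff; the names with_retries and flaky_create are hypothetical, and the failure is simulated rather than produced by a real service.

```python
import time


def with_retries(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run `call()`; on ConnectionError retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Demo: a simulated call that fails twice before succeeding.
_failures = {"left": 2}


def flaky_create():
    if _failures["left"] > 0:
        _failures["left"] -= 1
        raise ConnectionError("simulated network drop")
    return "dataset/demo"


delays = []
result = with_retries(flaky_create, sleep=delays.append)
print(result, delays)
```

None of this code is about Machine Learning, which is the kind of out-of-domain detail the slide refers to.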
32. Client-side Machine Learning Automation
Problems of client-side solutions
Complexity: Lots of details outside the problem domain
Reuse: No inter-language compatibility
Scalability: Client-side workflows are hard to optimize
Extensibility: BigMLer hides complexity at the cost of flexibility
Not enough abstraction
35. Basic workflows in WhizzML: automatic generation
36. Server-side Machine Learning Automation
Solution (complexity, reuse): Domain-specific languages
37. WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
    High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
    Sophisticated server-side optimization
    Out-of-the-box scalability
    Client-server brittleness removed
    Infrastructure for creating and sharing ML scripts and libraries
38. WhizzML REST Resources
Library: Reusable building block, a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script: Executable code that describes an actual workflow.
• Imports: List of libraries with code used by the script.
• Inputs: List of input values that parameterize the workflow.
• Outputs: List of values computed by the script and returned to the user.
Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
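The phrase "a complete set of inputs" can be read as a simple check: every input the script declares must receive a value before an execution is requested. The sketch below models that check in Python. The helper names, the script id, and the payload layout (a "script" id plus a list of [name, value] pairs) are assumptions for illustration, not a statement of the exact REST schema.

```python
def missing_inputs(script_inputs, provided):
    """Names declared by the script that have no value and no default."""
    return [spec["name"] for spec in script_inputs
            if spec["name"] not in provided and "default" not in spec]


def execution_request(script_id, script_inputs, provided):
    """Build an execution payload, refusing incomplete input sets."""
    missing = missing_inputs(script_inputs, provided)
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return {"script": script_id,
            "inputs": [[name, value] for name, value in provided.items()]}


# Declared inputs of a hypothetical script taking one source id.
declared = [{"name": "input-source-id", "type": "source-id"}]
payload = execution_request(
    "script/0123456789abcdef01234567",              # hypothetical id
    declared,
    {"input-source-id": "source/56fbbfea200d5a3403000db7"},
)
print(payload)
```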
39. Different ways to create WhizzML Scripts/Libraries
• GitHub
• Script editor
• Gallery
• Other scripts
• Scriptify
46. Model or Ensemble?
• Split a dataset into training and test parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with better evaluation (f-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
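The selection criterion in the last step can be sketched in a few lines of Python. The version below computes a single-class F-measure from confusion counts and keeps the model only when it is strictly better. The counts are made up for the demo, and a real evaluation would report an f-measure averaged over classes, so treat this as an illustration of the comparison rather than of the evaluation itself.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def choose(model_f, ensemble_f):
    """Keep the single model only if it is strictly better."""
    return "model" if model_f > ensemble_f else "ensemble"


# Made-up confusion counts, for illustration only.
model_f = f_measure(tp=8, fp=2, fn=2)
ensemble_f = f_measure(tp=9, fp=1, fn=3)
print(choose(model_f, ensemble_f))
```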
47. Model or Ensemble?
;; Functions for creating the two dataset parts
;; Sample a dataset taking a fraction of its rows (rate) and
;; keeping either that fraction (out-of-bag? false) or its
;; complement (out-of-bag? true)
(define (sample-dataset origin-id rate out-of-bag?)
  (create-dataset {"origin_dataset" origin-id
                   "sample_rate" rate
                   "out_of_bag" out-of-bag?
                   "seed" "example-seed-0001"}))
;; Create in parallel the two complementary parts of a dataset
;; by calling the sample function twice with the same seed.
;; Return a list of the two new dataset ids.
(define (split-dataset origin-id rate)
  (list (sample-dataset origin-id rate false)
        (sample-dataset origin-id rate true)))
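The split works because both samples use the same seed: the per-row decisions are identical, so keeping the picked rows in one call and the unpicked rows in the other yields disjoint parts that together cover the whole dataset. A minimal Python sketch of that idea follows; it uses Python's own random generator and is not the sampling algorithm the service uses.

```python
import random


def sample_rows(rows, rate, out_of_bag, seed):
    """Keep a seeded random fraction of rows, or its complement."""
    rng = random.Random(seed)  # same seed -> same per-row decisions
    in_bag = [rng.random() < rate for _ in rows]
    return [row for row, picked in zip(rows, in_bag) if picked != out_of_bag]


def split_rows(rows, rate, seed="example-seed-0001"):
    """Return (sample, complement) built from two seeded passes."""
    return (sample_rows(rows, rate, False, seed),
            sample_rows(rows, rate, True, seed))


rows = list(range(100))
train, test = split_rows(rows, 0.8)
print(len(train), len(test))
```

With different seeds in the two calls, the parts could overlap and rows could leak from training into test.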
48. Model or Ensemble?
;; Functions to create an ensemble and to extract the f-measure
;; from an evaluation, given its id.
(define (make-ensemble ds-id size)
  (create-ensemble ds-id {"number_of_models" size}))
(define (f-measure ev-id)
  (let (ev-id (wait ev-id) ;; because fetch doesn't wait
        evaluation (fetch ev-id))
    (evaluation ["result" "model" "average_f_measure"])))
49. Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-dataset {"source" src-id})
        [train-id test-id] (split-dataset ds-id 0.8)
        m-id (create-model train-id)
        e-id (make-ensemble train-id 15)
        m-f (f-measure (create-evaluation m-id test-id))
        e-f (f-measure (create-evaluation e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))
;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))
51. Transforming item counts to features
basket            milk  eggs  flour  salt  chocolate  caviar
milk,eggs         Y     Y     N      N     N          N
milk,flour        Y     N     Y      N     N          N
milk,flour,eggs   Y     Y     Y      N     N          N
chocolate         N     N     N      N     Y          N
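The transformation in the table can be sketched in plain Python: split each basket on commas and emit one Y/N column per known item. The case study itself does this with Flatline inside WhizzML; the function below is only an illustration of the target output, and the name basket_to_features is hypothetical.

```python
ITEMS = ["milk", "eggs", "flour", "salt", "chocolate", "caviar"]


def basket_to_features(basket, items=ITEMS):
    """Map a comma-separated basket to one Y/N flag per known item."""
    present = {item.strip() for item in basket.split(",")}
    return {item: ("Y" if item in present else "N") for item in items}


baskets = ["milk,eggs", "milk,flour", "milk,flour,eggs", "chocolate"]
table = [basket_to_features(basket) for basket in baskets]
for basket, row in zip(baskets, table):
    print(basket, " ".join(row[item] for item in ITEMS))
```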