Automatic machine learning (AutoML) 101

AutoML 101
2018 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.quantuniversity.com
10/25/2018
QuantUniversity Meetup
Boston

2
About us:
• Data Science, Quant Finance and
Model Governance Advisory
• Technologies using MATLAB, Python
and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform

3
www.analyticscertificate.com/MachineLearning
Use code “Affiliate” for a 20% off by Oct 30th
Upcoming workshop
November 7,8,2018

4
• Your challenge is to design an artificial intelligence and machine
learning (AI/ML) framework capable of flying a drone through
several professional drone racing courses without human
intervention or navigational pre-programming.
AlphaPilot Drone AI Challenge

6
• Machine Learning
• Automatic Machine Learning
• Demos
Agenda

7
• “AI is the theory and development of computer systems able to
perform tasks that traditionally have required human intelligence.
• AI is a broad field, of which ‘machine learning’ is a sub-category”
What is Machine Learning and AI?
Source: http://www.fsb.org/wp-content/uploads/P011117.pdf

8
The Machine Learning Process
Data
cleansing
Feature
Engineering
Training and
Testing
Model
building
Model
selection
Hyper
parameter
optimization
Model
Deployment

9
• Supervised Algorithms
▫ Given a set of variables !", predict the value of another variable # in a
given data set such that
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
x1,x2,x3… Model F(X) y

10
• Unsupervised Algorithms
▫ Given a dataset with variables !", build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering
Machine Learning
Obs1,
Obs2,Obs3
etc.
Model
Obs1- Class 1
Obs2- Class 2
Obs3- Class 1

11
Supervised
Learning
algorithms
Parametric
models
Non-
Parametric
models
Supervised learning Algorithms - Prediction

12
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples : Linear Regression, Neural Networks
Supervised Learning models - Prediction
! = #$ + #&'&
Linear Regression Model Neural network Model

13
• Non-Parametric models
▫ No functional form assumed
• Examples : K-nearest neighbors, Decision Trees
Supervised Learning models
K-nearest neighbor Model Decision tree Model

15
• Automated machine learning (AutoML) is the process of
automating the end-to-end process of applying machine learning to
real-world problems.
AutoML

16
• Automated Feature Engineering
▫ Feature selection
▫ Feature extraction
▫ Meta learning and transfer learning
▫ Detection and handling of skewed data and/or missing values
• Hyper-parameter optimization
• Model Selection
• Reference:
https://en.wikipedia.org/wiki/Automated_machine_learning
Types of frameworks

17
• Parameters: Values that can be estimated from data
▫ Examples:
– Regression Coefficients
– Weights in a Neural Network
• HyperParameters: Values external to the model and cannot be
learnt from the data
▫ Examples:
– Learning rate in Neural Network
– Regularization parameters
Parameters vs Hyper Parameters

18
• Hyperparameter optimization finds a tuple of hyperparameters that yields an
optimal model which minimizes a predefined loss function on given
independent data.[1]
• [1] Claesen, Marc; Bart De Moor (2015). "Hyperparameter Search in Machine
Learning".
• Image from:
https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf
Hyperparameter optimization

19
• Interpretability: Ability of users to understand the model, the
parameters of the model and their effect on the outcome
• Example:
▫ In regression, coefficients enable us to interpret the influence of an
independent variable on the dependent variable.
▫ The standard error of estimates of the coefficients enable us to
determine how confident are we on these estimates
Model selection considerations

20
• Parsimonious models: A parsimonious model is a model that
accomplishes a desired level of explanation or prediction with as
few predictor variables as possible.
• Example:
▫ In regression, using Exhaustive search, Forward search, Backward
search or Stepwise regression in model selection
▫ Using PCA on the feature space prior to model building

21
• Ensemble models: Ensemble methods use multiple learning
algorithms to obtain better predictive performance than could be
obtained from any of the constituent learning algorithms alone.
Image from:
https://blogs.sas.com/content/subconsciousmusings/2017/05/18/sta
cked-ensemble-models-win-data-science-competitions/

22
Full pipeline Auotmation
• AutoWEKA is an approach for the simultaneous selection of a machine
learning algorithm and its hyperparameters; combined with
the WEKA package it automatically yields good models for a wide variety
of data sets.
• Auto-sklearn is an extension of AutoWEKA using the Python library scikit-
learn which is a drop-in replacement for regular scikit-learn classifiers and
regressors. It improves over AutoWEKA by using meta-learning to
increase search efficiency and post-hoc ensemble building to combine the
models generated during the hyperparameter optimization process.
• TPOT is a data-science assistant which optimizes machine learning
pipelines using genetic programming.
Ref: https://www.ml4aad.org/automl/
Frameworks

23
Hyper-parameter optimization and Model Selection
• H2O AutoML provides automated model selection and ensembling
for the H2O machine learning and data analytics platform.
• mlr is a R package that contains several hyperparameter
optimization techniques for machine learning problems.
Ref: https://www.ml4aad.org/automl/
Frameworks

24
Deep Neural Network Architecture search
• Google CLOUD AUTOML is an could-based machine learning service
which so far provides the automated generation of computer vision
pipelines.
• Auto Keras is an open-source python package for neural architecture
search.
• Ref:
▫ https://www.ml4aad.org/automl/
▫ https://en.wikipedia.org/wiki/Automated_machine_learning
Frameworks

26
Hardware Considerations
Reference: https://azure.microsoft.com/en-us/blog/release-
models-at-pace-using-microsoft-s-automl/

27
So, which one to choose?
Let’s try some of them

28
www.QuSandbox.com
Model
Analytics
Studio
QuResearchHub
QuSandbox
Prototype, Iterate and tune Standardize workflows
Productionize and share

29
www.analyticscertificate.com/MachineLearning
Use code “Affiliate” for a 20% off by Oct 30th
Continue your learning here!
November 7,8,2018

Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
30

• Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior Experience at MathWorks, Citigroup and
Endeca and 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Charted Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
31

Automatic machine learning (AutoML) 101

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Automatic machine learning (AutoML) 101

Similar a Automatic machine learning (AutoML) 101 (20)

Más de QuantUniversity

Más de QuantUniversity (20)

Último

Último (20)

Automatic machine learning (AutoML) 101