"Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand."
In this talk we will discuss how QuSandbox and the Model Analytics Studio can be used in the selection of machine learning models. We will also illustrate AutoML frameworks through demos and examples and show you how to get started.
Automatic machine learning (AutoML) 101
1. AutoML 101
2018 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.quantuniversity.com
10/25/2018
QuantUniversity Meetup
Boston
2. 2
About us:
• Data Science, Quant Finance and
Model Governance Advisory
• Technologies using MATLAB, Python
and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform
4. 4
• Your challenge is to design an artificial intelligence and machine
learning (AI/ML) framework capable of flying a drone through
several professional drone racing courses without human
intervention or navigational pre-programming.
AlphaPilot Drone AI Challenge
7. 7
• “AI is the theory and development of computer systems able to
perform tasks that traditionally have required human intelligence.
• AI is a broad field, of which ‘machine learning’ is a sub-category”
What is Machine Learning and AI?
Source: http://www.fsb.org/wp-content/uploads/P011117.pdf
8. 8
The Machine Learning Process
• Data cleansing
• Feature Engineering
• Training and Testing
• Model building
• Model selection
• Hyperparameter optimization
• Model Deployment
9. 9
• Supervised Algorithms
▫ Given a set of variables $x_i$, predict the value of another variable $y$ in a
given data set such that $y = F(x_1, x_2, x_3, \ldots)$
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
x1, x2, x3, … → Model F(X) → y
10. 10
• Unsupervised Algorithms
▫ Given a dataset with variables $x_i$, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering
Machine Learning
Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
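A minimal sketch of how observations can be assigned to buckets, assuming k-means as the clustering algorithm (the slide does not name one). The function name `kmeans_1d`, the one-dimensional data, and the starting centroids are all illustrative choices, not part of the original talk.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points with k-means; returns (labels, centroids)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Made-up observations: Obs1 and Obs3 are close together, Obs2 is far away.
observations = [1.0, 9.0, 1.5]
labels, centers = kmeans_1d(observations, centroids=[0.0, 10.0])
print(labels)  # cluster index per observation
```

Note that no target variable is supplied: the grouping comes only from how similar the observations are to each other.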
12. 12
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples: Linear Regression, Neural Networks
Supervised Learning models - Prediction
$y = \beta_0 + \beta_1 x_1$
Linear Regression Model Neural network Model
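To make "assume a functional form, fit coefficients" concrete, here is a small sketch of ordinary least squares for a straight line with one intercept and one slope. The names `beta0` and `beta1` and the toy data are assumptions made for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: forces the line through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Toy data generated from y = 1 + 2x, so the fit should recover those values.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)
```

The functional form (a line) is fixed before seeing the data; only the two coefficients are estimated.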
13. 13
• Non-Parametric models
▫ No functional form assumed
• Examples: K-nearest neighbors, Decision Trees
Supervised Learning models
K-nearest neighbor Model Decision tree Model
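By contrast, a non-parametric model such as K-nearest neighbors keeps the training data and predicts from nearby points. The sketch below assumes one numeric feature and made-up labels; `knn_predict` is a hypothetical helper name.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_x, train_y), key=lambda pair: abs(pair[0] - query)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training set with two well-separated groups.
train_x = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_x, train_y, query=1.1))
print(knn_predict(train_x, train_y, query=5.1))
```

There are no coefficients to fit here; the "model" is the stored data plus the choice of k.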
15. 15
• Automated machine learning (AutoML) is the process of
automating the end-to-end process of applying machine learning to
real-world problems.
AutoML
16. 16
• Automated Feature Engineering
▫ Feature selection
▫ Feature extraction
▫ Meta learning and transfer learning
▫ Detection and handling of skewed data and/or missing values
• Hyper-parameter optimization
• Model Selection
• Reference:
https://en.wikipedia.org/wiki/Automated_machine_learning
Types of frameworks
17. 17
• Parameters: Values that can be estimated from data
▫ Examples:
– Regression Coefficients
– Weights in a Neural Network
• Hyperparameters: Values that are external to the model and cannot be
learned from the data
▫ Examples:
– Learning rate in Neural Network
– Regularization parameters
Parameters vs Hyper Parameters
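The distinction can be seen in a small gradient-descent sketch: the weight `w` and bias `b` are parameters estimated from the data, while `learning_rate` and `epochs` are hyperparameters chosen by the practitioner beforehand. The function name, data, and values are illustrative assumptions.

```python
def train_linear_model(xs, ys, learning_rate, epochs):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # parameters: updated from the data during training
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # learning_rate is a hyperparameter: it scales each update step
        # but is never itself updated by the training loop.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_linear_model(xs, ys, learning_rate=0.05, epochs=2000)
print(round(w, 4), round(b, 4))
```

A different learning rate would change how quickly (or whether) training converges, which is why such values are tuned outside the training loop.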
18. 18
• Hyperparameter optimization finds a tuple of hyperparameters that yields an
optimal model which minimizes a predefined loss function on given
independent data.[1]
• [1] Claesen, Marc; Bart De Moor (2015). "Hyperparameter Search in Machine
Learning".
• Image from:
https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf
Hyperparameter optimization
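One simple way to search for such a tuple is grid search: try every combination and keep the one with the lowest loss on held-out data. The sketch below assumes a one-feature ridge regression with two hyperparameters (`lam`, `fit_intercept`); the model, grid values, and data are illustrative and are not taken from the cited paper.

```python
from itertools import product


def fit_ridge_1d(xs, ys, lam, fit_intercept):
    """One-feature ridge regression; returns (slope, intercept)."""
    if fit_intercept:
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
    else:
        mean_x = mean_y = 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / (sxx + lam)  # lam shrinks the slope toward zero
    return w, mean_y - w * mean_x


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(grid, train, valid):
    """Return the hyperparameter combination with the lowest validation loss."""
    best_params, best_loss = None, float("inf")
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit_ridge_1d(*train, **params)  # fit on training data only
        loss = mse(model, *valid)               # score on independent data
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


train = ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])  # toy data from y = 1 + 2x
valid = ([4.0, 5.0], [9.0, 11.0])           # held-out points from the same line
grid = {"lam": [0.0, 1.0, 10.0], "fit_intercept": [False, True]}
best_params, best_loss = grid_search(grid, train, valid)
print(best_params, best_loss)
```

Exhaustive grids grow quickly with the number of hyperparameters, which is one reason AutoML tools tend to use smarter search strategies.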
19. 19
• Interpretability: Ability of users to understand the model, the
parameters of the model and their effect on the outcome
• Example:
▫ In regression, coefficients enable us to interpret the influence of an
independent variable on the dependent variable.
▫ The standard errors of the coefficient estimates enable us to
determine how confident we are in these estimates
Model selection considerations
20. 20
• Parsimonious models: A parsimonious model is a model that
accomplishes a desired level of explanation or prediction with as
few predictor variables as possible.
• Example:
▫ In regression, using Exhaustive search, Forward search, Backward
search or Stepwise regression in model selection
▫ Using PCA on the feature space prior to model building
Model selection considerations
21. 21
• Ensemble models: Ensemble methods use multiple learning
algorithms to obtain better predictive performance than could be
obtained from any of the constituent learning algorithms alone.
Image from:
https://blogs.sas.com/content/subconsciousmusings/2017/05/18/sta
cked-ensemble-models-win-data-science-competitions/
Model selection considerations
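A toy illustration of the ensemble idea, assuming simple majority voting (the slide's image refers to stacked ensembles, which are more elaborate). The three "models" and the data are contrived so that each model errs on different samples; real ensembles are not guaranteed to beat their members.

```python
from collections import Counter


def majority_vote(models, features):
    """Combine predictions by returning the most common one."""
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, data):
    """Fraction of samples for which predict(features) equals the label."""
    return sum(predict(f) == y for f, y in data) / len(data)


# Hypothetical weak models: model i simply echoes feature i as its prediction.
models = [lambda f, i=i: f[i] for i in range(3)]

# Contrived data: each feature disagrees with the label on two of six samples,
# but never on the same sample as another feature.
data = [
    ((1, 1, 0), 1),
    ((1, 0, 1), 1),
    ((0, 1, 1), 1),
    ((0, 0, 1), 0),
    ((0, 1, 0), 0),
    ((1, 0, 0), 0),
]

individual = [accuracy(m, data) for m in models]
ensemble = accuracy(lambda f: majority_vote(models, f), data)
print(individual, ensemble)
```

The gain here comes from the models making uncorrelated mistakes; if all three erred on the same samples, voting would not help.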
22. 22
Full pipeline Automation
• AutoWEKA is an approach for the simultaneous selection of a machine
learning algorithm and its hyperparameters; combined with
the WEKA package it automatically yields good models for a wide variety
of data sets.
• Auto-sklearn is an extension of AutoWEKA built on the Python library
scikit-learn; it is a drop-in replacement for regular scikit-learn classifiers and
regressors. It improves over AutoWEKA by using meta-learning to
increase search efficiency and post-hoc ensemble building to combine the
models generated during the hyperparameter optimization process.
• TPOT is a data-science assistant which optimizes machine learning
pipelines using genetic programming.
Ref: https://www.ml4aad.org/automl/
Frameworks
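The idea of selecting an algorithm and its hyperparameters simultaneously can be sketched without any of these libraries. The example below is a deliberately naive exhaustive search over a made-up search space of two toy algorithms; it is not how AutoWEKA, Auto-sklearn, or TPOT are implemented, and every name in it is hypothetical.

```python
from collections import Counter


def make_knn(train, k):
    """k-nearest-neighbor classifier on one numeric feature."""
    def predict(x):
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def make_stump(train, threshold):
    """Threshold rule; `train` is unused because the rule is fixed by its threshold."""
    def predict(x):
        return 1 if x > threshold else 0
    return predict


# Joint search space: each algorithm with one hyperparameter and its candidates.
SEARCH_SPACE = {
    "knn": (make_knn, "k", [1, 3]),
    "stump": (make_stump, "threshold", [2.0, 5.0]),
}


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


def select_model(train, valid):
    """Try every (algorithm, hyperparameter) pair; keep the best on validation data."""
    best = None
    for name, (builder, hp_name, values) in SEARCH_SPACE.items():
        for value in values:
            model = builder(train, **{hp_name: value})
            score = accuracy(model, valid)
            if best is None or score > best[0]:
                best = (score, name, {hp_name: value})
    return best


# Toy data: the label is 1 for x > 5, except one noisy training point at x = 4.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (6.0, 1), (7.0, 1), (8.0, 1)]
valid = [(2.5, 0), (4.2, 0), (5.5, 1), (6.5, 1)]
best = select_model(train, valid)
print(best)
```

Real frameworks replace the exhaustive loop with more efficient strategies, such as the meta-learning and genetic programming mentioned above.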
23. 23
Hyper-parameter optimization and Model Selection
• H2O AutoML provides automated model selection and ensembling
for the H2O machine learning and data analytics platform.
• mlr is an R package that contains several hyperparameter
optimization techniques for machine learning problems.
Ref: https://www.ml4aad.org/automl/
Frameworks
24. 24
Deep Neural Network Architecture search
• Google Cloud AutoML is a cloud-based machine learning service
which so far provides the automated generation of computer vision
pipelines.
• Auto Keras is an open-source Python package for neural architecture
search.
• Ref:
▫ https://www.ml4aad.org/automl/
▫ https://en.wikipedia.org/wiki/Automated_machine_learning
Frameworks
30. Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
31. • Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior Experience at MathWorks, Citigroup and
Endeca and 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO