DECEMBER 14
GLOBAL AI BOOTCAMP IS POWERED BY:
The Power of Auto ML
How does AutoML “Magic” Happen
Thanks to our Sponsors:
• Software Architect @
o 17+ years professional experience
• Microsoft Azure MVP
• External Expert Horizon 2020, Eurostars-Eureka
• External Expert InnoFund Denmark, RIF Cyprus
• Business Interests
o Web Development, SOA, Integration
o IoT, Machine Learning, Computer Intelligence
o Security & Performance Optimization
• Contact
ivelin.andreev@icb.bg
www.linkedin.com/in/ivelin
www.slideshare.net/ivoandreev
About me
Contents
1. Machine Learning Workflow
2. Visual Interface for Azure ML Service
3. Automated ML
4. Advanced ML with Azure Monitor
5. Deep Learning with Tensorflow
6. AI Ops
7. Cognitive Vision Services
8. Insights with Text Analytics and Vision
9. Cognitive Decision Service
10. Cognitive Search Service
11. Version Control for ML
12. VS Code for Python ML
13. Bot Framework
14. Search Bots with Cognitive Services
15. Bot Architecture Best Practices
16. AI and Cognitive Services in Power BI
17. Form Processing with AI Builder
AGENDA
Auto ML
Pipelines
Auto ML Under the Hood
Azure ML Designer
Demo (AutoML Python SDK)
ML is a Process
• Iterative data science process:
o Business problem understanding
o Data collection, cleaning, exploration
o Model building
o Performance evaluation
o Deployment
• Auto ML: Automate environment, data preparation, experimentation, deployment
AutoML is not Auto Data Science
• Any ML Task = {data} + {problem type} + {loss function}
• ML project effort and budget
o 80% data preparation, 15% modeling and evaluation
o Repetitive effort (react to changes in objectives and data)
• AutoML as a tool
o A recommender system for ML pipelines to achieve accuracy in less time
• Objective
o Relieve data scientists of repetitive tasks
o Automate problem solution on data with minimal loss
AutoML fills the gap between “supply” and “demand” on the ML market
AutoML outperforms an average Data Scientist
Auto ML Builds ML Pipelines
User Input: Dataset, Performance goals, Constraints (CPU, RAM, time)
Auto ML Magic
Results: Automatically determine a pipeline structure with minimal loss on the validation set within CPU/Memory constraints
Auto ML Steps
1. Determine pipeline structure
2. Select algorithm for each step
3. Tune hyper-parameters
Performance Evaluation
• All 3 steps shall be completed;
• Iterate until performance goals reached
ML Pipeline Steps
An ML pipeline is a technical solution to stitch ML phases and automate workflows
• Data
o Select preprocessing strategy (imbalanced and missing data, normalization, outliers)
o Features (feature extraction, engineering, selection)
• Modeling
o Select algorithm
o Tune hyperparameters (e.g. number of trees)
o Train multiple models, create ensemble
o Score, evaluate, select the best model
• Training & Deployment
o Parallel training on a cluster, Maintain versioning
ML Pipeline Benefits
• Advantages of ML Pipelines
o Parallel and unattended execution
o Reusability through pipeline templates for specific scenarios
o Versioning data and results using pipeline SDK
o Modularity separating areas of concern
o Collaboration among data scientists across ML design process
o Scalability – single ML pipeline can be trained on multiple machines; different ML pipelines can be tested in parallel on many nodes
• Open Issue
How do pipelines “learn” what to do???
“No free lunch” theorem simplified (David Wolpert, 1996)
1. Model is simplification of reality
2. Simplification is based on bias
3. Bias fails in some situations
Conclusion 1: No algorithm or parameter set is always the best.
Conclusion 2: Use knowledge about data and context.
Automated Data Preparation
Step 1: Data Ingestion
• Requires data storage (Azure Blob mounted by default)
• Data quality issues are common (missing data, mixed units and formats)
• Evaluate quality, select initial features (statistical analysis and visualization)
Rule of Thumb: No algorithm could achieve good results with bad data input
Step 2: Data profiling and cleansing
• AutoML provides a variety of statistics to verify dataset is ready for modelling
o Non-numeric (Min, Max, Count)
o Numeric (Mean, StdDev, Variance, Distribution histogram)
• Cleansing cannot be done in the GUI
o Python SDK: azureml.dataprep
o AutoML: turn on the “Automatic preprocessing” option
Auto ML Guardrails
What is: Safeguard users against common issues with data and make corrections
Missing Values
• Strategies: Drop rows; intelligently replace missing values based on other data
Class Imbalances
• Most ML algorithms assume a roughly equal class distribution; majority classes bias the model towards themselves
• Strategies: Oversampling (add instances to minority class); Undersampling (majority)
Data Leakage
• Dataset includes information that would not be available at time of prediction
• The actual outcome is effectively already known, so model performance looks unrealistically good (close to perfect)
• Strategies: Remove leaky features; Add noise; Hold back unseen test data
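The oversampling strategy listed under Class Imbalances can be sketched in a few lines. This is a minimal illustration using only the standard library and assuming plain random duplication of minority rows; it is not a description of what AutoML does internally, and the function name `random_oversample` and the toy data are made up for the example.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data (hypothetical): four "ok" rows and one "fraud" row.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["ok", "ok", "ok", "ok", "fraud"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only; duplicating rows before the train/validation split leaks copies into the validation set.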
Automated Data Preparation
Step 3: Feature Engineering
• Impute missing values (mode for categorical, mean for numerical)
• Create categorical features from numeric with low diversity
• YYYY, MM, dd, HH, mm, ss, Day of week, Day of year, Quarter, Week Nr from date
• One-hot encode low cardinality categorical vars (e.g. Gender -> IsMale, IsFemale)
• K-means clustering on each numeric column for distance to centroid feature
• Term frequency for text variables
• Outlier treatment
Note: General-purpose steps are not domain specific (e.g. they will not create an income/debt ratio)
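Three of the general-purpose steps above (imputation, date expansion, one-hot encoding) are easy to show in plain Python. The sketch below is illustrative only: the helper names `impute`, `date_parts` and `one_hot` are hypothetical and do not correspond to any Azure ML API.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

def impute(values, categorical=False):
    """Fill None with the mode (categorical) or the mean (numerical)."""
    present = [v for v in values if v is not None]
    fill = Counter(present).most_common(1)[0][0] if categorical else mean(present)
    return [fill if v is None else v for v in values]

def date_parts(ts):
    """Expand one timestamp into several calendar features."""
    return {
        "year": ts.year, "month": ts.month, "day": ts.day, "hour": ts.hour,
        "day_of_week": ts.weekday(),           # Monday = 0
        "day_of_year": ts.timetuple().tm_yday,
        "quarter": (ts.month - 1) // 3 + 1,
        "week": ts.isocalendar()[1],           # ISO week number
    }

def one_hot(values):
    """One indicator column per distinct category."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

print(impute([30, None, 50]))
print(impute(["a", None, "a", "b"], categorical=True))
print(date_parts(datetime(2019, 12, 14, 9, 30)))
print(one_hot(["male", "female"]))
```

None of these helpers knows anything about the business domain, which is the point of the note above: a ratio such as income/debt still has to come from a person.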
Automated Data Preparation
Step 3 just got you into a problem
• Feature engineering could generate too many features
• The solution needs to avoid overfitting and reduce model training time
• No domain knowledge has been applied yet
Step 4: Feature Selection (limited in AutoML)
• Drop high cardinality variables (noise)
• Drop no variance variables (non-informative)
Possible future improvements
• Drop highly correlated fields
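The two drop rules of Step 4 can be sketched as a simple column filter. The threshold `max_unique_ratio` and the function name `select_features` are assumptions made for this example, not AutoML settings.

```python
def select_features(columns, max_unique_ratio=0.9):
    """columns maps a column name to its list of values.
    Returns the names of the columns worth keeping."""
    kept = []
    for name, values in columns.items():
        unique = len(set(values))
        if unique <= 1:
            continue  # no variance: the column carries no information
        if isinstance(values[0], str) and unique / len(values) > max_unique_ratio:
            continue  # high-cardinality categorical (IDs, hashes): mostly noise
        kept.append(name)
    return kept

# Hypothetical toy table.
table = {
    "customer_id": ["a1", "b2", "c3", "d4"],
    "country": ["BG", "BG", "BG", "BG"],
    "plan": ["free", "pro", "free", "pro"],
    "age": [31, 45, 27, 52],
}
print(select_features(table))
```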
Algorithm Selection and Hyperparametrization
Challenges of Configuration Space
• High-dimensionality (multiple continuous, categorical, binary variables)
• Conditionality (some parameter values are relevant in combination)
• No Gradient (loss function has no gradient, expensive evaluation)
Opt1: Grid Search / Brute Force
• Cartesian product on hyperparameter combinations
• The simplest method, dimensionality curse
Opt2: Random Search
• Random configurations within certain budget
• Good baseline, no assumptions, easy parallelization
Opt3: Bayesian Optimization
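Grid search and random search can be compared side by side on a toy problem. In the sketch below, `loss` is a hypothetical stand-in for "train a model with this configuration and measure validation loss", and the hyperparameter names and ranges are invented for illustration.

```python
import itertools
import random

def loss(config):
    # Hypothetical stand-in for an expensive train-and-validate run.
    return (config["learning_rate"] - 0.1) ** 2 + (config["n_trees"] - 200) ** 2 / 1e6

grid = {"learning_rate": [0.01, 0.1, 1.0], "n_trees": [50, 200, 800]}

def grid_search(grid):
    """Evaluate the Cartesian product of all listed values."""
    names = list(grid)
    configs = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]
    return min(configs, key=loss), len(configs)

def random_search(budget, seed=0):
    """Evaluate `budget` configurations drawn at random from the ranges."""
    rng = random.Random(seed)
    configs = [
        {"learning_rate": 10 ** rng.uniform(-3, 0), "n_trees": rng.randint(10, 1000)}
        for _ in range(budget)
    ]
    return min(configs, key=loss)

best_grid, n_evals = grid_search(grid)
best_random = random_search(budget=9)
print(n_evals, best_grid)
print(best_random)
```

The grid grows multiplicatively with every added hyperparameter (3 x 3 = 9 runs here), while the random search budget is chosen independently of the number of dimensions and every draw can run in parallel.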
Meta Learning in AutoML
Challenges
• Avoid starting from scratch on new ML tasks
• Learn from experience, efficiently and in systematic data-driven way
Prerequisite
• Collect meta-data to describe previous tasks (parameters, pipeline structure, evaluations)
Result
• Meta-learner to recommend promising configurations w/o exhaustive search
Notes
• If datasets have similar results on few pipelines => similar results on remaining pipelines
• Operates similarly to recommender systems
• Privacy: AML has no need to access customer data, only pipeline results
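The recommender analogy can be made concrete with a toy nearest-neighbour lookup. This is only a sketch of the idea under simplifying assumptions (Euclidean distance over a few probe pipelines, a tiny hand-written history); the task names, pipeline names and scores are invented and it does not reproduce Azure AutoML's actual meta-learner.

```python
# Meta-data from previous tasks: pipeline name -> validation score (hypothetical).
history = {
    "churn":   {"logreg": 0.71, "random_forest": 0.83, "gbm": 0.88, "svm": 0.74},
    "sensors": {"logreg": 0.90, "random_forest": 0.85, "gbm": 0.86, "svm": 0.92},
}

def distance(known, probe):
    """Euclidean distance over the pipelines already tried on the new task."""
    return sum((known[p] - s) ** 2 for p, s in probe.items()) ** 0.5

def recommend(history, probe):
    """Find the most similar past task and suggest its best untried pipeline."""
    nearest = min(history, key=lambda task: distance(history[task], probe))
    untried = {p: s for p, s in history[nearest].items() if p not in probe}
    return nearest, max(untried, key=untried.get)

# A new dataset has only been evaluated on two cheap pipelines so far.
probe = {"logreg": 0.70, "random_forest": 0.80}
print(recommend(history, probe))
```

Only pipeline names and scores are exchanged, which matches the privacy note above: the recommender never needs the rows of the customer dataset.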
Cross-Validation and Ensembling
Cross Validation
• Divide training data into k subsets
• Repeat k times: hold out subset i, train on the remaining k-1 subsets, validate on the held-out subset
• Average the k error estimates
Ensembling (bagging, boosting, stacking)
• Combine a few of the best ML models for improved accuracy at no extra training cost
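A minimal k-fold cross-validation loop, written with the standard library only, shows the hold-out and averaging steps. The "model" here is deliberately trivial (it predicts the mean of the training targets) and the helper names are made up for the example.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]

def cross_validate(xs, ys, k, fit, error):
    """Train on k-1 folds, validate on the held-out fold, average the errors."""
    errors = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(error(model, [xs[i] for i in val], [ys[i] for i in val]))
    return mean(errors)

def fit_mean(xs, ys):
    # Toy model: always predict the mean of the training targets.
    return mean(ys)

def mean_abs_error(model, xs, ys):
    return mean(abs(y - model) for y in ys)

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5, 6]
score = cross_validate(xs, ys, k=3, fit=fit_mean, error=mean_abs_error)
print(score)
```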
Building Azure ML Pipelines
Azure ML Designer vs Azure ML Studio
• ML Studio – collaborative drag-drop workspace to build, test and deploy ML
• Azure ML – designer, SDK and CLI for data prep., train and deploy ML at scale
Feature | Azure ML Designer | ML Studio (Classic)
Availability | Preview (2019) | Generally available (GA) (2015)
Drag-drop interface | Yes | Yes
Scalability | With compute target | Training data limited to 10GB
Module richness | Important modules only | Multiple modules
Compute | AML compute, CPU/GPU | Proprietary compute, CPU only
ML Pipeline | Authoring, publishing | N/A
ML Ops | Flexible deployment and versioning | Basic management and deploy
Model portability | Portable | Proprietary, non-portable
Auto ML | Through SDK | N/A
Azure ML
What is: cloud-based environment to rapidly build and deploy machine learning
models, by auto-scaling powerful CPU or GPU clusters
How to:
1. 4 Development environments for AML – cloud-based notebook VM (easiest); local (with Azure subscription), Data Science VM and Azure Databricks
2. Create workspace (Python SDK or Azure Portal)
3. azureml.dataprep Python package to explore, cleanse and transform
4. Train target (Local PC, Azure Linux VM, HDInsight for Spark)
5. azureml.train recommend pipeline based on target metrics
6. Register models for tag, search and deploy (even models trained outside AML)
7. Deploy to Azure Container Instance serverless containers
Interpreting Learning Results (Classification)
• Confusion Matrix
o Rows – true class, Columns – predicted class
o Good model = most values along the diagonal
• Precision-Recall Chart
o Precision = TP / (TP + FP), ability to label correctly
o Recall = TP / (TP + FN), ability to find all instances
o Macro Average PR – independent PR average
o Micro Average PR – weighted PR average (imbalanced)
o Draw PR chart - at different threshold values
• ROC Chart – TP Rate vs FP Rate over different thresholds
FPR = FP / (FP + TN) (best is close to 0), TPR = TP / (TP + FN) (best is close to 1)
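The formulas above translate directly into code. The sketch below computes them from confusion-matrix counts and also contrasts macro and micro averaging of precision; the counts are invented for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Metrics for one class from its confusion-matrix counts."""
    recall = tp / (tp + fn)
    return {
        "precision": tp / (tp + fp),
        "recall": recall,
        "tpr": recall,            # TPR is another name for recall
        "fpr": fp / (fp + tn),
    }

def macro_precision(per_class):
    """Average the per-class precisions; every class counts equally."""
    return sum(tp / (tp + fp) for tp, fp in per_class) / len(per_class)

def micro_precision(per_class):
    """Pool the counts first; frequent classes dominate the result."""
    total_tp = sum(tp for tp, _ in per_class)
    total_fp = sum(fp for _, fp in per_class)
    return total_tp / (total_tp + total_fp)

metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)

# (TP, FP) per class: one frequent class, one rare class (hypothetical).
per_class = [(90, 10), (5, 5)]
print(macro_precision(per_class), micro_precision(per_class))
```

On imbalanced data the two averages diverge: the rare class pulls the macro average down, while the micro average stays close to the precision of the frequent class.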
Lift, Gain and Calibration Charts
• Lift Chart – How many times the model is better than random
o Ratio of gain%/random expectation% at a given decile level
o Green line – baseline random guess
• Gain Chart – how much to sample to get target sensitivity (TPR)
o X – percentile addressed, Y - portion positive responses
o Green line - baseline random guess
• Calibration Chart
o Confidence of a predictive model
o Predicted vs actual probability
o Good model: y=x
o Overly confident: y=0 and y=1
Note: perfectly calibrated classifier != perfect classifier
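Gain and lift at a given cut-off can be computed from ranked predictions as follows. This is a minimal sketch with invented scores and labels; the function names are hypothetical.

```python
def cumulative_gain(scores, labels, fraction):
    """Share of all positives found in the top `fraction` of rows,
    after ranking rows by predicted score (highest first)."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)]
    top = ranked[: round(len(ranked) * fraction)]
    return sum(top) / sum(labels)

def lift(scores, labels, fraction):
    """How many times better than random sampling the model is at this cut-off.
    Random sampling of `fraction` of the rows finds `fraction` of the positives."""
    return cumulative_gain(scores, labels, fraction) / fraction

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
for fraction in (0.2, 0.5, 1.0):
    print(fraction, cumulative_gain(scores, labels, fraction), lift(scores, labels, fraction))
```

At 100% of the data the gain is always 1 and the lift is always 1, which is why both curves meet the random baseline at the right edge of the chart.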
Containers meet Machine Learning
• Steps: (from Portal or AML SDK management API)
o Add model (from local workspace or upload model)
o Add driver script
o Add package dependency file (YML)
o The system creates a Docker image and registers it to the Workspace
• Deployment
o Azure Container Instance (ACI) - test, Azure Kubernetes Service (AKS) - prod
o Azure ML Compute, Azure IoT Edge
• Operationalization
o REST API is created automatically
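The "driver script" in the steps above is the entry script that the container calls. Azure ML entry scripts conventionally expose an `init()` function (run once at start-up) and a `run()` function (run per request). The sketch below follows that shape but uses a hypothetical in-memory linear model instead of loading a registered model file, so the weights and the request layout are assumptions for illustration.

```python
import json

model = None

def init():
    """Called once when the container starts."""
    global model
    # Hypothetical stand-in for loading a registered model from disk.
    model = {"weights": [0.5, -0.25], "bias": 0.1}

def run(raw_data):
    """Called for every scoring request; takes and returns JSON text."""
    try:
        rows = json.loads(raw_data)["data"]
        predictions = [
            sum(w * x for w, x in zip(model["weights"], row)) + model["bias"]
            for row in rows
        ]
        return json.dumps({"result": predictions})
    except Exception as exc:
        # Return the error to the caller instead of crashing the service.
        return json.dumps({"error": str(exc)})

# Local smoke test before building the image.
init()
print(run(json.dumps({"data": [[2, 4], [0, 0]]})))
```

Calling `init()` and `run()` locally like this is a cheap way to catch script errors before the Docker image is built and deployed.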
Operationalization
• REST APIs
o Deploying an AML model as a web service creates single and batch REST APIs
o APIs consumed by azureml.core.webservice
• Performance Degradation
o Performance in real life may differ from during training
o Data drift - change in characteristics of input data over time
• Monitoring and Drift Analysis
o Input data change over time and lead to performance degradation
o Configure inference data to snapshot and profile against baseline
o ML model trained to detect differences
o Model performance converted to drift coefficient
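The idea of "train a model to tell baseline from current data, then turn its performance into a drift coefficient" can be illustrated on a single numeric feature. The sketch below is an assumption-heavy toy: it uses a one-threshold classifier and maps accuracy to a 0..1 coefficient with 2 * (accuracy - 0.5). It is not the algorithm or the scaling that Azure ML applies.

```python
def drift_coefficient(baseline, current):
    """Fit a one-threshold classifier that guesses which dataset a value
    came from, and rescale its best accuracy to the range 0..1.
    0 means the datasets are indistinguishable, 1 means fully separable."""
    values = [(v, 0) for v in baseline] + [(v, 1) for v in current]
    best = 0.5
    for threshold in sorted({v for v, _ in values}):
        # Rule: predict "current" when the value exceeds the threshold.
        correct = sum((v > threshold) == bool(source) for v, source in values)
        accuracy = correct / len(values)
        # 1 - accuracy covers the mirrored rule (predict "current" below it).
        best = max(best, accuracy, 1 - accuracy)
    return 2 * (best - 0.5)

baseline = [1, 2, 3, 4]
print(drift_coefficient(baseline, [1, 2, 3, 4]))   # same distribution
print(drift_coefficient(baseline, [3, 4, 5, 6]))   # partial shift
print(drift_coefficient(baseline, [5, 6, 7, 8]))   # complete shift
```

The accuracy here is measured on the same data the threshold was fitted on, so a production version would hold out part of both datasets before scoring.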
Takeaways
• Books
o AI MVP Book: Automated Machine Learning
https://www.amazon.com/gp/aw/d/B082P5MK8Y
o Practical Automated ML on Azure
• The No Free Lunch Theorem
https://www.kdnuggets.com/2019/09/no-free-lunch-data-science.html
• Azure ML Studio vs Azure ML Services designer
https://www.codit.eu/blog/azure-machine-learning-studio-vs-services/
https://docs.microsoft.com/en-us/azure/machine-learning/compare-azure-ml-to-studio-classic
• Bayes Theorem
https://towardsdatascience.com/understanding-bayes-theorem-7e31b8434d4b

The Power of Auto ML and How Does it Work

  • 1. DECEMBER 14 GLOBAL AI BOOTCAMP IS POWERED BY: The Power of Auto ML How does AutoML “Magic” Happen
  • 2. Thanks to our Sponsors:
  • 3. • Software Architect @ o 17+ years professional experience • Microsoft Azure MVP • External Expert Horizon 2020, Eurostars-Eureka • External Expert InnoFund Denmark, RIF Cyprus • Business Interests o Web Development, SOA, Integration o IoT, Machine Learning, Computer Intelligence o Security & Performance Optimization • Contact ivelin.andreev@icb.bg www.linkedin.com/in/ivelin www.slideshare.net/ivoandreev About me
  • 4. Contents 1. Machine Learning Workflow 2. Visual Interface for Azure ML Service 3. Automated ML 4. Advanced ML with Azure Monitor 5. Deep Learning with Tensorflow 6. AI Ops 7. Cognitive Vision Services 8. Insights with Text Analytics and Vision 9. Cognitive Decision Service 10. Cognitive Search Service 11. Version Control for ML 12. VS Code for Python ML 13. Bot Framework 14. Search Bots with Cognitive Services 15. Bot Architecture Best Practices 16. AI and Cognitive Services in Power BI 17. Form Processing with AI Builder
  • 5. AGENDA Auto ML Pipelines Auto ML Under the Hood Azure ML Designer Demo (AutoML Python SDK)
  • 6. ML is a Process • Iterative data science process: o Business problem understanding o Data collection, cleaning, exploration o Model building o Performance evaluation o Deployment • Auto ML: Automate environment, data preparation, experimentation, deployment
  • 7. AutoML is not Auto Data Science • Any ML Task = {data} + {problem type} + {loss function} • ML project effort and budget o 80% data preparation, 15% modeling and evaluation o Repetitive effort (react to changes in objectives and data) • AutoML as a tool o A recommender system for ML pipelines to achieve accuracy with less time • Objective o Offload data scientists from of repetitive tasks o Automate problem solution on data with minimal loss
  • 8. AutoML fills the gap between “supply” and “demand” on ML market AutoML outperforms an average Data Scientist
  • 9. Auto ML Builds ML Pipelines User Input: Dataset, Performance goals, Constraints (CPU, RAM, time) Auto ML Magic Results: Automatically determine a pipeline structure with minimal loss on the validation set within CPU/Memory constraints Auto ML Steps 1. Determine pipeline structure 2. Select algorithm for each step 3. Tune hyper-parameters Performance Evaluation • All 3 steps shall be completed; • Iterate until performance goals reached
  • 10. ML Pipeline Steps An ML pipeline is a technical solution to stitch ML phases and automate workflows • Data o Select preprocessing strategy (imbalanced and missing data, normalization, outliers) o Features (feature extraction, engineering, selection) • Modeling o Select algorithm o Tune hyperparameters (i.e. number of trees) o Train multiple models, create ensemble o Score, evaluate, select the best model • Training & Deployment o Parallel training on a cluster, Maintain versioning
  • 11. ML Pipeline Benefits • Advantages of ML Pipelines o Parallel and unattended execution o Reusability through pipeline templates for specific scenarios o Versioning data and results using pipeline SDK o Modularity separating areas of concern o Collaboration among data scientists across ML design process o Scalability – single ML pipeline can be trained on multiple machines; different ML pipelines can be tested in parallel on many nodes • Open Issue How do pipelines “learn” what to do???
  • 12. “No free lunch” theorem simplified (David Wolpert, 1996)
      1. A model is a simplification of reality
      2. Simplification is based on bias
      3. Bias fails in some situations
      Conclusion 1: No algorithm or parameter set is always the best.
      Conclusion 2: Use knowledge about data and context.
  • 13. Automated Data Preparation
      Step 1: Data Ingestion
      • Requires data storage (Azure Blob mounted by default)
      • Data quality issues are common (missing data, mixed units and formats)
      • Evaluate quality, select initial features (statistical analysis and visualization)
      Rule of Thumb: No algorithm can achieve good results with bad input data
      Step 2: Data profiling and cleansing
      • AutoML provides a variety of statistics to verify that the dataset is ready for modelling
        o Non-numeric (Min, Max, Count)
        o Numeric (Mean, StdDev, Variance, Distribution histogram)
      • Cleansing cannot be done in the GUI
        o Python SDK: azureml.dataprep
        o ML: turn on the “Automatic preprocessing” option
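The profiling statistics listed on slide 13 can be approximated with the standard library. This is a minimal sketch, not the Azure ML implementation: the function name `profile` and the choice of population (rather than sample) variance are assumptions made for illustration.

```python
from statistics import mean, pstdev, pvariance

def profile(values):
    """Summarise one column; None marks a missing value (hypothetical helper)."""
    known = [v for v in values if v is not None]
    summary = {"count": len(known), "missing": len(values) - len(known)}
    if all(isinstance(v, (int, float)) for v in known):
        # Numeric column: mean, standard deviation, variance
        summary.update(mean=mean(known), std_dev=pstdev(known), variance=pvariance(known))
    else:
        # Non-numeric column: min and max by natural ordering
        summary.update(min=min(known), max=max(known))
    return summary

numeric_profile = profile([2, 4, 4, 4, 5, 5, 7, 9, None])
text_profile = profile(["pear", "apple", None, "fig"])
print(numeric_profile)
print(text_profile)
```

A high `missing` count or a zero variance in such a profile is the kind of signal that tells you the dataset is not yet ready for modelling.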
  • 14. Auto ML Guardrails
      What it is: Safeguards users against common issues with data and makes corrections
      • Missing Values
        o Strategies: Drop rows; intelligently replace missing values based on other data
      • Class Imbalances
        o Most ML algorithms assume equal distribution, majority classes add more bias
        o Strategies: Oversampling (add instances to the minority class); Undersampling (remove instances from the majority class)
      • Data Leakage
        o Dataset includes information that would not be available at time of prediction
        o When the actual outcome is already known to the model, its performance looks perfect
        o Strategies: Remove leaky features; Add noise; Hold back unseen test data
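The oversampling strategy for class imbalance on slide 14 can be sketched as random duplication of minority-class rows. This is a plain-Python illustration under assumed names (`random_oversample`, the toy `rows` and `labels`), not what Azure AutoML does internally.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

rows = [[1.0], [1.1], [0.9], [1.2], [5.0]]
labels = ["ok", "ok", "ok", "ok", "fraud"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only; duplicating rows before the split would leak copies into the validation data.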
  • 15. Automated Data Preparation
      Step 3: Feature Engineering
      • Impute missing values (mode for categorical, mean for numerical)
      • Create categorical features from numeric features with low diversity
      • Extract YYYY, MM, dd, HH, mm, ss, Day of week, Day of year, Quarter, Week Nr from dates
      • One-hot encode low cardinality categorical variables (e.g. Gender -> IsMale, IsFemale)
      • K-means clustering on each numeric column for a distance-to-centroid feature
      • Term frequency for text variables
      • Outlier treatment
      Note: General-purpose steps are not domain specific (e.g. they will not derive an income/debt ratio)
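Three of the feature engineering steps on slide 15 (imputation, one-hot encoding, date expansion) are simple enough to sketch in plain Python. The helper names are hypothetical, and using the ISO week for "Week Nr" is an assumption; the slide does not say which week numbering is used.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

def impute(values, categorical):
    """Replace None with the mode (categorical) or the mean (numerical)."""
    known = [v for v in values if v is not None]
    fill = Counter(known).most_common(1)[0][0] if categorical else mean(known)
    return [fill if v is None else v for v in values]

def one_hot(values):
    """Turn one categorical column into Is<Category> indicator columns."""
    categories = sorted(set(values))
    return [{f"Is{c}": int(v == c) for c in categories} for v in values]

def date_parts(ts):
    """Expand a timestamp into calendar features."""
    return {
        "year": ts.year, "month": ts.month, "day": ts.day,
        "hour": ts.hour, "minute": ts.minute, "second": ts.second,
        "day_of_week": ts.weekday(),            # Monday = 0
        "day_of_year": ts.timetuple().tm_yday,
        "quarter": (ts.month - 1) // 3 + 1,
        "week": ts.isocalendar()[1],            # ISO week number (assumption)
    }

ages = impute([30, None, 50], categorical=False)
genders = impute(["Male", None, "Female", "Male"], categorical=True)
encoded = one_hot(genders)
parts = date_parts(datetime(2019, 12, 14, 9, 30, 0))
print(ages, genders, encoded[0], parts)
```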
  • 16. Automated Data Preparation
      Step 3 just got you into a problem
      • Feature engineering could generate too many features
      • A solution is needed to avoid overfitting and reduce model training time
      • No domain knowledge has been applied
      Step 4: Feature Selection (limited in AutoML)
      • Drop high cardinality variables (noise)
      • Drop no variance variables (non-informative)
      Possible future improvements
      • Drop highly correlated fields
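The two feature selection rules on slide 16 can be illustrated as follows. The function name and the `max_cardinality_ratio` threshold of 0.9 are hypothetical; the slide does not state what cut-off AutoML uses.

```python
def select_features(columns, max_cardinality_ratio=0.9):
    """Drop constant columns and near-unique non-numeric columns.

    columns: dict mapping column name to a list of values.
    max_cardinality_ratio: hypothetical cut-off for distinct/total values.
    """
    kept = {}
    for name, values in columns.items():
        distinct = len(set(values))
        if distinct <= 1:
            continue  # no variance: carries no information
        numeric = all(isinstance(v, (int, float)) for v in values)
        if not numeric and distinct / len(values) > max_cardinality_ratio:
            continue  # high cardinality categorical: mostly noise (e.g. IDs)
        kept[name] = values
    return kept

columns = {
    "age": [30, 40, 50, 60],
    "country_code": [1, 1, 1, 1],
    "customer_id": ["a1", "b2", "c3", "d4"],
    "segment": ["x", "y", "x", "y"],
}
selected = select_features(columns)
print(list(selected))
```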
  • 17. Algorithm Selection and Hyperparametrization
      • Challenges of Configuration Space
        o High dimensionality (multiple continuous, categorical, binary variables)
        o Conditionality (some parameter values are relevant only in combination)
        o No gradient (loss function has no gradient, expensive evaluation)
      • Opt1: Grid Search / Brute Force
        o Cartesian product of hyperparameter combinations
        o The simplest method, suffers from the curse of dimensionality
      • Opt2: Random Search
        o Random configurations within a certain budget
        o Good baseline, no assumptions, easy parallelization
      • Opt3: Bayesian Optimization
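Grid search and random search from slide 17 differ only in how candidate configurations are generated. The sketch below uses a made-up `loss` function standing in for an expensive train-and-validate run; the parameter names and ranges are assumptions. Bayesian optimization is not shown.

```python
import itertools
import random

def loss(learning_rate, n_trees):
    """Hypothetical validation loss, lowest near learning_rate=0.1, n_trees=120."""
    return (learning_rate - 0.1) ** 2 + ((n_trees - 120) / 1000) ** 2

def grid_search(grid):
    """Evaluate the Cartesian product of all listed values."""
    candidates = itertools.product(grid["learning_rate"], grid["n_trees"])
    return min(candidates, key=lambda c: loss(*c))

def random_search(bounds, budget, seed=0):
    """Evaluate `budget` configurations sampled uniformly within bounds."""
    rng = random.Random(seed)
    candidates = [
        (rng.uniform(*bounds["learning_rate"]), rng.randint(*bounds["n_trees"]))
        for _ in range(budget)
    ]
    return min(candidates, key=lambda c: loss(*c))

best_grid = grid_search({"learning_rate": [0.01, 0.1, 0.5], "n_trees": [50, 100, 200]})
best_random = random_search({"learning_rate": (0.01, 0.5), "n_trees": (50, 200)}, budget=9)
print(best_grid, best_random)
```

With three values per parameter the grid costs 9 evaluations; adding a third parameter with three values raises that to 27, while the random search budget stays whatever you set it to.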
  • 18. Meta Learning in AutoML
      • Challenges
        o Avoid starting from scratch on new ML tasks
        o Learn from experience, efficiently and in a systematic data-driven way
      • Prerequisite
        o Collect meta-data to describe previous tasks (parameters, pipeline structure, evaluations)
      • Result
        o Meta-learner to recommend promising configurations w/o exhaustive search
      • Notes
        o If datasets have similar results on a few pipelines => similar results on the remaining pipelines
        o Operates similarly to recommender systems
        o Privacy: AML has no need to access customer data, only pipeline results
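The recommender-style idea on slide 18 (similar results on a few pipelines imply similar results on the rest) can be illustrated with a nearest-neighbour lookup over past pipeline scores. This is a deliberately simple stand-in, not the meta-learner Azure ML uses; the dataset names, pipeline names, and scores are invented.

```python
def recommend(history, probe_scores):
    """Recommend an untried pipeline for a new dataset.

    history: {dataset: {pipeline: score}} from previous tasks.
    probe_scores: {pipeline: score} measured on the new dataset for a few
    pipelines; every probe pipeline is assumed to appear in the history.
    """
    def distance(past):
        return sum((past[p] - s) ** 2 for p, s in probe_scores.items()) / len(probe_scores)

    nearest = min(history, key=lambda name: distance(history[name]))
    untried = {p: s for p, s in history[nearest].items() if p not in probe_scores}
    return nearest, max(untried, key=untried.get)

history = {
    "churn":   {"scale+logreg": 0.71, "pca+svm": 0.69, "onehot+gbm": 0.84},
    "sensors": {"scale+logreg": 0.55, "pca+svm": 0.80, "onehot+gbm": 0.62},
}
probe = {"scale+logreg": 0.70, "pca+svm": 0.66}
recommendation = recommend(history, probe)
print(recommendation)
```

Only scores are exchanged here, which mirrors the privacy note on the slide: the lookup never touches the underlying customer data.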
  • 19. Cross-Validation and Ensembling
      • Cross Validation
        o Divide training data into k subsets
        o Repeat k times: hold out subset k_i for validation, train on the remaining k-1 subsets
        o Average the error across the k error estimates
      • Ensembling (bagging, boosting, stacking)
        o Combine a few of the best ML models for improved accuracy at no extra cost
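The k-fold procedure on slide 19 can be written out in a few lines. In this sketch the "model" is simply the mean of the training targets and the error is the mean absolute error; both are placeholders chosen to keep the example self-contained.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_indices, held_out_indices) for each of the k folds."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        train = [i for fold in folds if fold is not held_out for i in fold]
        yield train, held_out

def cross_validate(xs, ys, k, fit, error):
    """Train on k-1 folds, validate on the held-out fold, average the k errors."""
    errors = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(error(model, [xs[i] for i in val], [ys[i] for i in val]))
    return mean(errors)

def fit_mean(xs, ys):
    return mean(ys)  # placeholder model: always predict the training mean

def mean_abs_error(model, xs, ys):
    return mean(abs(y - model) for y in ys)

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5, 6]
cv_error = cross_validate(xs, ys, k=3, fit=fit_mean, error=mean_abs_error)
print(cv_error)
```

Every row is used for validation exactly once, so the averaged error is less dependent on one lucky or unlucky split than a single hold-out set.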
  • 20. Building Azure ML Pipelines
  • 21. Azure ML Designer vs Azure ML Studio
      • ML Studio: collaborative drag-drop workspace to build, test and deploy ML
      • Azure ML: designer, SDK and CLI for data prep., train and deploy ML at scale

      Feature             | Azure ML Designer                  | ML Studio (Classic)
      Availability        | Preview (2019)                     | Generally available (GA) (2015)
      Drag-drop interface | Yes                                | Yes
      Scalability         | With compute target                | Training data limited to 10GB
      Module rich         | Important only                     | Multiple
      Compute             | AML compute, CPU/GPU               | Proprietary compute, CPU only
      ML Pipeline         | Authoring, publishing              | N/A
      ML Ops              | Flexible deployment and versioning | Basic management and deploy
      Model portability   | Portable                           | Proprietary, non-portable
      Auto ML             | Through SDK                        | N/A
  • 22. Azure ML
      What it is: a cloud-based environment to rapidly build and deploy machine learning models on auto-scaling CPU or GPU clusters
      How to:
      1. Choose one of 4 development environments for AML: cloud-based notebook VM (easiest), local (with Azure subscription), Data Science VM, Azure Databricks
      2. Create workspace (Python SDK or Azure Portal)
      3. Use the azureml.dataprep Python package to explore, cleanse and transform
      4. Select a training target (Local PC, Azure Linux VM, HDInsight for Spark)
      5. azureml.train recommends a pipeline based on target metrics
      6. Register models for tag, search and deploy (even models trained outside AML)
      7. Deploy to Azure Container Instance serverless containers
  • 23. Interpreting Learning Results (Classification)
      • Confusion Matrix
        o Rows: true class, Columns: predicted class
        o Good model = most values along the diagonal
      • Precision-Recall Chart
        o Precision = TP / (TP + FP), ability to label correctly
        o Recall = TP / (TP + FN), ability to find all instances
        o Macro Average PR: independent PR average
        o Micro Average PR: weighted PR average (imbalanced)
        o Draw PR chart at different threshold values
      • ROC Chart: TP Rate vs FP Rate over different thresholds
        o FPR = FP / (FP + TN) (best is close to 0)
        o TPR = TP / (TP + FN) (best is close to 1)
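The formulas on slide 23 translate directly into code. The sketch below computes them from confusion-matrix counts and contrasts macro and micro averaged precision on an imbalanced two-class case; the counts are invented for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall (= TPR) and FPR from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
    }

def macro_micro_precision(per_class):
    """per_class: list of (tp, fp) pairs, one per class.

    Macro: average the per-class precisions, every class counts equally.
    Micro: pool the counts first, so large classes weigh more.
    """
    macro = sum(tp / (tp + fp) for tp, fp in per_class) / len(per_class)
    micro = sum(tp for tp, _ in per_class) / sum(tp + fp for tp, fp in per_class)
    return macro, micro

metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
macro, micro = macro_micro_precision([(90, 10), (5, 5)])
print(metrics, macro, micro)
```

The minority class (5 of 10 predictions correct) drags the macro average down while barely moving the micro average, so the two numbers diverge on imbalanced data.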
  • 24. Lift, Gain and Calibration Charts
      • Lift Chart: how many times the model is better than random
        o Ratio of gain% / random expectation% at a given decile level
        o Green line: baseline random guess
      • Gain Chart: how much to sample to get target sensitivity (TPR)
        o X: percentile addressed, Y: portion of positive responses
        o Green line: baseline random guess
      • Calibration Chart
        o Confidence of a predictive model
        o Predicted vs actual probability
        o Good model: y=x
        o Overly confident: y=0 and y=1
      Note: perfectly calibrated classifier != perfect classifier
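Gain and lift from slide 24 can be computed by ranking cases by predicted score and looking at the top fraction. This is a minimal sketch with invented scores and outcomes; the function name and the rounding of the sample size are assumptions.

```python
def gain_and_lift(scores, actuals, fraction):
    """Gain and lift when addressing the top `fraction` of cases by score.

    gain: share of all positives captured in the addressed sample.
    lift: gain divided by the share of cases addressed (random baseline).
    """
    ranked = [a for _, a in sorted(zip(scores, actuals), key=lambda p: p[0], reverse=True)]
    top = ranked[:max(1, round(len(ranked) * fraction))]
    gain = sum(top) / sum(actuals)
    lift = gain / (len(top) / len(ranked))
    return gain, lift

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actuals = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
gain, lift = gain_and_lift(scores, actuals, fraction=0.2)
print(gain, lift)
```

Here the top 20% of cases contain half of all positives, which is 2.5 times what a random 20% sample would be expected to contain.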
  • 25. Containers meet Machine Learning
      • Steps: (from Portal or AML SDK management API)
        o Add model (from local workspace or upload model)
        o Add driver script
        o Add package dependency file (YML)
        o The system creates a Docker image and registers it in the Workspace
      • Deployment
        o Azure Container Instance (ACI) for test, Azure Kubernetes Service (AKS) for prod
        o Azure ML Compute, Azure IoT Edge
      • Operationalization
        o REST API is created automatically
  • 26. Operationalization
      • REST APIs
        o Deploying an AML model as a web service creates single and batch REST APIs
        o APIs consumed by azureml.core.webservice
      • Performance Degradation
        o Performance in real life may differ from performance during training
        o Data drift: change in characteristics of input data over time
      • Monitoring and Drift Analysis
        o Input data changes over time and leads to performance degradation
        o Configure inference data to snapshot and profile against baseline
        o ML model trained to detect differences
        o Model performance converted to drift coefficient
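The drift analysis idea on slide 26 (train a model to tell baseline data from current data, then convert its performance into a coefficient) can be illustrated on a single numeric feature. Everything here is an assumption made for the sketch: the classifier is a single threshold, it is scored on the same data it was fitted to, and the mapping `2 * accuracy - 1` is a stand-in, not the formula Azure ML uses.

```python
def drift_coefficient(baseline, current):
    """Return 0.0 when the two samples are indistinguishable by a threshold,
    1.0 when a threshold separates them perfectly."""
    labeled = [(v, 0) for v in baseline] + [(v, 1) for v in current]
    best_accuracy = 0.5  # accuracy of guessing
    for threshold in sorted({v for v, _ in labeled}):
        # Predict "current" when the value exceeds the threshold
        correct = sum((v > threshold) == bool(label) for v, label in labeled)
        accuracy = correct / len(labeled)
        # The inverted rule (predict "baseline" above threshold) scores 1 - accuracy
        best_accuracy = max(best_accuracy, accuracy, 1 - accuracy)
    return 2 * best_accuracy - 1

no_drift = drift_coefficient([1, 2, 3, 4], [1, 2, 3, 4])
full_drift = drift_coefficient([1, 2, 3, 4], [11, 12, 13, 14])
print(no_drift, full_drift)
```

The better the detector separates the snapshot from the baseline, the more the input data has moved, independent of whether the deployed model's accuracy has visibly dropped yet.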
  • 27. Takeaways
      • Books
        o AI MVP Book: Automated Machine Learning https://www.amazon.com/gp/aw/d/B082P5MK8Y
        o Practical Automated ML on Azure
      • The No Free Lunch Theorem
        https://www.kdnuggets.com/2019/09/no-free-lunch-data-science.html
      • Azure ML Studio vs Azure ML Services designer
        https://www.codit.eu/blog/azure-machine-learning-studio-vs-services/
        https://docs.microsoft.com/en-us/azure/machine-learning/compare-azure-ml-to-studio-classic
      • Bayes Theorem
        https://towardsdatascience.com/understanding-bayes-theorem-7e31b8434d4b
  • 28. Azure ML Studio, Azure ML Service