Sample code: https://github.com/davegautam/dotnetconfsamplecodes
A presentation on how to get started with ML.NET. If you are an existing .NET stack developer and want to use the same technology for machine learning, this slide deck focuses on how you can use ML.NET to do it.
2. I love writing reusable components, solving technical problems for my
team, and designing the architecture of solutions, along with managing
projects in Scrum.
Senior Project Manager, Braindigit IT Solutions
Application & Database Specialist, Nutrition Innovation Lab (USAID Project)
Senior Software Engineer, Bitscrafters INC
Software Engineer, Softech Infosys
Education: MSC IT (Data Science)
Masters in Business Administration
Bachelors of Computer Application
5. Have a Problem That Needs ML?
• Classify Given Input into A or B or C
• Classification & Multi-Class Classification
• Which Marketing Campaign Brought More Customers : Win Gold or Win
Lunch Coupon
• Anomaly detection flags unexpected or unusual events or behaviors
• Fraud Detection of Credit Card
• Mail Spam Detection
• Make Numerical Predictions
• Predict Sales of Next Quarter
• Predict Whiskey Sales Looking at Temperature
6. Have a Problem That Needs ML?
• Understand the Structure of Data
• Clustering Algorithms
• Which Age Groups Like the Same Type of Series
• Learn From Outcome & Decide on Other Actions
• Self Driving Car: At a yellow light, brake or accelerate?
7. Do you have the Data?
• Relevancy of Your Data to Your Problem
• You need to find life expectancy, and you have a series of data about people's
expenses on vegetables. (Irrelevant)
• You need to find life expectancy, and you have a series of data about people's
expenses on vegetables, and you also know whether the vegetables were likely toxic.
(Relevant)
• Do You Have Enough Data?
• It depends; no one can tell you in advance
• More data is generally better
• You can estimate how much data is enough using statistical heuristics, dataset size vs.
model skill, analogy, and domain expertise
8. Do you have the Data?
• Accuracy of the Data
• A high concentration of wrong data is a problem
• Data accuracy is the first requirement: machine learning produces predictions,
and predictions made from wrong data will be highly incorrect.
• Is Your Data Connected?
• A significant amount of missing data will hamper your ML
• You should have connected data.
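One way to check the "missing data" point before training is to count how often each column is empty. The following is a minimal Python sketch, not ML.NET code; the `trips` records and the `missing_fraction` helper are hypothetical names made up for illustration.

```python
def missing_fraction(rows, columns):
    """Return {column: fraction of rows where the value is None or ''}."""
    total = len(rows)
    return {
        col: sum(1 for row in rows if row.get(col) in (None, "")) / total
        for col in columns
    }

# Hypothetical trip records; None marks a value that was never recorded.
trips = [
    {"distance": 1.2, "passengers": 1, "fare": 7.5},
    {"distance": None, "passengers": 2, "fare": 12.0},
    {"distance": 3.4, "passengers": None, "fare": 15.5},
    {"distance": None, "passengers": 1, "fare": 9.0},
]

report = missing_fraction(trips, ["distance", "passengers", "fare"])
for col, frac in report.items():
    print(f"{col}: {frac:.0%} missing")
```

A column with a large missing fraction is a candidate for cleaning, imputation, or removal before it reaches the learner.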
9. Introduction To ML.NET
Microsoft recently open sourced its machine learning
framework, which is available on GitHub. ML.NET is an open
source, cross-platform machine learning framework for
.NET.
10. Introduction To ML.NET
• Originally developed at Microsoft; used in Windows, Bing, Azure, and
more
• The idea is to help .NET developers get into cutting-edge ML
programming without having to learn the underlying technical
details associated with creating and tuning machine learning
models.
• Cross-platform: runs on any platform where 64-bit .NET Core or later is available
• Open source
• Licensed under MIT; the source can be found on GitHub.
11. ML.NET Capabilities
ML Tasks
• Classification (e.g. text categorization and
sentiment analysis)
• Regression (e.g. forecasting and price
prediction)
• Clustering
Training Models
• .NET APIs for training models, using
models for predictions
• Core components of this framework,
such as learning algorithms, transforms,
and core ML data structures
Extensions or Integration
• Integration with Python
13. Future RoadMap
• Additional ML Tasks and Scenarios
• Deep Learning with TensorFlow (Already Integrated in ML.NET 0.5) &
CNTK
• ONNX support
• Scale-out on Azure
• Better GUI to simplify ML tasks
• Integration with VS Tools for AI
• Language Innovation for .NET
• CNTK, Accord.NET, TensorFlow integration with one single API
14. Let’s Apply Some Machine Learning
• Ask a Question That Has an Exact Answer
• Apply the Right Method to Find the Answer
15. Problem: Taxi Fare Prediction
• Problem
• Predicting the fare of a taxi trip in New York City
• Statistical Inference
• Regression analysis is a set of statistical processes for estimating the
relationships among variables (Wikipedia).
• It includes many techniques for modeling and analyzing several variables
when the focus is on the relationship between a dependent variable and one
or more independent variables (or 'predictors'). The simplest case is a line, y = ax + b.
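To make y = ax + b concrete, here is a minimal Python sketch of an ordinary least-squares fit with one predictor. It is plain Python rather than ML.NET, and the distance and fare numbers are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b

# Hypothetical trips: distance (independent variable) and fare (dependent variable).
distances = [1.0, 2.0, 3.0, 4.0, 5.0]
fares = [5.5, 8.0, 10.5, 13.0, 15.5]

a, b = fit_line(distances, fares)
predicted_fare = a * 6.0 + b   # prediction for an unseen 6-unit trip
print(f"fare = {a:.2f} * distance + {b:.2f}; 6 units -> {predicted_fare:.2f}")
```

Real taxi data has several predictors and noise, which is why the solution slide uses a tree-based learner instead of a single line.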
16. Solution: Taxi Fare Prediction
Load & Transform Data
• Pipeline: the workflow used to train on your data
• TextLoader: loads the CSV data
• ColumnCopier: copies the values to be predicted into the specified (label) column
• CategoricalOneHotVectorizer: transforms the categorical data into numeric values
• ColumnConcatenator: combines all of the feature columns into the Features column
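The transform steps above can be sketched in plain Python to show what happens to a row: categorical values become one-hot vectors, and all feature values are concatenated into a single list. This is an illustration of the idea, not the ML.NET API; the column names (`vendor_id`, `payment_type`, and so on) and sample rows are assumptions.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector, one slot per known category."""
    return [1.0 if value == c else 0.0 for c in categories]

def build_features(row, vendor_ids, payment_types):
    """Concatenate encoded categorical columns and numeric columns into one vector."""
    return (
        one_hot(row["vendor_id"], vendor_ids)
        + one_hot(row["payment_type"], payment_types)
        + [float(row["passenger_count"]), float(row["trip_distance"])]
    )

# Hypothetical rows, as if already loaded from a CSV file.
rows = [
    {"vendor_id": "VTS", "payment_type": "CRD", "passenger_count": 1,
     "trip_distance": 2.5, "fare_amount": 9.0},
    {"vendor_id": "CMT", "payment_type": "CSH", "passenger_count": 2,
     "trip_distance": 1.0, "fare_amount": 5.5},
]

vendor_ids = sorted({r["vendor_id"] for r in rows})        # ['CMT', 'VTS']
payment_types = sorted({r["payment_type"] for r in rows})  # ['CRD', 'CSH']

features = [build_features(r, vendor_ids, payment_types) for r in rows]
labels = [r["fare_amount"] for r in rows]  # the value to predict, kept as the label

print(features)
print(labels)
```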
Choosing Learning Algorithm & Training Model
• Regression: the FastTreeRegressor learner uses gradient boosting. Gradient boosting is a machine learning technique for
regression problems. It builds each regression tree in a step-wise fashion. It uses a pre-defined loss function to measure
the error in each step and correct for it in the next.
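The step-wise idea behind gradient boosting can be sketched with one-split trees (stumps) and squared-error loss, where each new stump is fitted to the errors left by the previous steps. This is a toy illustration in plain Python, not how FastTreeRegressor is implemented; the data, `rounds`, and `learning_rate` values are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error of the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # last value excluded so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the remaining error."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]  # error of the current model
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (l if x <= t else r) for t, l, r in stumps)

def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

# Hypothetical data: trip distance vs. fare.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [4.0, 6.5, 8.0, 11.0, 13.5, 15.0, 18.5, 21.0]

model = boost(xs, ys)
mean_only_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print(f"mean-only MSE: {mean_only_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

With squared-error loss the residuals are exactly what each step needs to correct, which is why every round lowers the training error here.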
Evaluate Model
• Process of checking how well the values are predicted
• RMS: root mean square error, a measure of the differences between the values predicted by a model and the values observed.
The lower it is, the better.
• RSquared: the closer it is to 1, the better. It measures how well observed outcomes are replicated by the
model, based on the proportion of total variation of outcomes explained by the model.
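Both metrics are short formulas. The Python sketch below computes them by hand on made-up numbers, to show what the evaluation step reports; the `actual` and `predicted` values are assumptions.

```python
import math

def rms_error(actual, predicted):
    """Root mean square error: lower is better, 0 means perfect predictions."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def r_squared(actual, predicted):
    """Share of the variation in the observed values explained by the model."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total variation
    return 1 - ss_res / ss_tot

# Hypothetical observed fares and model predictions.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

print(f"RMS: {rms_error(actual, predicted):.2f}")
print(f"RSquared: {r_squared(actual, predicted):.2f}")
```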
18. Problem: Sentiment Analysis
• Problem
• Predict the sentiment of a new website comment, either positive or negative
• Statistical Inference
• Classification
• Binary or binomial classification is the task of classifying the elements of a
given set into two groups (predicting which group each one belongs to) on the
basis of a classification rule.
19. Solution: Sentiment Analysis
Load & Transform Data
• Pipeline: the workflow used to train on your data
• TextLoader: loads the data
• TextFeaturizer: converts the SentimentText column into a numeric vector
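Turning a text column into a numeric vector can be sketched as a simple bag of words: build a vocabulary, then count each word per comment. This is a simplified Python illustration of the idea only; the featurizer in ML.NET does considerably more, and the function names and sample comments here are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocabulary(texts):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Count how often each vocabulary token occurs in the text."""
    vector = [0] * len(vocab)
    for token in tokenize(text):
        index = vocab.get(token)
        if index is not None:  # tokens not seen during training are ignored
            vector[index] += 1
    return vector

comments = ["This is great", "This is not great, this is awful"]
vocab = build_vocabulary(comments)
vectors = [vectorize(c, vocab) for c in comments]

print(vocab)
print(vectors)
```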
Choosing Learning Algorithm & Training Model
• Binary Classification: FastTreeBinaryClassifier uses gradient boosting. Gradient boosting produces a numeric score, as it does for regression problems;
in the case of a binary classification problem, the output is converted to a probability by using some form of calibration.
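The "converted to a probability" step can be illustrated with a logistic (sigmoid) function applied to the raw score. This is a minimal Python sketch of one common form of calibration; the `slope` and `offset` parameters are hypothetical stand-ins for values a real calibrator would fit from data, and it is not a description of ML.NET's calibrator.

```python
import math

def calibrate(score, slope=1.0, offset=0.0):
    """Map a raw model score to a probability in (0, 1) with a logistic function."""
    return 1.0 / (1.0 + math.exp(-(slope * score + offset)))

def classify(score, threshold=0.5):
    """Return (probability, is_positive) for a raw score."""
    probability = calibrate(score)
    return probability, probability >= threshold

for raw_score in (-2.0, 0.0, 2.0):
    probability, is_positive = classify(raw_score)
    label = "positive" if is_positive else "negative"
    print(f"score {raw_score:+.1f} -> probability {probability:.3f} -> {label}")
```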
Evaluate Model
• Process of checking how well the values are predicted
• Computes the quality metrics for the PredictionModel using the specified data set.
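For a binary classifier, typical quality metrics are accuracy, precision, recall, and F1. The Python sketch below computes them from made-up labels to show what such an evaluation reports; it does not claim to list the exact metrics ML.NET returns.

```python
def binary_metrics(actual, predicted):
    """Compute accuracy, precision, recall and F1 from boolean labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical test set: True means a positive comment.
actual = [True, True, True, False, False, False, True, False]
predicted = [True, True, False, False, False, True, True, False]

metrics = binary_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```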
22. Making Data Right
• Get More Quality Data
• Generate More Data
• Data Cleaning
• Reframing Problem
• Transform Your Data: Make Distributions More Gaussian
• Select Your Features Right
• Engineer Your Features
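As one example of the transform step, a log transform often makes a right-skewed column look more Gaussian. The Python sketch below measures skewness before and after; the fare values are invented, and a log transform is only one of several possible choices.

```python
import math

def skewness(values):
    """Population skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed fares: most are small, a few are very large.
fares = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
log_fares = [math.log1p(v) for v in fares]  # log(1 + v), safe for zero values

print(f"skewness before: {skewness(fares):.2f}")
print(f"skewness after:  {skewness(log_fares):.2f}")
```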
23. Applying the Algorithm Right
• Resampling: Use a method and configuration that makes the best use of available
data. The k-fold cross-validation method with a hold out validation dataset might
be a best practice.
• Evaluation Metric. What metric is used to evaluate the skill of predictions? Use a
metric that best captures the requirements of the problem and the domain
• Baseline Performance. What is the baseline performance for comparing
algorithms? Use a random algorithm or a zero rule algorithm (predict mean or
mode) to establish a baseline by which to rank all evaluated algorithms.
• Spot Check Linear Algorithms
• Spot Check Nonlinear Algorithms
• Steal from Literature. What algorithms are reported in the literature?
• Standard Configurations
• Try Alternatives
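The resampling and baseline points can be combined in a short Python sketch: split the data into k folds, and score a zero rule baseline (always predict the training mean) on each held-out fold. The fare values and helper names are assumptions; in practice the rows would usually be shuffled before splitting.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def zero_rule_rmse(values, k):
    """Average held-out RMSE of a baseline that always predicts the training mean."""
    scores = []
    for train, test in k_fold_indices(len(values), k):
        mean = sum(values[i] for i in train) / len(train)
        mse = sum((values[i] - mean) ** 2 for i in test) / len(test)
        scores.append(mse ** 0.5)
    return sum(scores) / len(scores)

# Hypothetical fares to predict.
fares = [5.0, 7.5, 6.0, 12.0, 9.5, 8.0, 15.0, 11.0, 6.5, 10.0]

baseline = zero_rule_rmse(fares, k=5)
print(f"zero-rule baseline RMSE over 5 folds: {baseline:.3f}")
```

Any algorithm that is spot checked afterwards should beat this baseline under the same folds and metric to be worth keeping.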