3. Proven & Extensible
Open Source & Cross platform
dot.net/ml
Build your own
Developer Focused
ML.NET is a machine learning framework
made for .NET developers
4. And many more examples
@ https://github.com/dotnet/machinelearning-samples
Customer segmentation
Recommendations
Predictive maintenance
Forecasting
Issue Classification
Ranking news/topics
Image classification
Sentiment Analysis
Machine Learning scenarios with ML.NET
7. Comment Toxic? (Sentiment)
==RUDE== Dude, you are rude … 1
== OK! == IM GOING TO VANDALIZE … 1
I also found use of the word "humanists" confusing … 0
Oooooh thank you Mr. DietLime … 0
Wikipedia detox data at https://figshare.com/articles/Wikipedia_Talk_Labels_Personal_Attacks/4054689
Features (input) Label (output)
Sentiment Analysis
8. Prepare Your Data
Example
Comment Toxic? (Sentiment)
==RUDE== Dude, you are rude … 1
== OK! == IM GOING TO VANDALIZE … 1
I also found use of the word "humanists" confusing … 0
Oooooh thank you Mr. DietLime … 0
Important concepts: Data
9. Prepare Your Data
Text Featurizer
Featurized Text
[0.76, 0.65, 0.44, …]
[0.98, 0.43, 0.54, …]
[0.35, 0.73, 0.46, …]
[0.39, 0, 0.75, …]
Example
Text
==RUDE== Dude, you are rude …
== OK! == IM GOING TO VANDALIZE …
I also found use of the word "humanists" …
Oooooh thank you Mr. DietLime …
Important concepts: Transformer
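To make the "Text Featurizer" step concrete, here is a minimal sketch of turning comments into numeric vectors. It uses plain term frequencies over a learned vocabulary, which is an assumption for illustration only; it is not the ML.NET featurizer, and the function names (tokenize, fit_vocabulary, featurize) are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def fit_vocabulary(comments):
    """Learn a sorted vocabulary from the training comments."""
    return sorted({tok for c in comments for tok in tokenize(c)})

def featurize(text, vocabulary):
    """Map a comment to term frequencies over the learned vocabulary."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in vocabulary]

# Two of the (truncated) comments from the slide.
comments = [
    "==RUDE== Dude, you are rude …",
    "Oooooh thank you Mr. DietLime …",
]
vocabulary = fit_vocabulary(comments)
vectors = [featurize(c, vocabulary) for c in comments]
```

Each comment becomes a fixed-length list of numbers, one entry per vocabulary word, which is the shape a trainer expects as its feature input.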
10. Build & Train
Example
Estimator
Comment Toxic? (Sentiment)
==RUDE== Dude, you … 1
== OK! == IM GOING … 1
I also found use of the … 0
Oooooh thank you Mr. … 0
Important concepts: Estimator
11. Comment
==RUDE== Dude, you …
Prediction Function
Predicted Label – Toxic? (Sentiment)
1
Run
Example
Important concepts: Prediction Function
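The estimator and prediction function concepts can be sketched in a few lines: training (fit) learns a model from labeled comments, and the prediction function wraps that model to score a single new comment. The per-word scoring below is a deliberately simple stand-in chosen for illustration, not the algorithm ML.NET uses, and all names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def fit(examples):
    """Estimator step: learn a per-word toxicity score from labeled comments."""
    toxic = defaultdict(int)
    seen = defaultdict(int)
    for text, label in examples:
        for word in set(tokenize(text)):
            seen[word] += 1
            toxic[word] += label
    # Fraction of the training comments containing the word that were toxic.
    return {word: toxic[word] / seen[word] for word in seen}

def make_prediction_function(model, threshold=0.5):
    """Wrap the trained model in a function that labels one comment."""
    def predict(text):
        scores = [model[w] for w in tokenize(text) if w in model]
        if not scores:
            return 0  # no known words: default to non-toxic
        return 1 if sum(scores) / len(scores) > threshold else 0
    return predict

# The four (truncated) training rows from the slides.
training = [
    ("==RUDE== Dude, you are rude …", 1),
    ("== OK! == IM GOING TO VANDALIZE …", 1),
    ('I also found use of the word "humanists" confusing …', 0),
    ("Oooooh thank you Mr. DietLime …", 0),
]
model = fit(training)
predict = make_prediction_function(model)
label = predict("==RUDE== Dude, you …")  # 1, as on the slide
```

Separating fit from predict mirrors the slide flow: the expensive learning step runs once, and the resulting function can then be called repeatedly on new comments.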
14. Anomaly Detection
Anomaly detection finds data
points that do not fit well
with the rest of the data.
It has a wide range of applications
such as fraud detection, surveillance,
diagnosis, data cleanup, and
predictive maintenance.
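As a rough illustration of "points that do not fit well with the rest of the data", the sketch below flags values that sit far from the mean of a trailing window, measured in standard deviations. The window size, threshold, and sample readings are assumptions made up for this example; it is not the detector ML.NET ships.

```python
from statistics import mean, stdev

def find_spikes(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            is_anomaly = series[i] != mu
        else:
            is_anomaly = abs(series[i] - mu) / sigma > threshold
        if is_anomaly:
            anomalies.append(i)
    return anomalies

# Hypothetical readings with one obvious spike at index 6.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 11, 13, 12]
spikes = find_spikes(readings)  # [6]
```

Note that the spike inflates the window statistics for the next few points, so a production detector would usually exclude flagged values from the history or use robust statistics such as the median.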
20. How much is the taxi fare for 1 passenger going from Burlington to Toronto?
ML.NET CLI global tool accelerates productivity
AutoML with ML.NET
21. Criterion
Loss
Min Samples Split
Min Samples Leaf
XYZ
Parameter 1
Parameter 2
Parameter 3
Parameter 4
…
Distance
Trip time
Car type
Passengers
Time of day
…
Gradient Boosted
Nearest Neighbors
SGD
Bayesian Regression
LGBM
…
Distance Gradient Boosted
Model
Car type
Passengers
Getting started w/machine learning can be hard
ML.NET takes the guesswork out of data prep,
feature selection & hyperparameter tuning
Which algorithm? Which parameters?
Which features?
22. N Neighbors
Weights
Metric
P
ZYX
Criterion
Loss
Min Samples Split
Min Samples Leaf
XYZ
Which algorithm? Which parameters?
Which features?
Distance
Trip time
Car type
Passengers
Time of day
…
Gradient Boosted
Nearest Neighbors
SGD
Bayesian Regression
LGBM
…
Nearest Neighbors
Model
Iterate
Gradient Boosted
Distance
Car brand
Year of make
Car type
Passengers
Trip time
Getting started w/machine learning can be hard
ML.NET takes the guesswork out of data prep,
feature selection & hyperparameter tuning
23. Which algorithm? Which parameters?
Which features?
Iterate
Getting started w/machine learning can be hard
ML.NET takes the guesswork out of data prep,
feature selection & hyperparameter tuning
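The "which algorithm, which features, iterate" loop from the last three slides can be sketched as a small exhaustive search: train every candidate trainer on every candidate feature, score each on held-out trips, and keep the one with the lowest error. The toy trips, the two trainers, and all names below are hypothetical and chosen for brevity; real AutoML searches a far larger space and also tunes hyperparameters.

```python
from statistics import mean

# Hypothetical toy trips: (distance_km, passengers, fare).
TRIPS = [
    (1.0, 1, 5.0), (2.0, 3, 7.0), (3.0, 2, 9.0), (4.0, 1, 11.0),
    (5.0, 4, 13.0), (6.0, 2, 15.0), (7.0, 1, 17.0), (8.0, 3, 19.0),
]
FEATURES = {"distance": 0, "passengers": 1}  # feature name -> column index

def fit_mean(xs, ys):
    """Baseline trainer: always predict the average training fare."""
    avg = mean(ys)
    return lambda x: avg

def fit_linear(xs, ys):
    """One-feature least-squares linear regression."""
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    return lambda x: y_bar + slope * (x - x_bar)

TRAINERS = {"mean baseline": fit_mean, "linear regression": fit_linear}

def mean_absolute_error(model, xs, ys):
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

def search(trips, n_train=6):
    """Try every (trainer, feature) pair and rank them by held-out error."""
    train, test = trips[:n_train], trips[n_train:]
    results = []
    for feature, col in FEATURES.items():
        for trainer_name, trainer in TRAINERS.items():
            model = trainer([t[col] for t in train], [t[2] for t in train])
            error = mean_absolute_error(
                model, [t[col] for t in test], [t[2] for t in test])
            results.append((error, trainer_name, feature))
    return sorted(results)

ranking = search(TRIPS)
best_error, best_trainer, best_feature = ranking[0]
```

On this toy data the fare is an exact linear function of distance, so linear regression on distance wins; the point is only the shape of the loop, which is what gets automated.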
25. 70%
95% Feature importance
Distance
Trip time
Car type
Passengers
Time of day
0 1
Model B (70%)
Distance
0 1
Trip time
Car type
Passengers
Time of day
Feature importance Model A (95%)
ML.NET accelerates model development
with model explainability
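One common way to produce feature-importance scores like the bars on this slide is permutation feature importance: shuffle one feature column at a time and measure how much the model's error grows. The slide does not say how its scores were computed, so treat the sketch below as an illustration of the general technique, with a hypothetical model and made-up data.

```python
import random

def mean_absolute_error(model, rows, targets):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, feature_names, seed=0):
    """Shuffle one feature column at a time and record the error increase."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(model, rows, targets)
    importance = {}
    for col, name in enumerate(feature_names):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only this one column replaced.
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
        importance[name] = mean_absolute_error(model, permuted, targets) - baseline
    return importance

def fare_model(row):
    """Hypothetical trained model: the fare depends on distance only."""
    distance, passengers = row
    return 3.0 + 2.0 * distance

rows = [(float(d), p) for d, p in zip(range(1, 9), [1, 3, 2, 1, 4, 2, 1, 3])]
targets = [fare_model(r) for r in rows]
importance = permutation_importance(
    fare_model, rows, targets, ["distance", "passengers"])
```

A feature the model ignores, such as passengers here, gets an importance of zero, while shuffling a feature the model relies on increases the error.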
33. Getting started with ML.NET
Bruno Capuano
Innovation Lead @Avanade
@elbruno | http://elbruno.com
Editor's Notes
.NET is a great tech stack for building a wide variety of applications. There is ASP.NET for web development and Xamarin for mobile development, and with ML.NET we are trying to make .NET great for machine learning.
The Anomaly Detection API can detect the following types of anomalies on time series data:
Spikes and Dips: For example, when monitoring the number of login failures to a service or number of checkouts in an e-commerce site, unusual spikes or dips could indicate security attacks or service disruptions.
Positive and negative trends: When monitoring memory usage in computing, for instance, shrinking free memory size is indicative of a potential memory leak; when monitoring service queue length, a persistent upward trend may indicate an underlying software issue.
Level changes and changes in dynamic range of values: For example, level changes in latencies of a service after a service upgrade or lower levels of exceptions after upgrade can be interesting to monitor.
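Of the anomaly types listed above, a level change is easy to illustrate: compare the mean of the points just before an index with the mean of the points just after it, and report the index where the gap is largest. This is a minimal sketch under assumed window and shift sizes with made-up latency values; it does not reproduce the Anomaly Detection API.

```python
from statistics import mean

def find_level_change(series, window=4, min_shift=5.0):
    """Return the index with the largest before/after mean shift that
    exceeds `min_shift`, or None if no shift is that large."""
    best_index, best_shift = None, min_shift
    for i in range(window, len(series) - window + 1):
        before = mean(series[i - window:i])
        after = mean(series[i:i + window])
        shift = abs(after - before)
        if shift > best_shift:
            best_index, best_shift = i, shift
    return best_index

# Hypothetical service latencies (ms) that step up after an upgrade.
latencies = [20, 21, 19, 20, 20, 21, 30, 31, 29, 30, 30, 31]
change_at = find_level_change(latencies)  # 6
```

Unlike a spike, the new level persists, which is why the check compares two windows rather than a single point against its history.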
The machine learning based API enables:
Flexible and robust detection: The anomaly detection models allow users to configure sensitivity settings and detect anomalies among seasonal and non-seasonal data sets. Users can adjust the anomaly detection model to make the detection API less or more sensitive according to their needs. This would mean detecting the less or more visible anomalies in data with and without seasonal patterns.
Scalable and timely detection: The traditional way of monitoring, with preset thresholds based on experts' domain knowledge, is costly and does not scale to millions of dynamically changing data sets. The anomaly detection models in this API are learned and tuned automatically from both historical and real-time data.
Proactive and actionable detection: Slow trend and level change detection can be applied for early anomaly detection. The early abnormal signals detected can be used to direct humans to investigate and act on the problem areas. In addition, root cause analysis models and alerting tools can be developed on top of this anomaly detection API service.
The anomaly detection API is an effective and efficient solution for a wide range of scenarios like service health & KPI monitoring, IoT, performance monitoring, and network traffic monitoring. Here are some popular scenarios where this API can be useful:
IT departments need tools to track events, error codes, usage logs, and performance (CPU, memory, and so on) in a timely manner.
Online commerce sites want to track customer activities, page views, clicks, and so on.
Utility companies want to track consumption of water, gas, electricity and other resources.
Facility/Building management services want to monitor temperature, moisture, traffic and so on.
IoT/manufacturers want to use sensor data in time series to monitor work flow, quality and so on.
Service providers, such as call centers, need to monitor service demand trends, incident volume, wait queue length, and so on.
Business analytics groups want to monitor business KPIs' (such as sales volume, customer sentiments, pricing) abnormal movement in real time.
ML.NET provides tooling that makes it easy to use. Two particularly valuable tools are AutoML and Model Builder.
What is AutoML? It is an API that accelerates model development for you. A lot of developers do not have the experience required to build or train machine learning models. With AutoML, the process of finding the best algorithm is automated!
Model Builder, on the other hand, provides an easy-to-understand visual interface to build, train, and deploy custom machine learning models. Prior machine learning expertise is not required. It also supports AutoML.
Remember that, depending on your data, it tries different models and gives you the error of each one, so you can then decide which model to use. Most people just use the model with the least error.
And we will see it in action soon.
To demonstrate what AutoML is, let’s consider that we want to provide a service that allows users to predict taxi fare before they book or call a taxi. How can we build this feature/service?
A data scientist’s job is to find the best algorithm that will do taxi fare prediction.
Let's say we have a dataset that contains information such as trip distance, trip time, number of passengers, time of day of the trip, etc.
A data scientist will spend a lot of time trying to decide which of these pieces of information is important when predicting taxi fare.
In ML, there are many algorithms, generally referred to as trainers, for example linear regression, convolutional neural networks, etc.
The data scientist will try one algorithm at a time, picking features as they see fit, and then wait to see how the model performs.
In this case, the model scored only 30%, based on the number of bad predictions it made.