In this session I cover data science principles such as regression, clustering, classification, and recommendation, and how to build programmable components in Azure Machine Learning experiments using data science programming languages. The session illustrates how to implement these concepts using Azure ML Studio.
1. Data Science Essentials in Azure ML
MOSTAFA ELZOGHBI
SR. TECHNICAL EVANGELIST – MICROSOFT
@MOSTAFAELZOGHBI
2. Session Objectives & Takeaways
Basic data science concepts: regression, classification, clustering, and recommendation.
Azure ML Studio platform capabilities
3. Data Science Involves
Data science is about using data to make decisions that drive actions.
The data science process involves:
Data selection
Preprocessing
Transformation
Data Mining
Delivering value from data: interpretation and evaluation
4. Machine Learning Overview
Machine Learning is a computing technique that has its origins in artificial intelligence (AI)
and statistics. Machine Learning solutions include:
Classification - Predicting a Boolean true/false value for an entity with a given set of features.
Regression - Predicting a real numeric value for an entity with a given set of features.
Clustering - Grouping entities with similar features.
Recommendation - Recommending an item to a user based on past behavior or preferences of
similar users (Recommender systems).
5. Classification
Suppose we want to teach a computer to recognize images of chairs.
We provide a set of images to the computer and tell it which ones are chairs and
which are not.
The computer is then expected to recognize chairs, even in images it has not
seen before.
This is learning by example.
To build this experiment, we need a training set and a test set, with as much
data as we can get.
Each observation is represented by a set of numbers (features).
The training and test sets are sets of feature vectors, and the label is
represented by a number.
Example: if an image is a chair, the label is +1; otherwise it is -1.
7. Classification Cont.
Yes/No questions are the most basic form of classification.
Examples: automatic handwriting recognition, speech recognition, biometrics,
document classification, spam detection, credit card fraud, predicting customer
churn, etc.
Formally, given a training set (xi, yi) for i = 1…n, we want to create a classification model f
that can predict the label y for a new x.
The machine learning algorithm creates the function f(x) for you.
The predicted value of y for a new x is simply the sign of the function f(x).
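As a minimal sketch of the idea above (the weights and inputs here are made up for illustration, not from the deck), a linear scoring function can predict the +1/-1 label via its sign:

```python
# Sketch: classify by the sign of a linear scoring function f(x) = w·x + b.
# The parameters w and b are hypothetical "learned" values.

def f(x, w, b):
    """Linear scoring function f(x) = w·x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x, w, b):
    """The predicted label is the sign of f(x): +1 or -1."""
    return 1 if f(x, w, b) >= 0 else -1

w, b = [0.5, -1.0], 0.2           # hypothetical learned parameters
print(predict([2.0, 0.5], w, b))  # f = 1.0 - 0.5 + 0.2 = 0.7 -> +1
print(predict([0.0, 1.0], w, b))  # f = -1.0 + 0.2 = -0.8 -> -1
```

In practice the training algorithm chooses w and b; the sign rule itself is what turns a numeric score into a class label.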
8. Regression
For predicting real-valued outcomes:
How many customers will arrive at our website next week?
How many TVs will sell next year?
Can we predict someone’s income from their click through information?
Formally, given a training set (xi, yi) for i = 1…n, we want to create a regression model f
that can predict the label y for a new x.
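A tiny worked example of this setup (assuming ordinary least squares with a single feature; the data is made up): fit f(x) = a·x + b to training pairs and predict y for a new x.

```python
# Sketch: simple linear regression fit by ordinary least squares (one feature).

def fit_linear(xs, ys):
    """Return slope a and intercept b minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# Hypothetical data: weekly visitor counts growing roughly linearly.
xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.0, 8.0]
a, b = fit_linear(xs, ys)
print(a * 5 + b)  # predicted value for week 5 (close to 10 with these made-up numbers)
```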
9. Supervised Learning
Classification and Regression are supervised learning problems
“Supervised” means that the training data has ground truth labels to learn from
(Supervised) classification often has +1 or -1 labels
(Supervised) regression has numerical labels
There are lots of supervised problems
Supervised learning algorithms are much easier to evaluate than unsupervised ones
11. Clustering
Clustering is an unsupervised learning problem
“Unsupervised” means that the training set has no ground truth labels to learn from
This means they are much harder to evaluate
Clustering groups data entities based on their feature values.
12. Recommendation
Recommender systems are machine learning solutions that match individuals to items
based on the preferences of other similar individuals, or other similar items that the
individual is already known to like.
Recommendation is one of the most commonly used forms of machine learning.
An example of recommender system: Netflix Contest
Build a better recommender system from Netflix data
Azure ML modules: Split, Train Matchbox Recommender,
Score Matchbox Recommender, Evaluate Recommender.
13. Recommendation Cont.
Use Score Matchbox Recommender to generate predictions
You can generate the following kinds of prediction:
Item Recommendation: Predicts recommended items based on a given user.
Related Items: Predicts recommended items based on a given item.
Rating Prediction: Predicts ratings for given users and items.
Related Users: Predicts users based on a given user.
Use the Evaluate Recommender module to evaluate recommender performance.
15. Azure ML Platform Features
Notebooks for writing code in Azure ML (Jupyter)
Building programmable components using Python & R
Publish experiments as web services
Save trained models for re-usability
Projects for creating combined assets (experiments, datasets, notebooks, etc.)
17. Thank you
Check out my blog for Azure ML articles:
www.MostafaElzoghbi.com
Follow me on Twitter: @MostafaElzoghbi
Editor's notes
Data Science essentials in Azure ML
A) Main concepts to cover for Data Science:
Regression
Classification
Clustering
Recommendation
B) Building programmable components in Azure ML experiments
C) Working with Azure ML studio
Ref: EDX.org
EDX – ch5 - classification
For a classification function to work accurately, when operating on the data in the training and test datasets, the number of times that the sign of f(x) does not equal y must be minimized. In other words, for a single entity, if y is positive, f(x) should be positive; and if y is negative, f(x) should be negative. Formally, we need to minimize cases where y ≠ sign(f(x)).
Because two numbers with the same sign always multiply to a positive value, and two numbers with different signs always multiply to a negative value, we can simplify our goal to minimizing cases where yf(x) < 0; or, over the whole dataset, minimizing the number of cases where yif(xi) < 0. This general approach is known as a loss function.
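The sign trick above can be sketched directly (the scores and labels below are made up): a prediction is wrong exactly when y·f(x) is negative, so training aims to minimize the count of such cases.

```python
# Sketch: the 0/1 misclassification loss. A prediction is wrong exactly
# when the true label y and the score f(x) have opposite signs, i.e. y*f(x) < 0.

def misclassification_count(scores, labels):
    """Count examples where the label and the score disagree in sign."""
    return sum(1 for s, y in zip(scores, labels) if y * s < 0)

scores = [0.7, -0.3, 1.2, -0.8]  # hypothetical f(xi) values
labels = [+1, +1, +1, -1]        # ground-truth yi
print(misclassification_count(scores, labels))  # only the second example is wrong -> 1
```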
As with regression algorithms, some classification algorithms add a regularization term to avoid over-fitting so that the function achieves a balance of accuracy and simplicity.
Each classification algorithm (for example AdaBoost, Support Vector Machines, and Logistic Regression) uses a specific loss function implementation, and it's this that distinguishes classification algorithms from one another.
Decision Trees are classification algorithms that define a sequence of branches. At each branch intersection, the feature value (x) is compared to a specific function, and the result determines which branch the algorithm follows. All branches eventually lead to a predicted value (-1 or +1). Most decision tree algorithms have been around for a while, and many produce low accuracy. However, boosted decision trees (AdaBoost applied to a decision tree) can be very effective.
You can use a "one vs. all" technique to extend binary classification (which predicts a Boolean value) so that it can be used in multi-class classification. This approach involves applying multiple binary classifications (for example, "is this a chair?", "is this a bird?", and so on) and reviewing the results produced by f(x) for each test. Since f(x) produces a numeric result, the predicted value is a measure of confidence in the prediction (so for example, a high positive result for "is this a chair?" combined with a low positive result for "is this a bird?" and a high negative result for "is this an elephant?" indicates a high degree of confidence that the object is more likely to be a chair than a bird, and very unlikely to be an elephant.)
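The "one vs. all" decision described above reduces to picking the class whose binary classifier reports the highest confidence. A minimal sketch (the class names and scores are made up):

```python
# Sketch: "one vs. all" multi-class classification. Each class has its own
# binary classifier producing a confidence score f(x); the prediction is the
# class with the highest score.

def one_vs_all(scores_per_class):
    """Return the class whose binary classifier gives the highest score."""
    return max(scores_per_class, key=scores_per_class.get)

# Hypothetical per-class confidence scores for one test image.
scores = {"chair": 2.3, "bird": 0.1, "elephant": -1.7}
print(one_vs_all(scores))  # -> chair
```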
When the training data is imbalanced (a high proportion of the data has the same True/False value for y), the accuracy of the classification algorithm can be compromised. To overcome this problem, you can "over-sample" or "weight" data with the minority y value to balance the algorithm.
The quality of a classification model can be assessed by plotting the True Positive Rate (the number of positives that were classified by the ML algorithm as positives divided by the total number of positives) against the False Positive Rate (the number of negatives that were classified by the ML algorithm as positives divided by the total number of negatives) for various parameter values on a chart to create a receiver operator characteristic (ROC) curve. The quality of the model is reflected in the area under the curve. The larger this area is than the area under a straight diagonal line (representing a 50% accuracy rate that can be achieved purely by guessing), the better the model.
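The two rates that make up each point on a ROC curve can be computed directly. A sketch with made-up predictions at a single threshold (a full ROC curve repeats this at many thresholds):

```python
# Sketch: True Positive Rate and False Positive Rate for one set of
# predictions, as used to plot a single point on a ROC curve.

def tpr_fpr(predicted, actual):
    """Return (TPR, FPR) for +1/-1 predicted and actual labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == -1)
    positives = sum(1 for a in actual if a == 1)
    negatives = sum(1 for a in actual if a == -1)
    return tp / positives, fp / negatives

actual    = [+1, +1, +1, +1, -1, -1, -1, -1]
predicted = [+1, +1, +1, -1, +1, -1, -1, -1]  # hypothetical classifier output
print(tpr_fpr(predicted, actual))  # (0.75, 0.25)
```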
Overfitting vs underfitting problem with regression algorithms
EDX – ch4 – Regression:
When operating on the data in the training and test datasets, the difference between the known y value and the value calculated by f(x) must be minimized. In other words, for a single entity, y - f(x) should be close to zero. This is controlled by the values of any baseline variables (b) used in the function.
For various reasons, computers work better with smooth functions than with absolute values, so the differences are squared. In other words, (y − f(x))² should be close to zero.
To apply this rule to all rows in the training or test data, the squared values are summed over the whole dataset, so ∑i(yi − f(xi))² should be close to zero. This quantity is known as the sum of squared errors (SSE). The regression algorithm minimizes this by adjusting the values of the baseline variables (b) used in the function.
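The SSE quantity described above is straightforward to compute. A sketch using a hypothetical fitted model f(x) = 2x + 1 and made-up data:

```python
# Sketch: the sum of squared errors (SSE) that a regression algorithm
# minimizes, evaluated for a hypothetical linear model f(x) = 2x + 1.

def sse(xs, ys, model):
    """Sum of (y - f(x))^2 over the dataset."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

f = lambda x: 2 * x + 1
xs, ys = [1, 2, 3], [3.5, 5.0, 6.5]
print(sse(xs, ys, f))  # (3.5-3)^2 + (5-5)^2 + (6.5-7)^2 = 0.5
```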
Minimizing the SSE for simple linear regression models (where there is only a single x value) generally works well, but when there are multiple x values, it can lead to over-fitting. To avoid this, you can use ridge regression, in which the regression algorithm adds a regularization term to the SSE. This helps achieve a balance of accuracy and simplicity in the function.
Support vector machine regression is an alternative to ridge regression that uses a similar technique to minimize the difference between the predicted f(x) values and the known y values.
To determine the best algorithm for a specific set of data, you can use cross-validation, in which the data is divided into folds, with each fold in turn being used to test algorithms that are trained on the other folds.
To determine the optimal value for a parameter (k) for an algorithm, you can use nested cross-validation, in which one of the folds in the training set is used to validate all possible k values. This process is repeated so that each fold in the training set is used as a validation fold. The best result is then tested with the test fold, and the whole process is repeated again until every fold has been used as the test fold.
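The fold bookkeeping behind cross-validation can be sketched as follows (the scoring step is deliberately left abstract; only the index splitting is shown):

```python
# Sketch: k-fold cross-validation index generation. Each fold is used once
# as the test set while the remaining folds form the training set.
# (Assumes n is divisible by k for simplicity.)

def kfold_indices(n, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

folds = list(kfold_indices(6, 3))
print(len(folds))   # 3 folds
print(folds[0][1])  # first test fold: [0, 1]
```

Nested cross-validation repeats the same splitting inside each training set to pick the parameter k before scoring on the outer test fold.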
Regression: Demand estimation
https://gallery.cortanaintelligence.com/Experiment/Regression-Demand-estimation-4?r=legacy
Score Model scores a trained classification or regression model against a test dataset. That is, the module generates predictions using the trained model.
Evaluate Model takes the scored dataset and uses it to generate some evaluation metrics. You can then visualize the evaluation metrics.
Set A = weather + holiday + weekday + weekend features for the predicted day
Set B = number of bikes that were rented in each of the previous 12 hours
Set C = number of bikes that were rented in each of the previous 12 days at the same hour
Set D = number of bikes that were rented in each of the previous 12 weeks at the same hour and the same day
Clustering groups data entities based on their feature values. Each data entity is described as a vector of one or more numeric features (x), which can be used to plot the entity as a specific point.
K-Means clustering is a technique in which the algorithm groups the data entities into a specified number of clusters (k). To begin, the algorithm selects k random centroid points, and assigns each data entity to a cluster based on the shortest distance to a centroid. Each centroid is then moved to the central point (i.e. the mean) of its cluster, and the distances between the data entities and the new centroids are evaluated, with entities reassigned to another cluster if it is closer. This process continues until every data entity is closer to the mean centroid of its cluster than to the centroid of any other cluster. (The accompanying image shows the result of K-Means clustering with a k value of 3.)
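The assign-then-update loop described above can be sketched on 1-D points (assumptions: k = 2, absolute distance, fixed iteration count; Azure ML's own K-Means module is more general):

```python
# Sketch: the K-Means loop on 1-D points. Assign each point to its nearest
# centroid, then move each centroid to the mean of its cluster; repeat.

def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: nearest centroid per point.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]  # two obvious groups, made up
print(kmeans_1d(points, centroids=[0.0, 10.0]))  # converges near [1.0, 9.0]
```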
The recommender systems and matrix factorization
You can create recommenders using regression, classification, or clustering models; but a common approach is to create a filter-based recommender that uses matrix factorization. Azure ML supports the Matchbox Recommender model, which uses this technique.
In Azure ML, you should use the Recommender Split mode of the Split module to prepare data for a recommender, as this distributes user-item pairs evenly between the training and test data sets.
After splitting the data, you can use a Train Matchbox Recommender module to train a recommender.
After training the recommender, you can use the Score Matchbox Recommender to generate predictions. You can generate the following kinds of prediction:
Item Recommendation: Predicts recommended items based on a given user.
Related Items: Predicts recommended items based on a given item.
Rating Prediction: Predicts ratings for given users and items.
Related Users: Predicts users based on a given user.
Use the Evaluate Recommender module to evaluate recommender performance. You can evaluate the model based on the following metrics:
Normalized Discounted Cumulative Gain (NDCG) for item recommendation, related items, and related users.
Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) for rating prediction
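The two rating-prediction metrics named above are simple to compute by hand. A sketch with made-up predicted and actual ratings:

```python
# Sketch: MAE and RMSE over (predicted, actual) rating pairs, the metrics
# used to evaluate rating prediction in a recommender.
import math

def mae(pred, actual):
    """Mean Absolute Error: average of |predicted - actual|."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def rmse(pred, actual):
    """Root Mean Squared Error: sqrt of the average squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

pred   = [4.0, 3.5, 5.0, 2.0]  # hypothetical predicted ratings
actual = [4.0, 3.0, 4.0, 2.5]  # hypothetical true ratings
print(mae(pred, actual))   # (0 + 0.5 + 1.0 + 0.5) / 4 = 0.5
print(rmse(pred, actual))  # sqrt(0.375), a bit above the MAE
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are reported together.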