SlideShare una empresa de Scribd logo
1 de 17
Data Science
Essentials in Azure
ML
MOSTAFA ELZOGHBI
SR. TECHNICAL EVANGELIST – MICROSOFT
@MOSTAFAELZOGHBI
Session Objectives & Takeaways
Data Science basic concepts of regression, classification,
clustering and recommendations.
Azure ML studio platform capabilities
Data Science Involves
 Data science is about using data to make decision that drive actions.
 Data science process involves:
 Data selection
 Preprocessing
 Transformation
 Data Mining
 Delivering value from data : Interpretation and evaluation
Machine Learning Overview
 Machine Learning is a computing technique that has its origins in artificial intelligence (AI)
and statistics. Machine Learning solutions include:
 Classification - Predicting a Boolean true/false value for an entity with a given set of features.
 Regression - Predicting a real numeric value for an entity with a given set of features.
 Clustering - Grouping entities with similar features.
 Recommendation - Recommending an item to a user based on past behavior or preferences of
similar users (Recommender systems).
Classification
 We want to teach a computer how to recognize images of chairs?
 We will provide set of images to a computer and tell which is a chair and which
is not.
 A computer is supposed to learn to recognize chairs and which ones are chairs
and which are not chairs even from images that has not seen before.
 It is learning by example
 To build this experiment, we need training set and test set. We need as much
data as we can.
 Each observation is represented by a set of numbers (features)
 A training & test sets are set of vector numbers and the label is represented by
a number
 Example: if an image is a chair, label will be +1 otherwise -1
Classification Example
Classification Cont.
 Yes/No questions is the most basic
 Examples: Automatic handwriting recognition, speech recognition, biometrics,
document classification, spam detection, credit card fraud, predicting customer
churn….etc.
 Formally, given training set (xi,yi) for i=1…n, we want to create a classification model f
that can predict label y for a new x
 Machine learning algorithm creates function f(x) for you.
 The predicted value of y for a new x is simply the sigh of the function f(x)
Regression
 For predicting real-valued outcomes:
 How many customers will arrive at our website next week?
 How many TV’s will sell next year?
 Can we predict someone’s income from their click through information?
 Formally, given training set (Xi,Yi) for i=1…n, we want to create a regression model f
that can predict label y for a new x
Supervised Learning
 Classification and Regression are supervised learning problems
 “Supervised” means that the training data has ground truth labels to learn from
 (Supervised) classification often has +1 or -1 labels
 (Supervised) regression has numerical labels
 There are lots of supervised problems
 Supervised learning algorithm are much easier to evaluate than unsupervised ones
DEMO
Regression: Demand Estimation
(Bike Rental)
Clustering
 Clustering is an unsupervised learning problem
 “Unsupervised” means that the training set has no ground truth labels to learn from
 This means they are much harder to evaluate
 Clustering groups data entities based on their feature values.
Recommendation
 Recommender systems are machine learning solutions that match individuals to items
based on the preferences of other similar individuals, or other similar items that the
individual is already known to like.
 Recommendation is one of the most commonly used forms of machine learning.
 An example of recommender system: Netflix Contest
 Build a better recommender system from Netflix data
 Azure ML modules: split, train matchbox recommender,
Score matchbox recommender, Evaluate Recommender.
Recommendation Cont.
 Use Score Matchbox Recommender to generate predictions
 You can generate the following kinds of prediction:
 Item Recommendation: Predicts recommended items based on a given user.
 Related Items: Predicts recommended items based on a given item.
 Rating Prediction: Predicts ratings for given users and items.
 Related Users: Predicts users based on a given user.
 Use the Evaluate Recommender module to evaluate recommender performance.
DEMO
Movie Recommender System
Azure ML Platform Features
 Notebooks for writing code in Azure ML (Jupyter)
 Building Programmable component using Python & R
 Publish experiments as web services
 Save trained models for re-usability
 Projects for creating combined assets (Experiments, datasets, notebooks..etc).
References
 Free e-book “Azure Machine Learning”
 https://mva.microsoft.com/ebooks#9780735698178
 Azure Machine Learning documentation
 https://azure.microsoft.com/en-us/documentation/services/machine-
learning/
 Data Science and Machine Learning Essentials
 www.edx.org
Thank you
 Check out my blog for Azure ML articles:
www.MostafaElzoghbi.com
 Follow me on Twitter: @MostafaElzoghbi

Más contenido relacionado

La actualidad más candente

activelearning.ppt
activelearning.pptactivelearning.ppt
activelearning.ppt
butest
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)
butest
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
butest
 
Introduction
IntroductionIntroduction
Introduction
butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
butest
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
butest
 

La actualidad más candente (19)

ML Basics
ML BasicsML Basics
ML Basics
 
activelearning.ppt
activelearning.pptactivelearning.ppt
activelearning.ppt
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)
 
Machine Learning and Applications
Machine Learning and ApplicationsMachine Learning and Applications
Machine Learning and Applications
 
Learning
LearningLearning
Learning
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
supervised learning
supervised learningsupervised learning
supervised learning
 
Statistical learning intro
Statistical learning introStatistical learning intro
Statistical learning intro
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction
IntroductionIntroduction
Introduction
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
 
Tweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVMTweets Classification using Naive Bayes and SVM
Tweets Classification using Naive Bayes and SVM
 
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
(Machine)Learning with limited labels(Machine)Learning with limited labels(Ma...
 
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre..."An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
"An Introduction to Machine Learning and How to Teach Machines to See," a Pre...
 
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
 
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...Identification of Relevant Sections in Web Pages Using a Machine Learning App...
Identification of Relevant Sections in Web Pages Using a Machine Learning App...
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 

Destacado

Destacado (17)

Build intelligent solutions using Azure
Build intelligent solutions using AzureBuild intelligent solutions using Azure
Build intelligent solutions using Azure
 
Extending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook ConnectorsExtending Product Outreach with Outlook Connectors
Extending Product Outreach with Outlook Connectors
 
Introducing Power BI Embedded
Introducing Power BI EmbeddedIntroducing Power BI Embedded
Introducing Power BI Embedded
 
Patterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-insPatterns and Practices in Building Office Add-ins
Patterns and Practices in Building Office Add-ins
 
Big data solutions in Azure
Big data solutions in AzureBig data solutions in Azure
Big data solutions in Azure
 
Building Big data solutions in Azure
Building Big data solutions in AzureBuilding Big data solutions in Azure
Building Big data solutions in Azure
 
Build Interactive Analytics using Power BI
Build Interactive Analytics using Power BIBuild Interactive Analytics using Power BI
Build Interactive Analytics using Power BI
 
Build intelligent solutions using ms azure
Build intelligent solutions using ms azureBuild intelligent solutions using ms azure
Build intelligent solutions using ms azure
 
PnP in building office add ins - public
PnP in building office add ins - publicPnP in building office add ins - public
PnP in building office add ins - public
 
Mistakes that kill startups
Mistakes that kill startupsMistakes that kill startups
Mistakes that kill startups
 
TypeScript Jump Start
TypeScript Jump StartTypeScript Jump Start
TypeScript Jump Start
 
Building predictive models in Azure Machine Learning
Building predictive models in Azure Machine LearningBuilding predictive models in Azure Machine Learning
Building predictive models in Azure Machine Learning
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning Classifiers
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Big data solutions in azure
Big data solutions in azureBig data solutions in azure
Big data solutions in azure
 
Architecting big data solutions in the cloud
Architecting big data solutions in the cloudArchitecting big data solutions in the cloud
Architecting big data solutions in the cloud
 

Similar a Data science essentials in azure ml

Internship - Python - AI ML.pptx
Internship - Python - AI ML.pptxInternship - Python - AI ML.pptx
Internship - Python - AI ML.pptx
Hchethankumar
 
Internship - Python - AI ML.pptx
Internship - Python - AI ML.pptxInternship - Python - AI ML.pptx
Internship - Python - AI ML.pptx
Hchethankumar
 

Similar a Data science essentials in azure ml (20)

Machine Learning Basics
Machine Learning BasicsMachine Learning Basics
Machine Learning Basics
 
Machine Learning Contents.pptx
Machine Learning Contents.pptxMachine Learning Contents.pptx
Machine Learning Contents.pptx
 
recent.pptx
recent.pptxrecent.pptx
recent.pptx
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine Learning
 
Machine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-codeMachine learning-in-details-with-out-python-code
Machine learning-in-details-with-out-python-code
 
data-science-pdf-16588.pdf
data-science-pdf-16588.pdfdata-science-pdf-16588.pdf
data-science-pdf-16588.pdf
 
Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
 
Machine Learning by Rj
Machine Learning by RjMachine Learning by Rj
Machine Learning by Rj
 
Machine learning applications nurturing growth of various business domains
Machine learning applications nurturing growth of various business domainsMachine learning applications nurturing growth of various business domains
Machine learning applications nurturing growth of various business domains
 
Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence Movie Recommender System Using Artificial Intelligence
Movie Recommender System Using Artificial Intelligence
 
Data Science.pptx
Data Science.pptxData Science.pptx
Data Science.pptx
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Lecture-6-7.pptx
Lecture-6-7.pptxLecture-6-7.pptx
Lecture-6-7.pptx
 
Brochure data science learning path board-infinity (1)
Brochure   data science learning path board-infinity (1)Brochure   data science learning path board-infinity (1)
Brochure data science learning path board-infinity (1)
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory Concepts
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Internship - Python - AI ML.pptx
Internship - Python - AI ML.pptxInternship - Python - AI ML.pptx
Internship - Python - AI ML.pptx
 
Internship - Python - AI ML.pptx
Internship - Python - AI ML.pptxInternship - Python - AI ML.pptx
Internship - Python - AI ML.pptx
 
Ai use cases
Ai use casesAi use cases
Ai use cases
 
Unit-2-part-2Machine Learning.pptx
Unit-2-part-2Machine Learning.pptxUnit-2-part-2Machine Learning.pptx
Unit-2-part-2Machine Learning.pptx
 

Más de Mostafa

Más de Mostafa (11)

The role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud publicThe role of intelligent sensors in the cloud public
The role of intelligent sensors in the cloud public
 
Skill up in machine learning using Azure ML
Skill up in machine learning using Azure MLSkill up in machine learning using Azure ML
Skill up in machine learning using Azure ML
 
Big data talking stories in Healthcare
Big data talking stories in Healthcare Big data talking stories in Healthcare
Big data talking stories in Healthcare
 
How to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud serviceHow to migrate Console Apps as a cloud service
How to migrate Console Apps as a cloud service
 
HBase introduction in azure
HBase introduction in azureHBase introduction in azure
HBase introduction in azure
 
eRecall
eRecalleRecall
eRecall
 
Get your site microsoft edge ready
Get your site microsoft edge readyGet your site microsoft edge ready
Get your site microsoft edge ready
 
Developing cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache CordovaDeveloping cross platform mobile apps using Apache Cordova
Developing cross platform mobile apps using Apache Cordova
 
Identity and o365 on Azure
Identity and o365 on AzureIdentity and o365 on Azure
Identity and o365 on Azure
 
Azure Data platform
Azure Data platformAzure Data platform
Azure Data platform
 
Building IoT solutions using Windows 10 IoT Core & Azure
Building IoT solutions using Windows 10 IoT Core & AzureBuilding IoT solutions using Windows 10 IoT Core & Azure
Building IoT solutions using Windows 10 IoT Core & Azure
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Data science essentials in azure ml

  • 1. Data Science Essentials in Azure ML MOSTAFA ELZOGHBI SR. TECHNICAL EVANGELIST – MICROSOFT @MOSTAFAELZOGHBI
  • 2. Session Objectives & Takeaways Data Science basic concepts of regression, classification, clustering and recommendations. Azure ML studio platform capabilities
  • 3. Data Science Involves  Data science is about using data to make decision that drive actions.  Data science process involves:  Data selection  Preprocessing  Transformation  Data Mining  Delivering value from data : Interpretation and evaluation
  • 4. Machine Learning Overview  Machine Learning is a computing technique that has its origins in artificial intelligence (AI) and statistics. Machine Learning solutions include:  Classification - Predicting a Boolean true/false value for an entity with a given set of features.  Regression - Predicting a real numeric value for an entity with a given set of features.  Clustering - Grouping entities with similar features.  Recommendation - Recommending an item to a user based on past behavior or preferences of similar users (Recommender systems).
  • 5. Classification  We want to teach a computer how to recognize images of chairs?  We will provide set of images to a computer and tell which is a chair and which is not.  A computer is supposed to learn to recognize chairs and which ones are chairs and which are not chairs even from images that has not seen before.  It is learning by example  To build this experiment, we need training set and test set. We need as much data as we can.  Each observation is represented by a set of numbers (features)  A training & test sets are set of vector numbers and the label is represented by a number  Example: if an image is a chair, label will be +1 otherwise -1
  • 7. Classification Cont.  Yes/No questions is the most basic  Examples: Automatic handwriting recognition, speech recognition, biometrics, document classification, spam detection, credit card fraud, predicting customer churn….etc.  Formally, given training set (xi,yi) for i=1…n, we want to create a classification model f that can predict label y for a new x  Machine learning algorithm creates function f(x) for you.  The predicted value of y for a new x is simply the sigh of the function f(x)
  • 8. Regression  For predicting real-valued outcomes:  How many customers will arrive at our website next week?  How many TV’s will sell next year?  Can we predict someone’s income from their click through information?  Formally, given training set (Xi,Yi) for i=1…n, we want to create a regression model f that can predict label y for a new x
  • 9. Supervised Learning  Classification and Regression are supervised learning problems  “Supervised” means that the training data has ground truth labels to learn from  (Supervised) classification often has +1 or -1 labels  (Supervised) regression has numerical labels  There are lots of supervised problems  Supervised learning algorithm are much easier to evaluate than unsupervised ones
  • 11. Clustering  Clustering is an unsupervised learning problem  “Unsupervised” means that the training set has no ground truth labels to learn from  This means they are much harder to evaluate  Clustering groups data entities based on their feature values.
  • 12. Recommendation  Recommender systems are machine learning solutions that match individuals to items based on the preferences of other similar individuals, or other similar items that the individual is already known to like.  Recommendation is one of the most commonly used forms of machine learning.  An example of recommender system: Netflix Contest  Build a better recommender system from Netflix data  Azure ML modules: split, train matchbox recommender, Score matchbox recommender, Evaluate Recommender.
  • 13. Recommendation Cont.  Use Score Matchbox Recommender to generate predictions  You can generate the following kinds of prediction:  Item Recommendation: Predicts recommended items based on a given user.  Related Items: Predicts recommended items based on a given item.  Rating Prediction: Predicts ratings for given users and items.  Related Users: Predicts users based on a given user.  Use the Evaluate Recommender module to evaluate recommender performance.
  • 15. Azure ML Platform Features  Notebooks for writing code in Azure ML (Jupyter)  Building Programmable component using Python & R  Publish experiments as web services  Save trained models for re-usability  Projects for creating combined assets (Experiments, datasets, notebooks..etc).
  • 16. References  Free e-book “Azure Machine Learning”  https://mva.microsoft.com/ebooks#9780735698178  Azure Machine Learning documentation  https://azure.microsoft.com/en-us/documentation/services/machine- learning/  Data Science and Machine Learning Essentials  www.edx.org
  • 17. Thank you  Check out my blog for Azure ML articles: www.MostafaElzoghbi.com  Follow me on Twitter: @MostafaElzoghbi

Notas del editor

  1. Data Science essentials in Azure ML In this session i am covering data science principals such as: Regression, Clustering, Classification, Recommendation and how to build programmable components in Azure Machine Learning experiments using data science programming languages. The session shows and illustrate how to implement these concepts using Azure ML studio.
  2. A) Main concepts to cover for Data Science: Regression Classification Clustering Recommendation B) Building programmable components in Azure ML experiments C) Working with Azure ML studio
  3. Ref: EDX.org
  4. EDX – ch5 - classification For a classification function to work accurately, when operating on the data in the training and test datasets, the number of times that the sign of f(x) does not equal y must be minimized. In other words, for a single entity, if y is positive, f(x) should be positive; and if y is negative, f(x) should be negative. Formally, we need to minimize cases where y ≠ sign(f(x)). Because same-signed numbers when multiplied together always produce a positive, and numbers of different signs multiplied together always produce a negative, we can simplify our goal to minimize cases where yf(x) < 0; or for the whole data set ∑iyif(xi) < 0. This general approach is known as a loss function. As with regression algorithms, some classification algorithms add a regularization term to avoid over-fitting so that the function achieves a balance of accuracy and simplicity. Each classification algorithm (for example AdaBoost, Support Vector Machines, and Logistic Regression) uses a specific loss function implementation, and it's this that distinguishes classification algorithms from one another. Decision Trees are classification algorithms that define a sequence of branches. At each branch intersection, the feature value (x) is compared to a specific function, and the result determines which branch the algorithm follows. All branches eventually lead to a predicted value (-1 or +1). Most decision trees algorithms have been around for a while, and many produce low accuracy. However, boosted decision trees (AdaBoost applied to a decision tree) can be very effective. You can use a "one vs. all" technique to extend binary classification (which predicts a Boolean value) so that it can be used in multi-class classification. This approach involves applying multiple binary classifications (for example, "is this a chair?", "is this a bird?", and so on) and reviewing the results produced by f(x) for each test. Since f(x) produces a numeric result, the predicted value is a measure of confidence in the prediction (so for example, a high positive result for "is this a chair?" combined with a low positive result for "is this a bird?" and a high negative result for "is this an elephant?" indicates a high degree of confidence that the object is more likely to be a chair than a bird, and very unlikely to be an elephant.) When the training data is imbalanced (so a high proportion of the data has the same True/False value for y) , the accuracy of the classification algorithm can be compromised. To overcome this problem, you can "over-sample" or "weight" data with the minority y value to balance the algorithm. The quality of a classification model can be assessed by plotting the True Positive Rate (the number of positives that were classified by the ML algorithm as positives divided by total number of positives) against the False Positive Rate (the number of negatives that were classified by the ML algorithm as positives divided by the total number of negatives) for various parameter values on a chart to create a receiver operator characteristic (ROC) curve. The quality of the model is reflected in the area under the curve. The larger this area is than the area under a straight diagonal line (representing a 50% accuracy rate that can be achieved purely by guessing), the better the model;
  5. Overfitting vs underfitting problem with regression algorithms EDX – ch4 – Regression: When operating on the data in the training and test datasets, the difference between the known y value and the value calculated by f(x) must be minimized. In other words, for a single entity, y - f(x) should be close to zero. This is controlled by the values of any baseline variables (b) used in the function. For various reasons, computers work better with smooth functions than with absolute values, so the values are squared. In other words, (y - f(x))2 should be close to zero. To apply this rule to all rows in the training or test data, the squared values are summed over the whole dataset, so ∑(yi - f(xi))2 should be close to zero. This quantity is known as the sum of squares errors (or SSE). The regression algorithm minimizes this by adjusting the values of baseline variables (b) used in the function. Minimizing the SSE for simple linear regression models (where there is only a single x value) generally works well, but when there are multiple x values, it can lead to over-fitting. To avoid this, the you can use ridge regression, in which the regression algorithm adds a regularization term to the SSE. This helps achieve a balance of accuracy and simplicity in the function. Support vector machine regression is an alternative to ridge regression that uses a similar technique to minimize the difference between the predicted f(x) values and the known y values. To determine the best algorithm for a specific set of data, you can use cross-validation, in which the data is divided into folds, with each fold in turn being used to test algorithms that are trained on the other folds. To determine the optimal value for a parameter (k) for an algorithm, you can use nested cross-validation, in which one of the folds in the training set is used to validate all possible k values. This process is repeated so that each fold in the training set is used as a validation fold. The best result is then tested with the test fold, and the whole process is repeated again until every fold has been used as the test fold.
  6. Overfitting vs underfitting problem with regression algorithms
  7. Regression: Demand estimation https://gallery.cortanaintelligence.com/Experiment/Regression-Demand-estimation-4?r=legacy Score Model scores a trained classification or regression model against a test dataset. That is, the module generates predictions using the trained model. Evaluate Model takes the scored dataset and uses it to generate some evaluation metrics. You can then visualize the evaluation metrics. Set A = weather + holiday + weekday + weekend features for the predicted day Set B = number of bikes that were rented in each of the previous 12 hours Set C = number of bikes that were rented in each of the previous 12 days at the same hour Set D = number of bikes that were rented in each of the previous 12 weeks at the same hour and the same day
  8. Clustering groups data entities based on their feature values. Each data entity is described as a vector of one or more numeric features (x), which can be used to plot the entity as a specific point. K-Means clustering is a technique in which the algorithm groups the data entities into a specified number of clusters (k). To begin, the algorithm selects k random centroid points, and assigns each data entity to a cluster based on the shortest distance to a centroid. Each centroid is then moved to the central point (i.e. themean) of its cluster, and the distances between the data entities and the new centroids are evaluated, with entities reassigned to another cluster if it is closer. This process continues until every data entity is closer to the mean centroid of its cluster than to the centroid of any other cluster. The following image shows the result of K=Means clustering with a k value of 3:
  9. The recommender systems and matrix factorization You can create recommenders using regression, classification, or clustering models; but a common approach is to create a filter-based recommender that uses matrix factorization. Azure ML supports the Matchbox Recommender model, which uses this technique. In Azure ML, you should use the Recommender Split mode of the Split module to prepare data for a recommender, as this distributes user-item pairs evenly between the training and test data sets. After splitting the data, you can use a Train Matchbox Recommender module to train a recommender. After training the recommender, you can use the Score Matchbox Recommender to generate predictions. You can generate the following kinds of prediction: Item Recommendation: Predicts recommended items based on a given user. Related Items: Predicts recommended items based on a given item. Rating Prediction: Predicts ratings for given users and items. Related Users: Predicts users based on a given user. Use the Evaluate Recommender module to evaluate recommender performance. You can evaluate the model based on the following metrics: Normalized Discounted Cumulative Gain (NDCG) for item recommendation, related items, and related users. Mean Absolute Error (MAE) and Root Mean Squared Error (RSME) for rating prediction
  10. The recommender systems and matrix factorization You can create recommenders using regression, classification, or clustering models; but a common approach is to create a filter-based recommender that uses matrix factorization. Azure ML supports the Matchbox Recommender model, which uses this technique. In Azure ML, you should use the Recommender Split mode of the Split module to prepare data for a recommender, as this distributes user-item pairs evenly between the training and test data sets. After splitting the data, you can use a Train Matchbox Recommender module to train a recommender. After training the recommender, you can use the Score Matchbox Recommender to generate predictions. You can generate the following kinds of prediction: Item Recommendation: Predicts recommended items based on a given user. Related Items: Predicts recommended items based on a given item. Rating Prediction: Predicts ratings for given users and items. Related Users: Predicts users based on a given user. Use the Evaluate Recommender module to evaluate recommender performance. You can evaluate the model based on the following metrics: Normalized Discounted Cumulative Gain (NDCG) for item recommendation, related items, and related users. Mean Absolute Error (MAE) and Root Mean Squared Error (RSME) for rating prediction
  11. https://gallery.cortanaintelligence.com/Experiment/Movie-Recommendation-1
  12. The recommender systems and matrix factorization