Lesson One 
Introduction to Machine Learning  
- High Level Overview  
By: Oluwasegun Matthew & Abdulrazzaq Olajide
Summary 
1. Introduction to the Concept of Data Analytics and Machine Learning
a. Data Mining and Statistical Pattern Recognition 
b. Supervised and Unsupervised Classification/Learning 
2. Types of Data - Continuous and Discrete Data 
3. Insight into Data Overfitting and Underfitting
a. Introducing Outliers 
4. Scikit Learn usage in ML 
a. Support Vector Machine 
b. Gaussian Naive Bayes 
c. Decision Trees 
 
Let’s Dive In...
 
Introduction - Concept of Data Analytics and Machine Learning 
In a world of data explosion, where the rate of data generation and consumption keeps increasing,
there comes the buzzword: Big Data.
Big Data is the concept of fast-moving, high-volume data arriving in varying forms from many,
often unpredictable, sources.
The 4Vs of Big Data 
● Volume - Scale of Data 
● Velocity - Analysis of Streaming Data 
● Variety - Different forms of Data 
● Veracity - Uncertainty of Data 
With increasing data availability, the new trend in industry demands not just data collection,
but making real sense of the acquired data: this is the concept of Data Analytics.
Taking it a step further to make predictions about the future and draw realistic inferences is the concept
of Machine Learning.
A blend of both gives a robust analysis of data for the past, the present and the future.
There is a thin line between data analytics and machine learning, and the difference becomes obvious
when you dig deeper.
Data Mining 
Data can be collected either from static, offline data generated by existing platforms
or from real-life data sources in the form of a stream.
Pattern recognition is key to machine learning: it means finding relationships between the features,
labels and/or attributes of a data set.
For example, classifying animals as mammals or reptiles depends solely on the physical
attributes of the animals under consideration.
Supervised and Unsupervised Learning 
Supervised learning is concerned with generating a model or function from a labeled data set,
which is then used to make future inferences based on existing, predefined information about the data attributes.
 
It is a learning model where you have input variables (X) and an output variable (Y), and you use an
algorithm to learn the mapping function from the input to the output. The goal is to approximate
the mapping function so well that, when you have new input data (X), you can predict the
output variable (Y) for that data.
Y = f(X) 
It is called supervised learning because the process of an algorithm learning from the training
dataset can be thought of as a teacher supervising the learning process. We know the correct
answers; the algorithm iteratively makes predictions on the training data and is corrected by the
teacher. Learning stops when the algorithm achieves an acceptable level of performance.
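The predict-and-correct loop described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes a hypothetical linear mapping f(X) = w*X + b, a small made-up training set that follows Y = 2X + 1, and gradient descent as the "teacher's" correction rule. Training stops once the mean squared error falls below a chosen tolerance, standing in for "an acceptable level of performance".

```python
# Hypothetical labeled training data: inputs X with known correct outputs Y.
# The data follow Y = 2X + 1 exactly, so a linear model can fit them.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.0, 3.0, 5.0, 7.0, 9.0]


def mean_squared_error(w, b):
    """Average squared gap between predictions w*x + b and the correct answers."""
    return sum((w * x + b - y) ** 2 for x, y in zip(X, Y)) / len(X)


def train(learning_rate=0.02, tolerance=1e-6, max_steps=10_000):
    """Learn w and b for f(X) = w*X + b by repeated predict-and-correct steps."""
    w, b = 0.0, 0.0  # start from an uninformed guess
    n = len(X)
    for _ in range(max_steps):
        # Predict on the training data and compare with the known answers.
        errors = [w * x + b - y for x, y in zip(X, Y)]
        # "Teacher" feedback: gradient of the mean squared error w.r.t. w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, X)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # Stop once performance is acceptable (assumed threshold: `tolerance`).
        if mean_squared_error(w, b) < tolerance:
            break
    return w, b


w, b = train()
print(f"learned f(X) = {w:.2f} * X + {b:.2f}")
print("prediction for new input X = 10:", round(w * 10 + b, 2))
```

The learning rate and tolerance are illustrative choices; a learning rate that is too large makes the corrections overshoot and the loop diverge.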
Many machine learning projects are centered on supervised learning, as it is easier than unsupervised learning. In this
regard, there exist solutions like:
● Recommender Systems 
● Prediction Engines 
● Image Recognition from Tagged Attributes 
● Time series prediction 
Supervised learning problems can be further grouped into regression and classification problems:
● Classification: a classification problem is one where the output variable is a category, such as
“red” or “blue”, “disease” or “no disease”, “purchase” or “no purchase”.
● Regression: a regression problem is one where the output variable is a real value, such as
“weight”, “spending power” or “time of best billing”.
Some popular examples of supervised machine learning algorithms are: 
● Linear regression for regression problems 
● Random forest for classification and regression problems 
● Support vector machines for classification problems 
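To make the classification/regression distinction concrete, here is a small sketch using scikit-learn, the library covered later in this lesson. The toy data (hours studied against exam score and pass/fail outcome) are invented for illustration. `LinearRegression` handles the real-valued target; for the categorical target, `LogisticRegression` is swapped in as a simple, widely used classifier.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical data: hours studied by six students.
hours = [[1], [2], [3], [4], [5], [6]]

# Regression target: a real value (exam score, rising 10 points per hour).
scores = [35.0, 45.0, 55.0, 65.0, 75.0, 85.0]
# Classification target: a category ("fail" or "pass").
outcomes = ["fail", "fail", "fail", "pass", "pass", "pass"]

regressor = LinearRegression().fit(hours, scores)
classifier = LogisticRegression().fit(hours, outcomes)

# A regression model outputs a number; a classifier outputs one of the known labels.
print("predicted score for 3.5 hours:", regressor.predict([[3.5]])[0])
print("predicted outcome for 5.5 hours:", classifier.predict([[5.5]])[0])
```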
Unsupervised learning tries to draw inferences from unlabeled data, i.e. without prior knowledge of
how attributes are defined or classified.
Unsupervised learning is where you only have input data (X) and no corresponding output
variables. The goal of unsupervised learning is to model the underlying structure or distribution
in the data in order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning above, there are no
correct answers and there is no teacher. Algorithms are left to their own devices to discover and
present the interesting structure in the data.
 
The following solutions fall under this category:
● Fraud detection from unusual transactions
● Clustering students into types based on learning styles
Unsupervised learning problems can be further grouped into clustering and association 
problems. 
● Clustering: a clustering problem is where you want to discover the inherent groupings in
the data, such as grouping customers by purchasing behavior.
● Association: an association rule learning problem is where you want to discover rules that
describe large portions of your data, such as the rule that people who buy X also tend to buy Y.
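An association rule such as "people who buy X also tend to buy Y" is usually quantified with two measures: the support of an itemset (the fraction of transactions containing it) and the confidence of a rule X → Y (how often transactions containing X also contain Y). The plain-Python sketch below computes both on a made-up list of shopping baskets; the items and data are hypothetical. It is not the Apriori algorithm itself, which uses these measures to search for frequent itemsets efficiently.

```python
# Hypothetical shopping baskets (each transaction is a set of items).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return count(antecedent | consequent) / count(antecedent)


print("support(bread, butter) =", support({"bread", "butter"}))
print("confidence(bread -> butter) =", confidence({"bread"}, {"butter"}))
```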
Some popular examples of unsupervised learning algorithms are: 
● K-means for clustering problems 
● Apriori algorithm for association rule learning problems. 
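K-means, listed above, alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it, until nothing changes. A minimal one-dimensional sketch in plain Python, assuming made-up customer spending figures, k = 2 and the first two points as starting centroids (library implementations such as scikit-learn's `KMeans` use smarter initialization and handle many dimensions):

```python
# Hypothetical monthly spending of eight customers.
spending = [12.0, 15.0, 11.0, 14.0, 80.0, 95.0, 88.0, 90.0]


def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data: repeat assign-then-update until centroids stop moving."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = [sum(c) / len(c) if c else old for c, old in zip(clusters, centroids)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Assumption: k = 2, initialized with the first two data points.
centroids, clusters = kmeans_1d(spending, centroids=[spending[0], spending[1]])
print("centroids:", centroids)
print("clusters:", clusters)
```

The result depends on the starting centroids, which is why practical implementations usually run several initializations and keep the best.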
Quiz: Classify the following as either supervised or unsupervised learning:
● Spam detection in emails 
● Fraud detection in transactions 
● Customer segmentation 
● Speech recognition 
● Weather forecast 
● House price prediction 
● Astronomy prediction 
 
Types of Data - Continuous and Discrete Data 
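A wide range of data formats will be encountered during data collection and
sanitization, from numerical and categorical to time-series and text-based data. Numerical data can be
continuous (any value within a range, such as a student's height) or discrete (countable values, such as the number of courses offered).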
Quiz: What type of data is each of the following:
● CPE508 Result 
● List of courses offered in 500Level - Computer Science and Engineering 
● Gender 
● Frequency of Strike actions in O.A.U 
● Lectures time table 
 
Data Overfitting and Underfitting 
In machine learning we describe the learning of the target function from training data as inductive
learning. Induction refers to learning general concepts from specific examples, which is exactly
the problem that supervised machine learning aims to solve. This is different from
deduction, which works the other way around and seeks to learn specific concepts from general rules.
In statistics, a fit refers to how well you approximate a target function. This is good terminology to                                   
use in machine learning, because supervised machine learning algorithms seek to approximate                       
the unknown underlying mapping function for the output variables given the input variables. 
Overfitting happens when a model learns the detail and noise in the training data to the extent
that it negatively impacts the performance of the model on new data. This means that the noise
or random fluctuations in the training data are picked up and learned as concepts by the model.
Underfitting refers to a model that can neither model the training data nor generalize to new
data. An underfit machine learning model is not a suitable model, and this will be obvious because it has
poor performance even on the training data. Underfitting is often not discussed, as it is easy to detect
given a good performance metric. The remedy is to move on and try alternative machine learning
algorithms. Nevertheless, it provides a good contrast to the problem of overfitting.
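One common way to see both problems is to compare accuracy on the training data with accuracy on held-out test data. The sketch below is an illustration under stated assumptions, not a benchmark: it generates a synthetic, deliberately noisy data set with scikit-learn's `make_classification` (its `flip_y` parameter randomly reassigns the class of a fraction of samples) and fits decision trees of different depths. A depth-1 tree tends to underfit (modest accuracy on both sets), while an unrestricted tree can memorize the noise (perfect training accuracy, noticeably lower test accuracy). Exact numbers depend on the data and random seed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (assumption): flip_y=0.2 randomly reassigns the class of ~20% of samples.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in (1, 4, None):  # None lets the tree grow until every leaf is pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    results[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"max_depth={depth}: train accuracy={results[depth][0]:.2f}, "
          f"test accuracy={results[depth][1]:.2f}")
```

A large gap between training and test accuracy points to overfitting; low accuracy on both points to underfitting.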
An outlier is an observation that lies at an abnormal distance from other values in a random sample
from a population.
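The definition above leaves "abnormal distance" open. One widely used rule of thumb (Tukey's fences, not necessarily the criterion the authors had in mind) flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. A plain-Python sketch on made-up data, using the standard library's `statistics.quantiles` (Python 3.8+):

```python
import statistics

# Hypothetical sample: most values cluster around 10, one lies far away.
sample = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 25.0, 10.2, 9.7]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print("outliers:", iqr_outliers(sample))
```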
 
NB: Cluster analysis (clustering) is the task of grouping a set of objects in such a way that objects in the
same group (called a cluster) are more similar (in some sense) to each other than to
those in other groups (clusters).
 
 
Quiz: Identify the outlier in the visualized data below: 1, 2 or 3.
 
 
 
Enough of the theory; let’s get practical…
 
Scikit Learn Usage in ML 
Scikit Learn (officially scikit-learn, imported in Python as sklearn) is an open source machine learning library for Python
developers. It encapsulates various classification, regression and clustering algorithms, including
support vector machines, random forests, gradient boosting, k-means and DBSCAN. It can be combined
with separate Python modules such as pandas for data handling and visualization.
The focus of this section is to understand how the library works for classification problems with
the following algorithms in mind:
● Support Vector Machines (for classification problems) - LinearSVC 
● Gaussian Naive Bayes 
● Decision Trees 
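All three algorithms share scikit-learn's common estimator interface: create a model, call `fit(X, y)` on training data, then `predict` (or `score`) on new data. The sketch below assumes the Iris data set bundled with scikit-learn (`load_iris`) and a 70/30 train/test split purely for illustration; the same few lines work for each classifier.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris (150 flowers, 4 numeric features, 3 species) as example data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

models = {
    "LinearSVC": LinearSVC(max_iter=10_000),  # extra iterations help convergence on unscaled data
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)                     # learn from labeled data
    accuracies[name] = model.score(X_test, y_test)  # mean accuracy on unseen data
    print(f"{name}: test accuracy = {accuracies[name]:.2f}")
```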
 
Support Vector Machines (SVM) 
SVMs are a set of supervised learning methods used for classification, regression and
outlier detection. The focus here is strictly on classification problems. Advantages of
SVMs include:
- very effective on data sets in high-dimensional spaces
- uses only a subset of training points (the support vectors) in the decision function, so it is memory efficient
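The memory-efficiency point can be checked directly. `LinearSVC` does not expose its support vectors, so this sketch swaps in scikit-learn's `SVC` with a linear kernel, whose fitted `support_` attribute holds the indices of the training points that define the decision function. The two-cluster data from `make_blobs` are synthetic and chosen only for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumption: two synthetic, well-separated clusters of 200 points in total.
X, y = make_blobs(n_samples=200, centers=2, random_state=6)

model = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors are needed to define the decision boundary.
print("training points:", len(X))
print("support vectors used:", len(model.support_))
print("support vectors per class:", model.n_support_)
```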
 
Example of Linear SVC implementation: 
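A minimal sketch, assuming the Iris data set bundled with scikit-learn and an illustrative 70/30 train/test split. Because linear SVMs are sensitive to feature scale, it wraps `LinearSVC` in a pipeline with `StandardScaler`; the scaling step is a common practice added here, not something the lesson prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Assumption: Iris as example data, split 70/30.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Standardize features, then fit a linear support vector classifier.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10_000))
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, pred)
print("LinearSVC accuracy:", round(accuracy, 3))
print("first five predictions:", pred[:5])
```

`C` is the inverse of the regularization strength: smaller values give a softer margin that tolerates more misclassified training points.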
Learn more here: 
http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC 
 
Gaussian Naive Bayes 
Naive Bayes methods apply Bayes’ theorem with the “naive” assumption of conditional
independence between every pair of features given the class. Advantages of the Naive Bayes algorithm include:
- works well in many real-world situations, such as spam filtering
- requires a small amount of training data to estimate the necessary parameters
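Written out, the naive independence assumption turns the class posterior for features x_1, ..., x_n into a product of per-feature terms, and Gaussian Naive Bayes models each term as a normal density whose mean and variance are estimated separately for each class y and feature i:

```latex
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^{2}}}
  \exp\!\left(-\frac{(x_i - \mu_{y,i})^{2}}{2\sigma_{y,i}^{2}}\right),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Only a prior per class plus a mean and a variance per class and feature have to be estimated, which is one reason the method gets by with relatively little training data.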
 
Example of Gaussian Naive Bayes implementation: 
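A minimal sketch on the Iris data set bundled with scikit-learn, chosen as an assumption for illustration. After fitting, `GaussianNB` exposes the per-class feature means it estimated (`theta_`), and `predict_proba` returns class probabilities derived from Bayes' theorem.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: Iris as example data, split 70/30.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

accuracy = gnb.score(X_test, y_test)
print("GaussianNB accuracy:", round(accuracy, 3))
# Estimated mean of each feature for each class: shape (n_classes, n_features).
print("per-class feature means:\n", gnb.theta_)
# Class membership probabilities for the first test sample (they sum to 1).
proba = gnb.predict_proba(X_test[:1])
print("class probabilities for first test sample:", proba.round(3))
```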
 
Learn more here: 
http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB
 
 
Decision Trees 
Decision Trees (DTs) are a non-parametric supervised learning method that creates a model
which predicts the value of a target variable by learning simple decision rules inferred from the
data features. Advantages of the Decision Trees algorithm include:
- simple to understand and interpret
- requires little data preparation
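The "simple decision rules" are learned by trying candidate splits and keeping the one that makes the resulting groups purest. scikit-learn's `DecisionTreeClassifier` measures purity with Gini impurity by default (`criterion="gini"`). The plain-Python sketch below scores every threshold on a single made-up feature, echoing the mammal/reptile example from earlier; it illustrates one split only and is not the library's actual implementation.

```python
# Hypothetical single feature (body temperature in °C) with class labels.
values = [20.0, 22.0, 25.0, 36.0, 37.0, 38.0]
labels = ["reptile", "reptile", "reptile", "mammal", "mammal", "mammal"]


def gini(group):
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure group)."""
    if not group:
        return 0.0
    n = len(group)
    return 1.0 - sum((group.count(c) / n) ** 2 for c in set(group))


def best_split(xs, ys):
    """Try midpoints between sorted values; return (threshold, weighted impurity)."""
    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        threshold = (a + b) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        # Weight each side's impurity by its share of the samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


threshold, impurity = best_split(values, labels)
print("best threshold:", threshold, "| weighted Gini impurity:", impurity)
```

A full tree repeats this search over every feature at every node, recursing into each side until the groups are pure or a stopping rule such as `max_depth` is reached.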
 
Example of Decision Tree Classifier implementation: 
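A minimal sketch, again assuming the Iris data set bundled with scikit-learn. `max_depth=3` is an arbitrary choice that keeps the tree small (and, as discussed under overfitting, limits how much noise it can memorize); `export_text` prints the learned rules, which illustrates why decision trees are considered easy to interpret.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: Iris as example data, split 70/30.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # depth limit chosen for illustration
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)
print("DecisionTree accuracy:", round(accuracy, 3))
# Human-readable if/else rules learned from the training data.
print(export_text(tree, feature_names=list(iris.feature_names)))
```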
 
Learn more here: 
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier
 
Next Plan 
Kindly create an account on Microsoft Azure ML Platform: 
https://studio.azureml.net/ 
 
10 

Más contenido relacionado

La actualidad más candente

Machine Learning Interview Questions and Answers
Machine Learning Interview Questions and AnswersMachine Learning Interview Questions and Answers
Machine Learning Interview Questions and AnswersSatyam Jaiswal
 
Machine learning
Machine learningMachine learning
Machine learningRohit Kumar
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine LearningJoel Graff
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.pptbutest
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyMarina Santini
 
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3Luis Borbon
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 
detailed Presentation on supervised learning
 detailed Presentation on supervised learning detailed Presentation on supervised learning
detailed Presentation on supervised learningZAMANCHBWN
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive ModellingAmit Kumar
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
 
Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion antimo musone
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
Supervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationSupervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationTara ram Goyal
 

La actualidad más candente (17)

Machine Learning Interview Questions and Answers
Machine Learning Interview Questions and AnswersMachine Learning Interview Questions and Answers
Machine Learning Interview Questions and Answers
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Applications in Machine Learning
Applications in Machine LearningApplications in Machine Learning
Applications in Machine Learning
 
MachineLearning.ppt
MachineLearning.pptMachineLearning.ppt
MachineLearning.ppt
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 1 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 1 Semester 3 MSc IT Part 2 Mumbai University
 
Machine Learning for Dummies
Machine Learning for DummiesMachine Learning for Dummies
Machine Learning for Dummies
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
detailed Presentation on supervised learning
 detailed Presentation on supervised learning detailed Presentation on supervised learning
detailed Presentation on supervised learning
 
Supervised learning
  Supervised learning  Supervised learning
Supervised learning
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion Tech meetup Data Driven - Codemotion
Tech meetup Data Driven - Codemotion
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Supervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its applicationSupervised Machine Learning Techniques common algorithms and its application
Supervised Machine Learning Techniques common algorithms and its application
 

Similar a Introduction to machine learning

Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxiaeronlineexm
 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applicationsBenjaminlapid1
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine LearningVedaj Padman
 
It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!To Sum It Up
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learningJohnson Ubah
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningSujith Jayaprakash
 
machinecanthink-160226155704.pdf
machinecanthink-160226155704.pdfmachinecanthink-160226155704.pdf
machinecanthink-160226155704.pdfPranavPatil822557
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxcloudserviceuit
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)SwatiTripathi44
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptxssuser6654de1
 
Industrial training ppt
Industrial training pptIndustrial training ppt
Industrial training pptHRJEETSINGH
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
 
Unit-V Machine Learning.ppt
Unit-V Machine Learning.pptUnit-V Machine Learning.ppt
Unit-V Machine Learning.pptSharpmark256
 
Chapter 05 Machine Learning.pptx
Chapter 05 Machine Learning.pptxChapter 05 Machine Learning.pptx
Chapter 05 Machine Learning.pptxssuser957b41
 
ML crash course
ML crash courseML crash course
ML crash coursemikaelhuss
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfAnanthReddy38
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfAschalewAyele2
 

Similar a Introduction to machine learning (20)

Machine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptxMachine Learning with Python- Methods for Machine Learning.pptx
Machine Learning with Python- Methods for Machine Learning.pptx
 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applications
 
An Introduction to Machine Learning
An Introduction to Machine LearningAn Introduction to Machine Learning
An Introduction to Machine Learning
 
It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!It's Machine Learning Basics -- For You!
It's Machine Learning Basics -- For You!
 
Machine Learning.pptx
Machine Learning.pptxMachine Learning.pptx
Machine Learning.pptx
 
introduction to machine learning
introduction to machine learningintroduction to machine learning
introduction to machine learning
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
machinecanthink-160226155704.pdf
machinecanthink-160226155704.pdfmachinecanthink-160226155704.pdf
machinecanthink-160226155704.pdf
 
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptxLesson 1 - Overview of Machine Learning and Data Analysis.pptx
Lesson 1 - Overview of Machine Learning and Data Analysis.pptx
 
Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)Introduction to ML (Machine Learning)
Introduction to ML (Machine Learning)
 
Machine Can Think
Machine Can ThinkMachine Can Think
Machine Can Think
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
 
Industrial training ppt
Industrial training pptIndustrial training ppt
Industrial training ppt
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Unit-V Machine Learning.ppt
Unit-V Machine Learning.pptUnit-V Machine Learning.ppt
Unit-V Machine Learning.ppt
 
Chapter 05 Machine Learning.pptx
Chapter 05 Machine Learning.pptxChapter 05 Machine Learning.pptx
Chapter 05 Machine Learning.pptx
 
Machine Learning by Rj
Machine Learning by RjMachine Learning by Rj
Machine Learning by Rj
 
ML crash course
ML crash courseML crash course
ML crash course
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
 

Más de Adetimehin Oluwasegun Matthew

Más de Adetimehin Oluwasegun Matthew (6)

Distributed Systems in Data Engineering
Distributed Systems in Data EngineeringDistributed Systems in Data Engineering
Distributed Systems in Data Engineering
 
Personal Branding - Necessity for DevOps Engineers
Personal Branding - Necessity for DevOps EngineersPersonal Branding - Necessity for DevOps Engineers
Personal Branding - Necessity for DevOps Engineers
 
Relevance of academics to Industry
Relevance of academics to IndustryRelevance of academics to Industry
Relevance of academics to Industry
 
Choosing a Careeer in Information Technology
Choosing a Careeer in Information TechnologyChoosing a Careeer in Information Technology
Choosing a Careeer in Information Technology
 
Engineering Data Pipeline for Data-Driven Analytics
Engineering Data Pipeline for Data-Driven AnalyticsEngineering Data Pipeline for Data-Driven Analytics
Engineering Data Pipeline for Data-Driven Analytics
 
Becoming a world class engineer
Becoming a world class engineerBecoming a world class engineer
Becoming a world class engineer
 

Último

Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Arindam Chakraborty, Ph.D., P.E. (CA, TX)
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...tanu pandey
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf203318pmpc
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTbhaskargani46
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfRagavanV2
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 

Último (20)

Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...Bhosari ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready For ...
Bhosari ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready For ...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf22-prompt engineering noted slide shown.pdf
22-prompt engineering noted slide shown.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 

Introduction to machine learning

Lesson One
Introduction to Machine Learning - High Level Overview
By: Oluwasgun Matthew & Abdulrazzaq Olajide

Summary
1. Introduction to Concept of Data Analytics and Machine Learning
   a. Data Mining and Statistical Pattern Recognition
   b. Supervised and Unsupervised Classification/Learning
2. Types of Data - Continuous and Discrete Data
3. Insight on Data Overfitting and Underfitting
   a. Introducing Outliers
4. Scikit Learn usage in ML
   a. Support Vector Machine
   b. Gaussian Naive Bayes
   c. Decision Trees

Let’s Dive In...
Introduction - Concept of Data Analytics and Machine Learning
In a world of exploding data, where the rate of data generation and consumption keeps increasing, a buzzword has emerged: Big Data.
Big Data refers to fast-moving, high-volume data that comes in varying dimensions from many, often highly unpredictable, sources.

The 4Vs of Big Data
● Volume - Scale of Data
● Velocity - Analysis of Streaming Data
● Variety - Different forms of Data
● Veracity - Uncertainty of Data

With increasing data availability, the industry now demands not just data collection but making real sense of the acquired data; hence the concept of Data Analytics.
Taking it a step further, to make predictions about the future and draw realistic inferences, gives the concept of Machine Learning.
A blend of both gives a robust analysis of data for the past, the present and the future.
The line between data analytics and machine learning is thin, but the distinction becomes obvious when you dig deeper.

Data Mining
Data can be collected either from static offline data generated by existing platforms or from real-life sources in the form of a stream.
Pattern recognition is key to machine learning: finding relationships between the features, labels and/or attributes of a data set.
For example, classifying animals as mammals or reptiles depends solely on the physical attributes of the animals under consideration.

Supervised and Unsupervised Learning
Supervised learning is concerned with generating a model or function from a labeled data set, and then making future inferences based on existing, predefined information about the data attributes.
It is a learning model where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output. The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output variable (Y) for that data.

Y = f(X)

It is called supervised learning because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers; the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
Many machine learning projects are built around supervised learning because it is generally easier than unsupervised learning. Solutions in this category include:
● Recommender Systems
● Prediction Engines
● Image Recognition from Tagged Attributes
● Time Series Prediction

Supervised learning problems can be further grouped into regression and classification problems:
● Classification: a classification problem is when the output variable is a category, such as "red" and "blue", "disease" and "no disease", or "purchase" and "no purchase".
● Regression: a regression problem is when the output variable is a real value, such as "weight", "spending power" or "time of best billing".

Some popular examples of supervised machine learning algorithms are:
● Linear regression for regression problems
● Random forest for classification and regression problems
● Support vector machines for classification problems

Unsupervised learning tries to draw inferences from unlabeled data, i.e. with no prior knowledge of attribute definitions or classes.
Unsupervised learning is where you only have input data (X) and no corresponding output variables. The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
These are called unsupervised learning because, unlike supervised learning above, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
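To make the contrast concrete, here is a minimal sketch built on the animal example from the Data Mining section. The three binary features ([has_fur, lays_eggs, warm_blooded]) and the six animals are hypothetical toy data, and k-nearest neighbours is simply one convenient supervised classifier chosen for illustration; K-means is the clustering algorithm named later in this lesson. The supervised model is given the labels y, while the unsupervised model only ever sees X.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy features: [has_fur, lays_eggs, warm_blooded]
X = np.array([
    [1, 0, 1],  # dog
    [1, 0, 1],  # cat
    [1, 0, 1],  # cow
    [0, 1, 0],  # lizard
    [0, 1, 0],  # snake
    [0, 1, 0],  # crocodile
])
y = np.array(["mammal", "mammal", "mammal", "reptile", "reptile", "reptile"])

# Supervised: the model learns Y = f(X) from inputs AND known correct answers.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)
new_animal = np.array([[1, 0, 1]])  # furry, no eggs, warm-blooded
print(clf.predict(new_animal))  # ['mammal']

# Unsupervised: the model sees only X and has to discover the groups itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = km.fit_predict(X)
# Cluster ids are arbitrary numbers; what matters is that the mammals
# end up sharing one id and the reptiles share the other.
print(clusters)
```

Note that K-means never learns the words "mammal" or "reptile"; it only finds that the rows fall into two groups, and naming those groups is left to the analyst.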
The following solutions are classified under this category:
● Fraud detection from unusual transactions
● Clustering students into types based on learning styles

Unsupervised learning problems can be further grouped into clustering and association problems:
● Clustering: a clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
● Association: an association rule learning problem is where you want to discover rules that describe large portions of your data, such as "people that buy X also tend to buy Y".

Some popular examples of unsupervised learning algorithms are:
● K-means for clustering problems
● Apriori algorithm for association rule learning problems

Quiz: Classify the following as either supervised or unsupervised learning:
● Spam detection in emails
● Fraud detection in transactions
● Customer segmentation
● Speech recognition
● Weather forecast
● House price prediction
● Astronomy prediction

Types of Data - Continuous and Discrete Data
A wide range of data formats will be encountered during data collection and sanitization, from numerical and categorical to time series and text-based data. Numerical data can be continuous, taking any value within a range (e.g. a temperature reading), or discrete, taking only distinct, countable values (e.g. the number of students in a class).

Quiz: What type of data is:
● CPE508 Result
● List of courses offered in 500 Level - Computer Science and Engineering
● Gender
● Frequency of strike actions in O.A.U
● Lecture timetable
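For the association problems described in the unsupervised learning section ("people that buy X also tend to buy Y"), rules are usually scored with two simple measures: support (how often an itemset appears across all transactions) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both on a hypothetical list of shopping baskets. It is not the Apriori algorithm itself, which uses support thresholds like this one to prune candidate itemsets efficiently.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent, i.e. how reliable 'X -> Y' is."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)

print(support({"bread", "butter"}, transactions))       # 0.6
print(confidence({"bread"}, {"butter"}, transactions))  # 0.75
```

Read the result as: bread and butter appear together in 60% of baskets, and 75% of the baskets that contain bread also contain butter.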
Data Overfitting and Underfitting
In machine learning, we describe learning the target function from training data as inductive learning. Induction refers to learning general concepts from specific examples, which is exactly the problem that supervised machine learning aims to solve. This is different from deduction, which works the other way around and seeks to learn specific concepts from general rules.
In statistics, a fit refers to how well you approximate a target function. This is good terminology to use in machine learning, because supervised machine learning algorithms seek to approximate the unknown underlying mapping function for the output variables given the input variables.
Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model.
Underfitting refers to a model that can neither model the training data nor generalize to new data. An underfit machine learning model is not a suitable model, and this will be obvious because it performs poorly even on the training data. Underfitting is often not discussed because it is easy to detect given a good performance metric; the remedy is to move on and try alternative machine learning algorithms. Nevertheless, it provides a good contrast to the problem of overfitting.
An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
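Overfitting and underfitting can be seen by fitting polynomials of increasing degree to a handful of noisy points. This is a minimal NumPy sketch, assuming a toy dataset (a sine curve plus Gaussian noise) that does not come from the lesson: degree 1 is too simple and underfits, while degree 9 has enough freedom to pass through all ten training points and therefore memorizes the noise.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical toy data: 10 noisy training samples of sin(2*pi*x).
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=x_train.size)

# Unseen points from the true (noise-free) function, used as a test set.
x_test = np.linspace(0, 1, 100)
y_test = np.sin(2 * np.pi * x_test)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree={degree}: train MSE={train_err:.4f}, test MSE={test_err:.4f}")

# degree 1: high error on both sets (underfitting).
# degree 9: interpolates the 10 training points, so train MSE is ~0, but the
# curve typically oscillates between them and does worse on unseen points.
```

Training error can only go down as the degree grows, so on its own it cannot reveal overfitting; comparing it with the error on held-out data is what exposes the gap.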
NB: Cluster analysis is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).

Quiz: Identify the outlier in the visualized data below: 1, 2 or 3.

Enough theoretical exposition; let’s get practical...
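The quiz relies on spotting an outlier by eye. One common numerical rule, shown here as an illustrative choice rather than the lesson's own method, is Tukey's fences: flag any value more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sample values below are hypothetical.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical sample: most readings sit between 12 and 15, one does not.
sample = [12, 13, 12, 14, 13, 15, 14, 13, 48]
print(iqr_outliers(sample))  # [48.]
```

Here Q1 = 13 and Q3 = 14, so anything outside [11.5, 15.5] is flagged. Because quartiles are barely affected by a single extreme value, this rule is more robust than one based on the mean and standard deviation.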
Scikit Learn Usage in ML
Scikit-learn (also known as sklearn) is an open-source machine learning library for Python developers. It provides various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN. It also offers some visualization utilities (built on matplotlib) and works well alongside other Python modules such as pandas.
The focus of this section is to understand how the library works for classification problems, with the following algorithms in mind:
● Support Vector Machines (for classification problems) - LinearSVC
● Gaussian Naive Bayes
● Decision Trees

Support Vector Machines (SVM)
SVMs are a set of supervised learning methods used for classification, regression and outlier detection. The focus here is to use them strictly for classification problems. Advantages of SVMs are:
- Very effective in high-dimensional spaces
- Uses only a subset of the training points (the support vectors) in the decision function, so it is memory efficient
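The SVM discussion above can be tried with scikit-learn's LinearSVC. This is a minimal sketch on a hypothetical, linearly separable 2-D dataset of six points; the parameter values (C=1.0, dual=False, max_iter=10000) are illustrative choices, not settings taken from the lesson.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical toy data: class 0 near the origin, class 1 near (4.5, 4.5).
X = np.array([
    [0.0, 0.0], [0.5, 0.5], [1.0, 0.0],   # class 0
    [4.0, 4.0], [4.5, 5.0], [5.0, 4.0],   # class 1
])
y = np.array([0, 0, 0, 1, 1, 1])

# dual=False is set explicitly: it suits data with more samples than
# features, and avoids relying on a default that changed across releases.
clf = LinearSVC(C=1.0, dual=False, max_iter=10000)
clf.fit(X, y)

# Classify two new points, one near each group.
print(clf.predict([[0.2, 0.3], [4.8, 4.6]]))  # [0 1]

# The learned separating line is coef_ . x + intercept_ = 0.
print(clf.coef_, clf.intercept_)
```

Because LinearSVC learns a single straight-line boundary, it is a good first choice when classes are (roughly) linearly separable; sklearn.svm.SVC with a non-linear kernel is the usual alternative when they are not.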
Example of Linear SVC implementation:
Learn more here: http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html#sklearn.svm.LinearSVC

Gaussian Naive Bayes
Naive Bayes methods apply Bayes’ theorem with the "naive" assumption of conditional independence between every pair of features, given the class. Advantages of the Naive Bayes algorithm are:
- Works well in real-world situations such as spam filtering
- Requires only a small amount of training data to estimate the necessary parameters

Example of Gaussian Naive Bayes implementation:
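A minimal GaussianNB sketch, assuming the Iris dataset that ships with scikit-learn (chosen here for convenience; the lesson does not name a dataset) and an illustrative 70/30 train/test split. GaussianNB models each feature within each class as a normal distribution and combines them with Bayes’ theorem under the independence assumption described above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Bundled dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to measure performance on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = GaussianNB()
model.fit(X_train, y_train)  # estimates a mean and variance per feature per class

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Class probabilities for the first test sample (each row sums to 1).
print(model.predict_proba(X_test[:1]))
```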
Learn more here: http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB

Decision Trees
Decision Trees (DTs) are a non-parametric supervised learning method that creates a model which predicts the value of a target variable by learning simple decision rules inferred from the data features. Advantages of the Decision Trees algorithm are:
- Simple to understand and interpret
- Requires little data preparation

Example of Decision Tree Classifier implementation:
Learn more here: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier
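A comparable sketch for DecisionTreeClassifier, again assuming the bundled Iris dataset and an illustrative 70/30 split. export_text prints the learned if/else rules, which shows the "simple to understand and interpret" advantage directly. The max_depth=3 limit is an illustrative choice; capping depth is also a common way to stop a tree from overfitting noise in the training data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Limit depth so the tree stays small and readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)  # mean accuracy on the held-out set
print(f"Test accuracy: {accuracy:.2f}")

# Human-readable decision rules learned from the data features.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```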
Next Plan
Kindly create an account on the Microsoft Azure ML Platform: https://studio.azureml.net/