SKILL TITLE
S2 NIGHT SKILL TRAINING PROGRAM
MACHINE LEARNING USING
PYTHON
DAY 1
INTRODUCTION TO MACHINE LEARNING,
DATA COLLECTION AND STUDY ABOUT DATASET
INTRODUCTION TO ML
● GOOGLE DEF :
Machine learning (ML) is a subfield of artificial intelligence focused on
training machine learning algorithms with data sets to produce machine learning
models capable of performing complex tasks, such as sorting images, forecasting sales, or
analyzing big data.
● MOST PREFERRED DEF :
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P if its performance at tasks in T, as measured
by P, improves with experience E.
AI vs ML vs DL :
FOR MORE INFO :
https://pythongeeks.org/ai-vs-data-science-vs-deep-learning-vs-ml/
IMPORTANCE OF MACHINE LEARNING :
APPLICATIONS OF ML :
WHAT NOT ??
……COULD DO ANYTHING :)
TASK 1 : LIST A FEW IN 1 MIN
DATASET :
1. Kaggle
2. UCI Machine Learning Repository
3. Data.gov
4. Google Dataset Search
5. World Bank Open Data
6. OpenML
7. Data.gov.uk
8. Reddit Datasets
TASK 2 : DOWNLOAD A DATASET AND UPDATE
http://www.cs.cmu.edu/~aharley/nn_vis/cnn/3d.html
3D CNN VISUALIZATION
DAY 2
INTRODUCTION TO RAPID MINER AND DATA
PREPROCESSING
GOOGLE COLABORATORY :
Google Colab is a Jupyter notebook environment that runs entirely
in the cloud. It handles the setup and configuration your program needs,
so you can start writing code right away.
Colaboratory, or “Colab” for short, is a product from Google
Research. Colab allows anybody to write and execute arbitrary Python code
through the browser, and is especially well suited to machine learning, data
analysis and education.
How to run code in Google Colab?
Running code in Google Colab is as easy as opening any website.
It requires just 2 steps.
1. Sign in to Google Colab.
2. Create a new notebook.
To sign in to Google Colab, go to the Google Colab URL:
https://colab.research.google.com
The home page of Google Colab looks like the one below:
You can create/open a notebook from:
1. A recent notebook you have created.
2. A notebook you have saved in Google Drive.
3. A clone of a notebook from a Git repository.
4. An upload from local storage.
5. Or simply create a new one from Colab itself.
Creating a new notebook in Colab:
1. Select the new notebook option from the pop-up window shown in the above
picture. Alternatively, you can create a new notebook by going to the File menu
and selecting “New notebook”.
2. You can change the name of the notebook by double-clicking on the file name
at the top left, near the Google Drive logo. However, do not change the extension of
the file. The notebook should always have the “ipynb” extension.
3. You are ready to write your Python code. There are two options on the main page: code
or text.
a) Text is for information about, or a description of, the code. It is just for display.
b) Code is where you write your code.
You can hover in the center of the screen to get the option to write code or text, as shown in
the figure below.
4. Now you can start writing your code in Google Colab.
5. Click on the play button at the left side of the code cell to run your
program. You can create multiple code cells as shown below.
DATA PREPROCESSING :
1. Data Preprocessing
2. Importing the libraries
3. Importing Dataset
4. Handling Missing Data
5. Encoding Categorical Data
6. Encoding independent variables
7. Encoding dependent variables
8. Splitting data into Test set & Training Set
9. Feature Scaling
TASK 3 of D2 : CREATE A COLAB NOTEBOOK OF OWN
DAY 3
INTRODUCTION TO PYTHON AND
REVISION OF CONCEPTS
WHY PYTHON ??
•Large Set of Libraries
•Code Simplicity
•Platform Independence
•Community Support
•Visualization Ability
•Flexibility
PYTHON CONCEPTS :
1. LIST
2. TUPLE
3. SET
4. DICTIONARY
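A quick sketch of the four built-in containers listed above. The variable names and sample values are made up for illustration.

```python
# list: ordered, mutable, allows duplicates
scores = [72, 85, 85, 90]
scores.append(64)

# tuple: ordered, immutable (useful for fixed records such as coordinates)
point = (3.5, 1.2)

# set: unordered, keeps only unique values
unique_scores = set(scores)

# dictionary: maps keys to values
student = {"name": "Asha", "score": 85}
student["passed"] = student["score"] >= 50

print(scores)
print(point)
print(sorted(unique_scores))
print(student)
```

Lists and dictionaries are the two you will meet most often when loading and cleaning datasets; tuples and sets show up when you need fixed records or de-duplication.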
LIST FOR PRACTICE :
https://drive.google.com/drive/folders/1dllsH2PLBR9Cn3z0yDDxxa44tUgMlzj?usp=sharing
Here are some of the most commonly used basic Python libraries in AI and ML:
•Pandas for general-purpose data analysis
•NumPy for high-performance scientific calculation and data analysis
•TensorFlow for high-performance numerical computation
•SciPy for advanced computation
•Scikit-learn for handling ML algorithms (like clustering, decision trees, linear and
logistic regressions, and classification)
•Keras for building and designing neural networks
•Matplotlib for data visualization (i.e. developing 2D plots, histograms, charts, and
other forms of visualization)
•Natural Language Toolkit (NLTK) for working with computational linguistics and
natural language recognition & processing
•Scikit-image for image processing & analysis
•PyBrain for implementing machine learning algorithms and architectures in
areas such as supervised learning and reinforcement learning
•StatsModels for data exploration, statistical model estimation, and performing
statistical tests
PRACTICE PRACTICE PRACTICE !!!!
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
DAY 4
DATA PREPROCESSING USING
PYTHON
WHAT IS DATA PREPROCESSING ?
Data preprocessing is the process of preparing raw data and making it
suitable for a machine learning model. It is the first and a crucial step in
creating a machine learning model.
When working on a machine learning project, we do not always
come across clean, well-formatted data. Before doing any operation
with the data, it must be cleaned and put into a consistent format. For
this, we use data preprocessing.
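The preprocessing steps listed on Day 2 (handling missing data, encoding, splitting, feature scaling) can be sketched with the standard library alone. This is a minimal illustration, not the course's notebook: the dataset, the column names, mean imputation, one-hot encoding and the 4/2 split are all assumptions chosen to keep the example small. In practice you would likely use Pandas and Scikit-learn for the same steps.

```python
import random
from statistics import mean, pstdev

# Tiny made-up dataset; None marks a missing value.
rows = [
    {"country": "France",  "age": 44,   "salary": 72000, "purchased": "No"},
    {"country": "Spain",   "age": 27,   "salary": 48000, "purchased": "Yes"},
    {"country": "Germany", "age": 30,   "salary": None,  "purchased": "No"},
    {"country": "Spain",   "age": None, "salary": 61000, "purchased": "Yes"},
    {"country": "France",  "age": 35,   "salary": 58000, "purchased": "Yes"},
    {"country": "Germany", "age": 50,   "salary": 83000, "purchased": "No"},
]

# Handling missing data: replace None with the column mean.
for col in ("age", "salary"):
    col_mean = mean([r[col] for r in rows if r[col] is not None])
    for r in rows:
        if r[col] is None:
            r[col] = col_mean

# Encoding the independent categorical variable as one-hot columns.
countries = sorted({r["country"] for r in rows})
X = [
    [1.0 if r["country"] == c else 0.0 for c in countries]
    + [float(r["age"]), float(r["salary"])]
    for r in rows
]

# Encoding the dependent variable: Yes -> 1, No -> 0.
y = [1 if r["purchased"] == "Yes" else 0 for r in rows]

# Splitting into training and test sets (fixed seed for repeatability).
indices = list(range(len(rows)))
random.Random(0).shuffle(indices)
test_idx, train_idx = indices[:2], indices[2:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# Feature scaling: standardise the numeric columns using training statistics only.
numeric_cols = [len(countries), len(countries) + 1]
for j in numeric_cols:
    mu = mean([row[j] for row in X_train])
    sigma = pstdev([row[j] for row in X_train]) or 1.0
    for row in X_train + X_test:
        row[j] = (row[j] - mu) / sigma

print(X_train[0], y_train[0])
```

The scaler is fitted on the training rows only and then applied to the test rows, so no information from the test set leaks into training.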
PRACTICE PRACTICE PRACTICE !!!!
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
DAY 5
MODEL EVALUATION AND
SELECTION LINEAR REGRESSION
CONFUSION MATRIX :
LINEAR REGRESSION :
The linear regression algorithm models a linear relationship between a dependent
variable (y) and one or more independent variables (x), hence the name linear
regression. Because the relationship is linear, the model
describes how the value of the dependent variable changes
with the value of the independent variable.
Mathematically, we can represent a linear regression as:
$$y = a_0 + a_1 x + \varepsilon$$
Here,
$y$ = dependent variable (target variable)
$x$ = independent variable (predictor variable)
$a_0$ = intercept of the line (gives an additional degree of freedom)
$a_1$ = linear regression coefficient (scale factor applied to each input value)
$\varepsilon$ = random error
The observed values of x and y form the training dataset from which the Linear Regression
model is fit.
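As a minimal sketch of how the intercept and coefficient in the equation above can be estimated, the snippet below uses the ordinary least-squares closed form for a single independent variable. The function name and the sample data are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (a0, a1) minimising the squared error of y ~ a0 + a1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    a1 = sxy / sxx
    # The fitted line passes through the point of means.
    a0 = y_mean - a1 * x_mean
    return a0, a1


# Made-up training data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

a0, a1 = fit_simple_linear_regression(x, y)
print(f"intercept a0 = {a0:.2f}, coefficient a1 = {a1:.2f}")
print(f"prediction for x = 6: {a0 + a1 * 6:.2f}")
```

The residuals `y[i] - (a0 + a1 * x[i])` play the role of the random error term.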
Types of Linear Regression
Linear regression can be further divided into two types of the
algorithm:
1. SIMPLE LINEAR REGRESSION :
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression
algorithm is called Simple Linear Regression.
2. MULTIPLE LINEAR REGRESSION :
If more than one independent variable is used to predict the value
of a numerical dependent variable, then such a Linear Regression
algorithm is called Multiple Linear Regression.
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 6
SIMPLE AND MULTIPLE LINEAR
REGRESSION
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 7
DECISION TREE REGRESSION
WHAT IS DECISION TREE ALGORITHM ?
A decision tree builds regression or classification models in the
form of a tree structure. It breaks down a dataset into smaller and smaller
subsets while, at the same time, an associated decision tree is
incrementally developed. The final result is a tree with decision
nodes and leaf nodes. A decision node (e.g., Outlook) has two or more
branches (e.g., Sunny, Overcast and Rainy), each representing a value of
the attribute tested. A leaf node (e.g., Hours Played) represents a decision
on the numerical target. The topmost decision node in a tree, which
corresponds to the best predictor, is called the root node. Decision trees can
handle both categorical and numerical data.
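To make the idea of a decision node and leaf nodes concrete, here is a sketch of a one-level regression tree: it picks the threshold on a single numeric feature that gives the lowest sum of squared errors, and each leaf predicts the mean target of its subset. The split criterion, the `temperature` feature and the data are assumptions for illustration; the slides do not specify them.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) for the lowest-error split."""
    best = None
    xs = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, mean(left), mean(right))
    return best[1:]


def predict(tree, xi):
    threshold, left_mean, right_mean = tree
    return left_mean if xi <= threshold else right_mean


# Made-up data: hours played as a function of temperature
temperature = [15, 18, 21, 24, 27, 30]
hours_played = [20, 22, 21, 45, 48, 46]

tree = best_split(temperature, hours_played)
print("split at", tree[0])
print("prediction at 16:", predict(tree, 16))
print("prediction at 29:", predict(tree, 29))
```

A full decision tree regressor applies the same search recursively to each subset until a stopping rule (for example a maximum depth) is reached.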
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 8
SUPPORT VECTOR MACHINE REGRESSION
Support Vector Machines
● A Support Vector Machine (SVM) is a classifier that
tries to maximize the margin between training data
and the classification boundary (the plane defined by
𝑋𝛽 = 0)
Support Vector Machines
● The idea is that maximizing the margin maximizes
the chance that classification will be correct on
new data. We assume the new data of each class is
near the training data of that type.
SVM Training
SVMs can be trained using SGD. Recall that the Logistic
gradient was (this time assuming $y_i \in \{-1, +1\}$):
$$\frac{dA}{d\beta} = \sum_{i=1}^{N} y_i \, p_i (1 - p_i) \, X_i$$
The SVM gradient can be defined as (here $p_i = X_i \beta$):
$$\frac{dA}{d\beta} = \sum_{i=1}^{N} \begin{cases} y_i X_i & \text{if } p_i y_i < 1 \\ 0 & \text{otherwise} \end{cases}$$
The expression $p_i y_i < 1$ tests whether the point $X_i$ is in
the margin, and if so adds it with sign $y_i$. It ignores
other points.
Both methods weight points “near the middle” with
sign $y_i$.
SVM Training
This SGD training method (called Pegasos) is much faster
and competitive with Logistic Regression.
It's also capable of training in less than one pass over a
dataset.
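A minimal sketch of SGD training for a linear SVM in the Pegasos style. The margin test `y * p < 1` is the one described above; the regularisation strength `lam`, the step size `1 / (lam * t)` and the shrink step come from the usual Pegasos formulation rather than from these slides, and the toy data, function names and step count are assumptions.

```python
import random


def pegasos_train(X, y, lam=0.1, n_steps=200, seed=0):
    """Train a linear SVM (no bias term) by stochastic sub-gradient steps."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    for t in range(1, n_steps + 1):
        i = rng.randrange(len(X))                          # pick one training point
        eta = 1.0 / (lam * t)                              # decaying step size
        p = sum(b * x for b, x in zip(beta, X[i]))         # p_i = X_i . beta
        beta = [(1.0 - eta * lam) * b for b in beta]       # shrink (regularisation)
        if y[i] * p < 1:                                   # point is inside the margin
            beta = [b + eta * y[i] * x for b, x in zip(beta, X[i])]
    return beta


def predict(beta, x):
    return 1 if sum(b * xi for b, xi in zip(beta, x)) >= 0 else -1


# Made-up, linearly separable data with labels in {-1, +1}
X = [(2.0, 2.0), (3.0, 1.0), (2.0, 3.0), (1.0, 2.5),
     (-2.0, -2.0), (-3.0, -1.0), (-2.0, -3.0), (-1.0, -2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1]

beta = pegasos_train(X, y)
print("weights:", beta)
print("training predictions:", [predict(beta, x) for x in X])
```

Each step touches a single training point, which is why this style of training can make progress before it has seen the whole dataset.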
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
DAY 9
LOGISTIC REGRESSION
Logistic Regression
● Logistic regression is probably the most widely used
general-purpose classifier.
● It's very scalable and can be very fast to train. It’s
used for
○ Spam filtering
○ News message classification
○ Web site classification
○ Product classification
○ Most classification problems with large, sparse
feature sets.
● The only caveat is that it can overfit on very sparse
data, so it's often used with regularization.
Logistic Regression
● NOTE : Regression means predicting a real value;
classification means predicting a discrete value.
● Logistic regression is designed as a binary classifier
(output say {0,1}) but actually outputs the probability
that the input instance is in the “1” class.
● A logistic classifier has the form:
$$p(X) = \frac{1}{1 + \exp(-X\beta)}$$
where $X = (X_1, \ldots, X_n)$ is a vector of features.
Logistic Regression
● Logistic regression maps the “regression” value $-X\beta$
in $(-\infty, \infty)$ to the range [0,1] using a “logistic” function:
$$p(X) = \frac{1}{1 + \exp(-X\beta)}$$
● i.e. the logistic function maps any value on the real
line to a probability in the range [0,1]
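The logistic function above is short enough to write out directly. This sketch evaluates it in a numerically stable way and applies it to a feature vector; the weights `beta` and the feature values are made up, and fitting `beta` from data is not shown.

```python
import math


def logistic(z):
    """Map any real z to a probability, avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(x, beta):
    """Probability that the instance with features x is in the "1" class."""
    return logistic(sum(xi * bi for xi, bi in zip(x, beta)))


# Made-up weights and features
beta = [0.5, -0.25]
print(predict_proba([1.0, 2.0], beta))   # X.beta = 0, the decision boundary
print(predict_proba([4.0, 0.0], beta))   # X.beta = 2, leans towards class 1
print(predict_proba([0.0, 8.0], beta))   # X.beta = -2, leans towards class 0
```

To turn the probability into a class label, a threshold such as 0.5 is commonly applied.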
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 10
INTRODUCTION TO CLASSIFICATION AND DIFFERENT
TYPES OF CLASSIFICATION ALGORITHMS AND
MODEL SELECTION
INTRODUCTION TO CLASSIFICATION :
Classification may be defined as the process of predicting a
class or category from observed values or given data points. The
categorized output can take forms such as “Black” or “White”,
or “spam” or “no spam”.
Mathematically, classification is the task of
approximating a mapping function (f) from input variables (X) to
output variables (Y). It belongs to supervised
machine learning, in which targets are provided along with the
input data set.
TYPES OF LEARNERS IN CLASSIFICATION :
We have two types of learners with respect to classification
problems −
1. Lazy Learners
As the name suggests, these learners store the training data and wait for the
testing data to appear. Classification is done only after
getting the testing data. They spend less time on training but more time on
predicting. Examples of lazy learners are K-nearest neighbor and case-based
reasoning.
2. Eager Learners
In contrast to lazy learners, eager learners construct a classification model
from the training data without waiting for the testing data to
appear. They spend more time on training but less time on predicting. Examples
of eager learners are Decision Trees, Naïve Bayes and Artificial Neural
Networks (ANN).
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 11
NAÏVE BAYES ALGORITHM
•Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification problems.
•It is mainly used in text classification that includes a high-dimensional
training dataset.
•It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object.
•Naïve: It is called Naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of other features. For
example, if a fruit is identified on the basis of color, shape, and taste, then
a red, spherical, and sweet fruit is recognized as an apple. Each
feature individually contributes to identifying it as an apple, without
depending on the others.
•Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
Bayes’ Theorem
P(A|B) = probability of A given that B is true.
$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
In practice we are most interested in dealing with events e
and data D.
e = “I have a cold”
D = “runny nose,” “watery eyes,” “coughing”
$$P(e \mid D) = \frac{P(D \mid e)\, P(e)}{P(D)}$$
So Bayes’ theorem is “diagnostic”.
Bayes’ Theorem
D = Data, e = some event
$$P(e \mid D) = \frac{P(D \mid e)\, P(e)}{P(D)}$$
P(e) is called the prior probability of e. It's what we know (or
think we know) about e with no other evidence.
P(D|e) is the conditional probability of D given that e
happened, or just the likelihood of D. This can often be
measured or computed precisely – it follows from your
model assumptions.
P(e|D) is the posterior probability of e given D. It’s the
answer we want, or the way we choose a best answer.
You can see that the posterior is heavily colored by the prior,
so Bayes’ has a GIGO liability, e.g. it's not used to test
hypotheses.
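A small worked example of Bayes' theorem for the "cold" event above. All three input probabilities are invented purely to show the arithmetic; the evidence P(D) is expanded over "cold" and "no cold".

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(e|D) = P(D|e) P(e) / P(D), with P(D) summed over e and not-e."""
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence


# Made-up numbers for illustration only
p_cold = 0.10                       # prior P(e)
p_symptoms_given_cold = 0.80        # likelihood P(D|e)
p_symptoms_given_no_cold = 0.20     # P(D|not e)

p_cold_given_symptoms = posterior(
    p_cold, p_symptoms_given_cold, p_symptoms_given_no_cold
)
print(f"P(cold | symptoms) = {p_cold_given_symptoms:.3f}")

# The same evidence with a larger prior gives a much larger posterior.
print(f"with prior 0.50: {posterior(0.50, 0.80, 0.20):.3f}")
```

Changing only the prior changes the answer substantially, which is the sensitivity to the prior noted above.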
Naïve Bayes Classifier
Let’s assume we have an instance (e.g. a document d) with
a set of features $X_1, \ldots, X_k$ and a set of classes $c_j$ to
which the document might belong.
We want to find the most likely class that the document
belongs to, given its features.
The joint probability of the class and features is:
$$\Pr(X_1, \ldots, X_k, c_j)$$
Naïve Bayes Classifier
Key Assumption: (Naïve) the features are generated
independently given $c_j$. Then the joint probability factors:
$$\Pr(X, c_j) = \Pr(X_1, \ldots, X_k \mid c_j)\, \Pr(c_j) = \Pr(c_j) \prod_{i=1}^{k} \Pr(X_i \mid c_j)$$
We would like to figure out the most likely class for (i.e. to
classify) the document, which is the $c_j$ which maximizes:
$$\Pr(c_j \mid X_1, \ldots, X_k)$$
Naïve Bayes Classifier
Now from Bayes we know that:
$$\Pr(c_j \mid X_1, \ldots, X_k) = \frac{\Pr(X_1, \ldots, X_k \mid c_j)\, \Pr(c_j)}{\Pr(X_1, \ldots, X_k)}$$
But to choose the best $c_j$, we can ignore $\Pr(X_1, \ldots, X_k)$ since
it’s the same for every class. So we just have to maximize:
$$\Pr(X_1, \ldots, X_k \mid c_j)\, \Pr(c_j)$$
So finally we pick the category $c_j$ that maximizes:
$$\Pr(X_1, \ldots, X_k \mid c_j)\, \Pr(c_j) = \Pr(c_j) \prod_{i=1}^{k} \Pr(X_i \mid c_j)$$
Data for Naïve Bayes
In order to find the best class, we need two pieces of data:
• $\Pr(c_j)$, the prior probability for the class $c_j$.
• $\Pr(X_i \mid c_j)$, the conditional probability of the feature $X_i$
given the class $c_j$.
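The two pieces of data named above, the class priors and the per-feature conditional probabilities, are enough to build a small classifier. This sketch estimates both from counts and picks the class with the highest prior-times-likelihood score. The fruit dataset is invented, and add-one (Laplace) smoothing and log-probabilities are assumptions added so that unseen feature values do not give a zero product.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples, labels):
    """Count classes and (class, feature index) -> value occurrences."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    values = defaultdict(set)            # distinct values seen per feature
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[(c, i)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, x):
    """Return the class maximising log Pr(c) + sum_i log Pr(X_i | c)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)                      # log prior
        for i, v in enumerate(x):
            count = feature_counts[(c, i)][v]
            # Add-one smoothing (an assumption, not from the slides)
            score += math.log((count + 1) / (n_c + len(values[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Made-up training data: (color, shape, taste) -> fruit
samples = [
    ("red", "round", "sweet"), ("red", "round", "sweet"), ("green", "round", "sour"),
    ("yellow", "long", "sweet"), ("yellow", "long", "sweet"), ("green", "long", "sweet"),
]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]

model = train_nb(samples, labels)
print(predict_nb(model, ("red", "round", "sweet")))
print(predict_nb(model, ("yellow", "long", "sweet")))
```

Training is a single counting pass over the data, with no iteration.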
Advantage and Disadvantage of NB Classifiers
● Simple and fast. Depend only on term frequency data
for the classes. One shot, no iteration.
● Very well-behaved numerically. Term weight
depends only on frequency of that term. Decoupled
from other terms.
● Can work very well with sparse data, where
combinations of dependent terms are rare.
● Subject to error and bias when term probabilities are
not independent (e.g. URL prefixes).
● Can’t model patterns in the data.
● Typically not as accurate as other methods.
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 12
K-NEAREST NEIGHBOURS ALGORITHM
•K-Nearest Neighbour is one of the simplest Machine Learning
algorithms based on the Supervised Learning technique.
•The K-NN algorithm assumes similarity between the new
case/data and the available cases and puts the new case into the
category that is most similar to the available categories.
•The K-NN algorithm stores all the available data and classifies a new
data point based on similarity. This means that when new data
appears, it can be easily classified into a well-suited category
by using the K-NN algorithm.
•K-NN is a non-parametric algorithm, which means it does not
make any assumption about the underlying data.
•It is also called a lazy learner algorithm because it does not
learn from the training set immediately; instead it stores the
dataset and, at the time of classification, performs an action on
the dataset.
•Example: Suppose we have an image of a creature
that looks similar to both a cat and a dog, and we want to know
whether it is a cat or a dog. For this identification, we can
use the KNN algorithm, as it works on a similarity
measure. Our KNN model will find the features of
the new data point that are similar to the cat and dog images and,
based on the most similar features, will put it in either
the cat or the dog category.
The K-NN working can be explained on the basis of the
below algorithm:
•Step-1: Select the number K of the neighbors
•Step-2: Calculate the Euclidean distance from the new data point to each
training data point.
•Step-3: Take the K nearest neighbors as per the
calculated Euclidean distance.
•Step-4: Among these k neighbors, count the number of
the data points in each category.
•Step-5: Assign the new data points to that category for
which the number of the neighbor is maximum.
•Step-6: Our model is ready.
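The steps above can be sketched in a few lines. This is an illustrative implementation rather than the course's notebook: the function name and the two-feature cat/dog data are made up, and ties between equally distant neighbours are broken by label order.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Keep the k nearest neighbours
    nearest = [label for _, label in distances[:k]]
    # Count labels among the neighbours and return the most common one
    return Counter(nearest).most_common(1)[0][0]


# Made-up training data with two numeric features per animal
train_X = [(1.0, 1.2), (1.5, 0.8), (0.9, 1.9), (5.0, 5.2), (5.5, 4.8), (4.7, 5.9)]
train_y = ["cat", "cat", "cat", "dog", "dog", "dog"]

print(knn_predict(train_X, train_y, (1.2, 1.0), k=3))
print(knn_predict(train_X, train_y, (5.1, 5.0), k=3))
```

Note that there is no training step: all the work happens at prediction time, which is why K-NN is called a lazy learner.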
How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the
K-NN algorithm:
•There is no particular way to determine the best value for "K", so we
need to try several values and keep the one that performs best. A commonly used
starting value for K is 5.
•A very low value for K, such as K=1 or K=2, can be noisy and make the
model sensitive to outliers.
•Larger values of K smooth out noise, but they can blur the boundaries
between classes and make each prediction more expensive.
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
DAY 13
DECISION TREE CLASSIFICATION
•Decision Tree is a Supervised learning technique that can be
used for both classification and Regression problems, but mostly
it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the
features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
•It is a graphical representation for getting all the possible
solutions to a problem/decision based on given conditions.
Example: Suppose there is a candidate who has a job offer and
wants to decide whether to accept the offer or not. To
solve this problem, the decision tree starts with the root node
(the Salary attribute, chosen by the attribute selection measure, ASM). The root node
splits into the next decision node (distance from the office) and one leaf node, based
on the corresponding labels. The next decision node further
splits into one decision node (Cab facility) and one leaf node. Finally,
that decision node splits into two leaf nodes (Accepted offer and
Declined offer). Consider the below diagram:
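The job-offer tree described above can be written as nested decisions, one `if` per decision node. This is a hand-written sketch of the finished tree, not a learning algorithm: the thresholds `min_salary` and `max_distance_km` and which branch leads to which leaf are hypothetical values chosen for illustration.

```python
def decide_offer(salary, distance_km, cab_facility,
                 min_salary=50000, max_distance_km=20):
    """Walk the tree: Salary -> Distance from office -> Cab facility."""
    # Root node: salary
    if salary < min_salary:
        return "Declined"           # leaf node
    # Decision node: distance from the office
    if distance_km <= max_distance_km:
        return "Accepted"           # leaf node
    # Decision node: cab facility
    if cab_facility:
        return "Accepted"           # leaf node
    return "Declined"               # leaf node


print(decide_offer(salary=40000, distance_km=5, cab_facility=True))
print(decide_offer(salary=60000, distance_km=10, cab_facility=False))
print(decide_offer(salary=60000, distance_km=35, cab_facility=True))
print(decide_offer(salary=60000, distance_km=35, cab_facility=False))
```

A decision tree classifier learns this kind of structure from labeled examples instead of having it written by hand.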
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 14
SUPPORT VECTOR MACHINES CLASSIFICATION
Support Vector Machine or SVM is one of the most
popular Supervised Learning algorithms, which is
used for Classification as well as Regression
problems. However, primarily, it is used for
Classification problems in Machine Learning.
“The goal of the SVM algorithm is to create the best
line or decision boundary that can segregate n-
dimensional space into classes so that we can
easily put the new data point in the correct category
in the future. This best decision boundary is called a
hyperplane.”
Example: SVM can be understood with the example that
we used for the KNN classifier. Suppose we see a
strange cat that also has some features of dogs. If we
want a model that can accurately identify whether it is a cat
or a dog, such a model can be created by using the SVM
algorithm. We first train our model with lots of images of
cats and dogs so that it can learn the different features of
cats and dogs, and then we test it with this strange
creature. The SVM creates a decision boundary
between the two classes (cat and dog) and chooses the extreme
cases (support vectors) of cat
and dog. On the basis of the support vectors, it will classify
the creature as a cat. Consider the below diagram:
Types of SVM
SVM can be of two types:
•Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes by using a single
straight line, then such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
•Non-linear SVM: Non-linear SVM is used for non-linearly separable
data. If a dataset cannot be classified by using a straight
line, then such data is termed non-linear data, and the classifier used is
called a Non-linear SVM classifier.
“SVM algorithm can be used for Face detection, image classification,
text categorization, etc”.
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 15
INTRODUCTION TO CLUSTERING AND K-MEANS
CLUSTERING
Clustering in Machine Learning :
Clustering or cluster analysis is a machine learning technique, which groups the
unlabelled dataset. It can be defined as "A way of grouping the data points into
different clusters, consisting of similar data points. The objects with the
possible similarities remain in a group that has less or no similarities with
another group."
What is K-Means Algorithm?
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters that need to
be created in the process: if K=2, there will be two clusters, for K=3 there will be three
clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into
k different clusters in such a way that each data point belongs to only one
group of points with similar properties.
It is a centroid-based algorithm, where each cluster is associated
with a centroid. The main aim of this algorithm is to minimize the
sum of distances between the data points and their corresponding cluster centroids.
The k-means clustering algorithm mainly performs two tasks:
•Determines the best positions for the K center points or centroids by an iterative process.
•Assigns each data point to its closest k-center. The data points that are nearest
to a particular k-center form a cluster.
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as initial centroids. (They need not come from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, which means reassign each data point to the new closest centroid.
Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
Step-7: The model is ready.
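The steps above can be sketched as a short loop. This is a minimal illustration under a few assumptions: the initial centroids are sampled from the input points with a fixed seed, the new centroid of a cluster is the mean of its members, an empty cluster keeps its previous centroid, and the data is made up.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    """Return (centroids, assignment) where assignment[i] is the cluster of points[i]."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # Step-2: pick K initial centroids
    assignment = None
    for _ in range(max_iter):
        # Step-3 / Step-5: assign every point to its closest centroid
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:           # Step-6: nothing moved, finish
            break
        assignment = new_assignment
        # Step-4: move each centroid to the mean of its cluster
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


# Made-up data: two well-separated groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, assignment = k_means(points, k=2)
print("centroids:", centroids)
print("assignment:", assignment)
```

The cluster numbers themselves are arbitrary; only which points share a number is meaningful.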
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
DAY 16
HIERARCHICAL CLUSTERING
Hierarchical clustering is another unsupervised machine learning algorithm,
which is used to group unlabeled datasets into clusters. It is also known
as hierarchical cluster analysis or HCA.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and
this tree-shaped structure is known as the dendrogram.
The hierarchical clustering technique has two approaches:
1. Agglomerative: Agglomerative is a bottom-up approach, in which the
algorithm starts by taking all data points as single clusters and merges them
until one cluster is left.
2. Divisive: The divisive algorithm is the reverse of the agglomerative algorithm, as it
is a top-down approach.
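A sketch of the agglomerative (bottom-up) approach: every point starts as its own cluster and the two closest clusters are merged repeatedly. The distance between clusters is taken here as the smallest point-to-point distance (single linkage), which is an assumption since the slides do not name a linkage; the function name and data are also made up. The recorded merges are the information a dendrogram would display.

```python
import math


def agglomerative(points, n_clusters=1):
    """Merge closest clusters until n_clusters remain; return (clusters, merges)."""
    clusters = [[i] for i in range(len(points))]   # each point is its own cluster
    merges = []                                    # (cluster_a, cluster_b, distance)

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]    # merge j into i
        del clusters[j]
    return clusters, merges


# Made-up data
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]

clusters, merges = agglomerative(points, n_clusters=2)
print("clusters (as point indices):", clusters)
for a, b, d in merges:
    print(f"merged {a} and {b} at distance {d:.2f}")
```

Stopping at `n_clusters=2` corresponds to cutting the dendrogram at the height where two branches remain.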
DENDROGRAM
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
PRACTICE PRACTICE PRACTICE !!!!
THANK YOU !!!

  • 5. AI vs ML vs DL : FOR MORE INFO : https://pythongeeks.org/ai-vs-data-science-vs-deep-learning-vs-ml/
  • 11. APPLICATIONS OF ML : WHAT NOT ?? ……COULD DO ANYTHING :) TASK 1 : LIST A FEW IN 1 MIN
  • 12. DATASET : 1. Kaggle 2. UCI Machine Learning Repository 3. Data.gov 4. Google Dataset Search 5. World Bank Open Data 6. OpenML 7. Data.gov.uk 8. Reddit Datasets TASK 2 : DOWNLOAD A DATASET AND UPDATE
  • 14. DAY 2 INTRODUCTION TO RAPID MINER AND DATA PREPROCESSING
  • 15. GOOGLE COLABORATORY : Google Colab is a Jupyter notebook environment that runs entirely in the cloud. It handles all the setup and configuration your program requires, so you can start writing your first program right away. Colaboratory, or “Colab” for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education.
  • 16. How to run code in Google Colab? Running code in Google Colab is as easy as opening any website. It requires just 2 steps: 1. Sign in to Google Colab. 2. Create a new notebook. To sign in to Google Colab, go to the Google Colab URL: https://colab.research.google.com
  • 17. The home page of the Google Colab looks like below:
  • 19. You can create or open a notebook from: 1. A recent notebook you have created. 2. A notebook you have saved in Google Drive. 3. A clone of a notebook from a Git repository. 4. An upload from local storage. 5. Or simply create a new one from Colab itself.
  • 20. Creating a new notebook in Colab: 1. Select the new notebook option from the pop-up window shown in the above picture. Alternatively, you can create a new notebook by going to the File menu and selecting “New notebook”.
  • 21. 2. You can change the name of the notebook by double-clicking the file name at the top left, near the Google Drive logo. However, do not change the file extension. The notebook should always have the “ipynb” extension.
  • 22. 3. You are ready to write your Python code. There are two options on the main page for adding code or text. a) Text is for information about, or a description of, the code. It is just for display. b) Code is where you write your code. You can hover in the center of the screen to get the option to add code or text, as shown in the figure below.
  • 23. 4. Now you can start writing your code in Google Colab.
  • 24. 5. Click on the play button at the left side of the code editor to run your program. You can create multiple code editors as shown below.
  • 25. DATA PREPROCESSING : 1. Data Preprocessing 2. Importing the libraries 3. Importing Dataset 4. Handling Missing Data 5. Encoding Categorical Data 6. Encoding independent variables 7. Encoding dependent variables 8. Splitting data into Test set & Training Set 9. Feature Scaling TASK 3 of D2 : CREATE A COLAB NOTEBOOK OF OWN
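The preprocessing steps listed on this slide can be sketched end to end in plain Python. This is only an illustrative sketch under stated assumptions: the toy rows, the choice of mean imputation, the one-hot encoding, the 4/2 split and the standardisation formula are all assumptions made for the example, not details given in the slides. In practice these steps are usually done with pandas and scikit-learn.

```python
import random
from statistics import mean, pstdev

# Hypothetical toy dataset: (country, age, purchased); None marks a missing age.
rows = [
    ("France", 44.0, "No"),
    ("Spain", 27.0, "Yes"),
    ("Germany", None, "No"),
    ("Spain", 38.0, "No"),
    ("France", 35.0, "Yes"),
    ("Germany", 50.0, "Yes"),
]

# Handling missing data: replace a missing age with the mean of the known ages.
known_ages = [age for _, age, _ in rows if age is not None]
age_mean = mean(known_ages)
ages = [age if age is not None else age_mean for _, age, _ in rows]

# Encoding the independent categorical variable: one 0/1 column per country.
countries = sorted({country for country, _, _ in rows})
one_hot = [[1 if country == c else 0 for c in countries] for country, _, _ in rows]

# Encoding the dependent variable: "No" -> 0, "Yes" -> 1.
y = [1 if label == "Yes" else 0 for _, _, label in rows]

# Build the feature matrix, then split it into a training set and a test set.
X = [encoded + [age] for encoded, age in zip(one_hot, ages)]
indices = list(range(len(X)))
random.Random(0).shuffle(indices)
n_test = 2
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train = [X[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_train = [y[i] for i in train_idx]
y_test = [y[i] for i in test_idx]

# Feature scaling: standardise the age column using training-set statistics only.
train_ages = [row[-1] for row in X_train]
mu, sigma = mean(train_ages), pstdev(train_ages)
for row in X_train + X_test:
    row[-1] = (row[-1] - mu) / sigma
```

Computing the scaling statistics on the training rows only keeps information from the test rows out of the model, which is why the split comes before the scaling step.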
  • 26. DAY 3 INTRODUCTION TO PYTHON AND REVISION OF CONCEPTS
  • 28. •Large Set of Libraries •Code Simplicity •Platform Independence •Community Support •Visualization Ability •Flexibility
  • 29. PYTHON CONCEPTS : 1. LIST 2. TUPLE 3. SET 4. DICTIONARY LIST FOR PRACTICE : https://drive.google.com/drive/folders/1dllsH2PLBR9Cn3z0yDDxxa44tUgMlzj?usp=sharing
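As a quick reminder of how the four containers differ, here is a minimal sketch. The values and names are hypothetical and chosen only for illustration.

```python
# Hypothetical sample values, used only to illustrate the four containers.
scores = [72, 85, 85, 90]        # list: ordered, mutable, duplicates allowed
scores.append(64)                # lists can grow in place

point = (3.5, 7.2)               # tuple: ordered, immutable

unique_scores = set(scores)      # set: unordered, duplicates removed

student = {"name": "Asha", "score": 85}     # dictionary: key -> value mapping
student["passed"] = student["score"] >= 50  # add a new key
```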
  • 31. Here are some of the most commonly used basic Python libraries in AI and ML: •Pandas for general-purpose data analysis •NumPy for high-performance scientific calculation and data analysis •TensorFlow for high-performance numerical computation •SciPy for advanced computation •Scikit-learn for handling ML algorithms (like clustering, decision trees, linear and logistic regressions, and classification) •Keras for building and designing a neural network •Matplotlib for data visualization (i.e. developing 2D plots, histograms, charts, and other forms of visualization) •Natural Language Toolkit (NLTK) for working with computational linguistics and natural language recognition & processing •Scikit-image for image processing & analysis •PyBrain for implementing machine learning algorithms and architectures in areas such as supervised learning and reinforcement learning •StatsModels for data exploration, statistical model estimation, and performing statistical tests
  • 32. PRACTICE PRACTICE PRACTICE !!!! LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
  • 33. DAY 4 DATA PREPROCESSING USING PYTHON
  • 34. WHAT IS DATA PREPROCESSING? Data preprocessing is the process of preparing raw data and making it suitable for a machine learning model. It is the first and a crucial step in creating a machine learning model. In a real machine learning project the data is rarely clean and well formatted, and before doing any operation on data it must be cleaned and put into a consistent format. Data preprocessing is the task that does this.
  • 35. PRACTICE PRACTICE PRACTICE !!!! LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
  • 36. DAY 5 MODEL EVALUATION AND SELECTION LINEAR REGRESSION
  • 40. LINEAR REGRESSION : The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes with the value of the independent variable.
  • 41. Mathematically, we can represent a linear regression as: $y = a_0 + a_1 x + \varepsilon$. Here, $y$ = dependent variable (target variable), $x$ = independent variable (predictor variable), $a_0$ = intercept of the line (gives an additional degree of freedom), $a_1$ = linear regression coefficient (scale factor applied to each input value), $\varepsilon$ = random error. The values of the $x$ and $y$ variables form the training dataset for the linear regression model.
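To make the roles of a0 and a1 concrete, here is a minimal sketch that estimates them from data. It assumes ordinary least squares, which is a common choice but is not stated in the slides, and the function name and sample points are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate intercept a0 and slope a1 of y = a0 + a1*x by least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = y_mean - a1 * x_mean
    return a0, a1


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a0, a1 = fit_simple_linear_regression(xs, ys)
prediction = a0 + a1 * 5.0   # predict y for a new x
```

With real data the random error term means the points will not lie exactly on the line, and the fitted a0 and a1 are the values that minimise the sum of squared errors.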
  • 42. Types of Linear Regression Linear regression can be further divided into two types of the algorithm: 1. SIMPLE LINEAR REGRESSION : If a single independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Simple Linear Regression. 2. MULTIPLE LINEAR REGRESSION : If more than one independent variable is used to predict the value of a numerical dependent variable, then such a Linear Regression algorithm is called Multiple Linear Regression.
  • 43. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 44. DAY 6 SIMPLE AND MULTIPLE LINEAR REGRESSION
  • 47. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 48. DAY 7 DECISION TREE REGRESSION
  • 49. WHAT IS DECISION TREE ALGORITHM ? A decision tree builds regression or classification models in the form of a tree structure. It breaks down a dataset into smaller and smaller subsets while at the same time an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy), each representing values for the attribute tested. A leaf node (e.g., Hours Played) represents a decision on the numerical target. The topmost decision node in a tree, which corresponds to the best predictor, is called the root node. Decision trees can handle both categorical and numerical data.
  • 51. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 52. DAY 8 SUPPORT VECTOR MACHINE REGRESSION
  • 53. Support Vector Machines ● A Support Vector Machine (SVM) is a classifier that tries to maximize the margin between training data and the classification boundary (the plane defined by 𝑋𝛽 = 0)
  • 54. Support Vector Machines ● The idea is that maximizing the margin maximizes the chance that classification will be correct on new data. We assume the new data of each class is near the training data of that type.
  • 55. SVM Training SVMs can be trained using SGD. Recall that the logistic gradient was (this time assuming $y_i \in \{-1, +1\}$): $\frac{dA}{d\beta} = \sum_{i=1}^{N} y_i p_i (1 - p_i) X_i$. The SVM gradient can be defined as (here $p_i = X_i \beta$): $\frac{dA}{d\beta} = \sum_{i=1}^{N} \left(\text{if } p_i y_i < 1 \text{ then } y_i X_i \text{ else } 0\right)$. The expression $p_i y_i < 1$ tests whether the point $X_i$ is in the margin, and if so adds it with sign $y_i$. It ignores other points. Both methods weight points “near the middle” with sign $y_i$.
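The SVM gradient rule on this slide can be sketched directly in code. This is a hedged illustration, not the full Pegasos method: it assumes the gradient is used as an ascent direction with a fixed step size, leaves out regularisation, and uses a hypothetical two-point dataset whose first feature is a constant 1 acting as a bias term.

```python
def svm_gradient(X, y, beta):
    """Sum y_i * X_i over the points with p_i * y_i < 1, where p_i = X_i . beta."""
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        p = sum(xj * bj for xj, bj in zip(xi, beta))
        if p * yi < 1:                    # the point is inside the margin
            for j, xj in enumerate(xi):
                grad[j] += yi * xj        # add it with sign y_i
    return grad


# Hypothetical data: labels are -1 or +1, first column is a constant bias feature.
X = [[1.0, 2.0], [1.0, -2.0]]
y = [1, -1]

beta = [0.0, 0.0]
step_size = 0.1                            # assumed fixed learning rate
for _ in range(20):
    grad = svm_gradient(X, y, beta)
    beta = [b + step_size * g for b, g in zip(beta, grad)]

margins = [yi * sum(xj * bj for xj, bj in zip(xi, beta)) for xi, yi in zip(X, y)]
```

Once every point satisfies p_i * y_i >= 1 the gradient is zero and beta stops changing, which matches the statement that points outside the margin are ignored.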
  • 56. SVM Training This SGD training method (called Pegasos) is much faster and competitive with Logistic Regression. It's also capable of training in less than one pass over a dataset. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
  • 58. Logistic Regression ● Logistic regression is probably the most widely used general-purpose classifier. ● It's very scalable and can be very fast to train. It’s used for ○ Spam filtering ○ News message classification ○ Web site classification ○ Product classification ○ Most classification problems with large, sparse feature sets. ● The only caveat is that it can overfit on very sparse data, so it's often used with regularization.
  • 59. Logistic Regression ● NOTE : Regression predicts a real value, whereas classification predicts a discrete value. ● Logistic regression is designed as a binary classifier (output say {0,1}) but actually outputs the probability that the input instance is in the “1” class. ● A logistic classifier has the form: $p(X) = \frac{1}{1 + \exp(-X\beta)}$ where $X = (X_1, \ldots, X_n)$ is a vector of features.
  • 60. Logistic Regression ● Logistic regression maps the “regression” value $-X\beta$ in $(-\infty, \infty)$ to the range [0,1] using a “logistic” function: $p(X) = \frac{1}{1 + \exp(-X\beta)}$ ● i.e. the logistic function maps any value on the real line to a probability in the range [0,1]
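A minimal sketch of the logistic function from these slides follows. The function name and the example feature and weight values are hypothetical.

```python
import math


def logistic_probability(x, beta):
    """Return p(X) = 1 / (1 + exp(-X.beta)) for feature vector x and weights beta."""
    z = sum(xi * bi for xi, bi in zip(x, beta))   # the linear score X.beta
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and inputs, chosen only for illustration.
beta = [1.0, -2.0]
p_zero = logistic_probability([0.0, 0.0], beta)   # score 0 gives probability 0.5
p_high = logistic_probability([5.0, 0.0], beta)   # large positive score, close to 1
p_low = logistic_probability([0.0, 2.5], beta)    # large negative score, close to 0
predicted_class = 1 if p_high >= 0.5 else 0       # threshold at 0.5 for a {0,1} label
```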
  • 62. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 63. DAY 10 INTRODUCTION TO CLASSIFICATION AND DIFFERENT TYPES OF CLASSIFICATION ALGORITHMS AND MODEL SELECTION
  • 64. INTRODUCTION TO CLASSIFICATION : Classification may be defined as the process of predicting a class or category from observed values or given data points. The categorized output can take forms such as “Black” or “White”, or “spam” or “no spam”. Mathematically, classification is the task of approximating a mapping function (f) from input variables (X) to output variables (Y). It belongs to supervised machine learning, in which targets are provided along with the input data set.
  • 65. TYPES OF LEARNERS IN CLASSIFICATION : There are two types of learners for classification problems: 1. Lazy Learners As the name suggests, these learners store the training data and wait until testing data appears. Classification is done only after the testing data arrives. They spend less time on training but more time on predicting. Examples of lazy learners are K-nearest neighbor and case-based reasoning. 2. Eager Learners In contrast to lazy learners, eager learners construct a classification model from the training data without waiting for the testing data to appear. They spend more time on training but less time on predicting. Examples of eager learners are Decision Trees, Naïve Bayes and Artificial Neural Networks (ANN).
  • 67. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 68. DAY 11 NAÏVE BAYES ALGORITHM
  • 69. •The Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes' theorem and used for solving classification problems. •It is mainly used in text classification, which involves a high-dimensional training dataset. •It is a probabilistic classifier, which means it predicts on the basis of the probability of an object. •Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Each feature individually contributes to identifying it as an apple, without depending on the others. •Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
  • 70. Bayes’ Theorem $P(A|B)$ = probability of A given that B is true. $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$ In practice we are most interested in dealing with events e and data D. e = “I have a cold” D = “runny nose,” “watery eyes,” “coughing” $P(e|D) = \frac{P(D|e)P(e)}{P(D)}$ So Bayes’ theorem is “diagnostic”.
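A small worked example may help here. All three input probabilities below are hypothetical numbers invented for illustration, and P(D) is expanded with the law of total probability, a step the slide does not show.

```python
# Hypothetical probabilities, chosen only for illustration.
p_cold = 0.10                       # P(e): prior probability of having a cold
p_symptoms_given_cold = 0.80        # P(D|e): symptoms appear when there is a cold
p_symptoms_given_no_cold = 0.10     # P(D|not e): symptoms appear without a cold

# P(D) = P(D|e) P(e) + P(D|not e) P(not e)
p_symptoms = (p_symptoms_given_cold * p_cold
              + p_symptoms_given_no_cold * (1 - p_cold))

# Bayes' theorem: P(e|D) = P(D|e) P(e) / P(D)
p_cold_given_symptoms = p_symptoms_given_cold * p_cold / p_symptoms
print(round(p_cold_given_symptoms, 3))   # prints 0.471
```

Under these assumed numbers, seeing the symptoms raises the probability of a cold from the prior of 0.10 to a posterior of about 0.47.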
  • 72. Bayes’ Theorem D = Data, e = some event $P(e|D) = \frac{P(D|e)P(e)}{P(D)}$ $P(e)$ is called the prior probability of e. It's what we know (or think we know) about e with no other evidence. $P(D|e)$ is the conditional probability of D given that e happened, or just the likelihood of D. This can often be measured or computed precisely; it follows from your model assumptions. $P(e|D)$ is the posterior probability of e given D. It’s the answer we want, or the way we choose a best answer. You can see that the posterior is heavily colored by the prior, so Bayes’ has a GIGO liability, e.g. it's not used to test hypotheses.
  • 73. Naïve Bayes Classifier Let’s assume we have an instance (e.g. a document d) with a set of features $X_1, \ldots, X_k$ and a set of classes $c_j$ to which the document might belong. We want to find the most likely class that the document belongs to, given its features. The joint probability of the class and features is: $\Pr(X_1, \ldots, X_k, c_j)$
  • 74. Naïve Bayes Classifier Key Assumption: (Naïve) the features are generated independently given $c_j$. Then the joint probability factors: $\Pr(X, c_j) = \Pr(X_1, \ldots, X_k \mid c_j)\Pr(c_j) = \Pr(c_j)\prod_{i=1}^{k}\Pr(X_i \mid c_j)$ We would like to figure out the most likely class for (i.e. to classify) the document, which is the $c_j$ which maximizes: $\Pr(c_j \mid X_1, \ldots, X_k)$
  • 75. Naïve Bayes Classifier Now from Bayes we know that: $\Pr(c_j \mid X_1, \ldots, X_k) = \Pr(X_1, \ldots, X_k \mid c_j)\Pr(c_j) / \Pr(X_1, \ldots, X_k)$ But to choose the best $c_j$, we can ignore $\Pr(X_1, \ldots, X_k)$ since it’s the same for every class. So we just have to maximize: $\Pr(X_1, \ldots, X_k \mid c_j)\Pr(c_j)$ So finally we pick the category $c_j$ that maximizes: $\Pr(X_1, \ldots, X_k \mid c_j)\Pr(c_j) = \Pr(c_j)\prod_{i=1}^{k}\Pr(X_i \mid c_j)$
  • 77. Data for Naïve Bayes In order to find the best class, we need two pieces of data: • $\Pr(c_j)$, the prior probability for the class $c_j$. • $\Pr(X_i \mid c_j)$, the conditional probability of the feature $X_i$ given the class $c_j$.
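The two pieces of data named on this slide are enough to build a small classifier. The sketch below estimates both from word counts and then picks the class with the largest prior times product of conditionals. Several details are assumptions made for the example and are not in the slides: the tiny spam/ham training set, the add-one smoothing for unseen words, and the use of summed logarithms in place of a raw product to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents: list of (list_of_words, class_label) pairs."""
    class_counts = Counter(label for _, label in documents)   # for Pr(c_j)
    word_counts = defaultdict(Counter)                        # for Pr(X_i | c_j)
    vocabulary = set()
    for words, label in documents:
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(words, class_counts, word_counts, vocabulary):
    """Return the class maximising log Pr(c_j) + sum_i log Pr(X_i | c_j)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)                  # log prior
        total_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing (an assumption) keeps unseen words from giving zero.
            p = (word_counts[label][w] + 1) / (total_words + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training documents.
training = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch meeting today".split(), "ham"),
]
model = train_naive_bayes(training)
label_a = classify("free money".split(), *model)
label_b = classify("meeting today".split(), *model)
```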
  • 78. Advantage and Disadvantage of NB Classifiers ● Simple and fast. Depend only on term frequency data for the classes. One shot, no iteration. ● Very well-behaved numerically. Term weight depends only on frequency of that term. Decoupled from other terms. ● Can work very well with sparse data, where combinations of dependent terms are rare. ● Subject to error and bias when term probabilities are not independent (e.g. URL prefixes). ● Can’t model patterns in the data. ● Typically not as accurate as other methods.
  • 79. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 80. DAY 12 K –NEAREST NEIGHBOURS ALGORITHM
  • 81. •K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on the Supervised Learning technique. •The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories. •The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm. K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data. •It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
  • 82. •Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog. For this identification we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will compare the features of the new image with those of the cat and dog images and, based on the most similar features, put it in either the cat or the dog category.
  • 84. The working of K-NN can be explained with the following algorithm: •Step-1: Select the number K of neighbors. •Step-2: Calculate the Euclidean distance from the new data point to each point in the training data. •Step-3: Take the K nearest neighbors as per the calculated Euclidean distance. •Step-4: Among these K neighbors, count the number of data points in each category. •Step-5: Assign the new data point to the category with the largest number of neighbors. •Step-6: Our model is ready.
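The K-NN steps above can be sketched in a few lines of plain Python. The function name, the two-feature points and the cat/dog labels are hypothetical, and ties between categories are resolved by whichever category `Counter` reports first, which is an arbitrary choice for this sketch.

```python
import math
from collections import Counter


def knn_classify(train_points, train_labels, new_point, k=5):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [(math.dist(p, new_point), label)
                 for p, label in zip(train_points, train_labels)]
    # Step 3: keep the k nearest neighbours.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # Step 4: count the neighbours in each category.
    votes = Counter(label for _, label in nearest)
    # Step 5: assign the category with the most neighbours.
    return votes.most_common(1)[0][0]


# Hypothetical two-feature data: cats cluster near (1, 1), dogs near (5, 5).
points = [(1, 1), (1, 2), (2, 1), (5, 5), (5, 6), (6, 5)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

first = knn_classify(points, labels, (1.5, 1.5), k=3)
second = knn_classify(points, labels, (5.5, 5.5), k=3)
```

Note that there is no training step: all the work happens at prediction time, which is why K-NN is described as a lazy learner.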
  • 86. How to select the value of K in the K-NN Algorithm? Below are some points to remember while selecting the value of K in the K-NN algorithm: •There is no particular way to determine the best value for "K", so we need to try some values and pick the one that performs best. A commonly used starting value for K is 5. •A very low value for K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers. •Larger values of K smooth out noise, but they can blur the boundaries between classes and make each prediction more expensive. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
  • 87. DAY 13 DECISION TREE CLASSIFICATION
  • 88. •Decision Tree is a Supervised learning technique that can be used for both Classification and Regression problems, but it is mostly preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. •It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
  • 89. Example: Suppose there is a candidate who has a job offer and wants to decide whether to accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by an attribute selection measure, ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (Cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
  • 91. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 92. DAY 14 SUPPORT VECTOR MACHINES CLASSIFICATION
  • 93. Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems. However, it is primarily used for Classification problems in Machine Learning. “The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put the new data point in the correct category in the future. This best decision boundary is called a hyperplane.”
  • 95. Example: SVM can be understood with the example that we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs, and we want a model that can accurately identify whether it is a cat or a dog. Such a model can be created using the SVM algorithm. We first train the model with lots of images of cats and dogs so that it can learn their different features, and then we test it with this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) by choosing the extreme cases (support vectors) of each. On the basis of the support vectors, it will classify the creature as a cat. Consider the below diagram:
  • 97. Types of SVM SVM can be of two types: •Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier. •Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier. “SVM algorithm can be used for face detection, image classification, text categorization, etc.”
  • 98. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 99. DAY 15 INTRODUCTION TO CLUSTERING AND K-MEANS CLUSTERING
  • 100. Clustering in Machine Learning : Clustering or cluster analysis is a machine learning technique, which groups the unlabelled dataset. It can be defined as "A way of grouping the data points into different clusters, consisting of similar data points. The objects with the possible similarities remain in a group that has less or no similarities with another group."
  • 101. What is K-Means Algorithm? K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters, for K=3 there will be three clusters, and so on. It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group of points with similar properties. It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
  • 102. The k-means clustering algorithm mainly performs two tasks: •Determines the best value for K center points or centroids by an iterative process. •Assigns each data point to its closest k-center. Those data points which are near to the particular k-center, create a cluster.
  • 103. How does the K-Means Algorithm Work? The working of the K-Means algorithm is explained in the below steps: Step-1: Select the number K to decide the number of clusters. Step-2: Select K random points as centroids. (They need not be points from the input dataset.) Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters. Step-4: Calculate the variance and place a new centroid for each cluster. Step-5: Repeat the third step, which means reassign each data point to the new closest centroid. Step-6: If any reassignment occurs, then go to Step-4, else go to FINISH. Step-7: The model is ready.
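The K-Means steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions: initial centroids are sampled from the input points with a fixed seed, the new centroid in Step 4 is taken to be the mean of the cluster's members, an empty cluster keeps its old centroid, and the six sample points are hypothetical.

```python
import math
import random


def k_means(points, k, seed=0, max_iterations=100):
    """Cluster points into k groups; return (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)                 # Step 2: initial centroids
    assignments = None
    for _ in range(max_iterations):
        # Steps 3 and 5: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:            # Step 6: nothing moved
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                               # keep old centroid if empty
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, assignments


# Hypothetical data with two well-separated groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, assignments = k_means(points, k=2)
```

Because the starting centroids are random, different seeds can label the clusters in a different order, and on less cleanly separated data they can also lead to different final clusters.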
  • 104. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!
  • 106. Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group the unlabeled datasets into a cluster and also known as hierarchical cluster analysis or HCA. In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram. The hierarchical clustering technique has two approaches: 1.Agglomerative: Agglomerative is a bottom-up approach, in which the algorithm starts with taking all data points as single clusters and merging them until one cluster is left. 2.Divisive: Divisive algorithm is the reverse of the agglomerative algorithm as it is a top-down approach.
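The agglomerative (bottom-up) approach can be sketched as follows. The slide does not say how the distance between two clusters is measured, so this sketch assumes single linkage (the distance between their two closest points). The function name and sample points are hypothetical.

```python
import math


def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[p] for p in points]      # start: every point is its own cluster
    merges = []                           # merge history, the basis of a dendrogram
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage (an assumption): closest pair across clusters.
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical points forming two obvious groups.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
two_clusters, _ = agglomerative(points, n_clusters=2)
one_cluster, history = agglomerative(points, n_clusters=1)
```

Running the merges all the way down to one cluster and recording each merge distance gives the information a dendrogram displays.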
  • 109. LINK FOR PRACTICE : https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing PRACTICE PRACTICE PRACTICE !!!!