This document provides an overview of a 7-day machine learning training program using Python. Day 1 introduces machine learning concepts and sets tasks related to datasets. Day 2 covers RapidMiner, data preprocessing, and creating a Google Colab notebook. Day 3 reviews Python concepts and provides practice problems. Day 4 focuses on data preprocessing in Python. Day 5 discusses model evaluation and linear regression, with practice problems. Days 6-7 cover simple/multiple linear regression and decision tree regression with more practice problems. Days 8-10 cover support vector machine regression, logistic regression, and different classification algorithms. Day 11 explains the Naive Bayes algorithm.
4. INTRODUCTION TO ML
● GOOGLE DEF :
Machine learning (ML) is a subfield of artificial intelligence focused on
training machine learning algorithms with data sets to produce machine learning
models capable of performing complex tasks, such as sorting images, forecasting sales, or
analyzing big data.
● MOST PREFERRED DEF :
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P , if its performance at tasks T, as measured
by P , improves with experience E.
5. AI vs ML vs DL :
FOR MORE INFO :
https://pythongeeks.org/ai-vs-data-science-vs-deep-learning-vs-ml/
15. GOOGLE COLABORATORY :
Google Colab is a Jupyter notebook environment that runs entirely
in the cloud. It handles all the setup and configuration your program requires,
so you can start writing your first program right away.
Colaboratory, or “Colab” for short, is a product from Google
Research. Colab allows anybody to write and execute arbitrary Python code
through the browser, and is especially well suited to machine learning, data
analysis and education.
16. How to run code in Google Colab?
Running code in Google Colab is as easy as opening any website.
It requires just 2 steps.
1. Sign in to Google Colab.
2. Create a new notebook.
To sign in to Google Colab, go to the Google Colab URL:
https://colab.research.google.com
19. You can create/open a notebook from:
1. A recent notebook you have created.
2. A notebook you have saved in Google Drive.
3. A clone of a notebook from a Git repository.
4. An upload from local storage.
5. Or simply create a new one from Colab itself.
20. Creating a new notebook in Colab:
1. Select the new notebook option from the pop-up window shown in the above
picture. Alternatively, you can create a new notebook by going to the File menu
and selecting “New notebook”.
21. 2. You can change the name of the notebook by double-clicking on the file name
at the top left, near the Google Drive logo. However, do not change the extension of
the file. The notebook should always have the “ipynb” extension.
22. 3. You are ready to write your Python code. There are two options on the main page:
code or text.
a) Text is for information about, or a description of, the code. It is just for display.
b) Code is where you write your code.
You can hover in the center of the screen to get the option to add code or text, as shown in
the figure below.
23. 4. Now you can start writing your code in Google Colab.
24. 5. Click on the play button at the left side of the code cell to run your
program. You can create multiple code cells as shown below.
25. DATA PREPROCESSING :
1. Data Preprocessing
2. Importing the libraries
3. Importing Dataset
4. Handling Missing Data
5. Encoding Categorical Data
6. Encoding independent variables
7. Encoding dependent variables
8. Splitting data into Test set & Training Set
9. Feature Scaling
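The steps listed above can be sketched end to end with pandas and scikit-learn. This is only a minimal illustration under stated assumptions: the small table (Country, Age, Salary, Purchased) is hypothetical stand-in data, and in practice the dataset would usually be loaded from a file.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

# Importing the dataset (hypothetical toy data; normally pd.read_csv(...)).
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain",
                "Germany", "France", "Spain", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48],
    "Salary": [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 79000],
    "Purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes"],
})
X = df[["Country", "Age", "Salary"]].copy()
y = df["Purchased"]

# Handling missing data: replace NaN in numeric columns with the column mean.
num_cols = ["Age", "Salary"]
X[num_cols] = SimpleImputer(strategy="mean").fit_transform(X[num_cols])

# Encoding independent variables: one-hot encode the categorical column.
encoder = ColumnTransformer(
    [("country", OneHotEncoder(), ["Country"])],
    remainder="passthrough",   # keep Age and Salary as they are
    sparse_threshold=0,        # always return a dense array
)
X_encoded = encoder.fit_transform(X)   # 3 one-hot columns, then Age, Salary

# Encoding the dependent variable: "No"/"Yes" become 0/1.
y_encoded = LabelEncoder().fit_transform(y)

# Splitting the data into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_encoded, y_encoded, test_size=0.25, random_state=0
)

# Feature scaling: fit on the training set only, then apply to both sets.
scaler = StandardScaler()
X_train[:, 3:] = scaler.fit_transform(X_train[:, 3:])
X_test[:, 3:] = scaler.transform(X_test[:, 3:])
```

The scaler is fitted on the training set only, so that no information from the test set leaks into training.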
TASK 3 of D2 : CREATE A COLAB NOTEBOOK OF OWN
28. •Large Set of Libraries
•Code Simplicity
•Platform Independence
•Community Support
•Visualization Ability
•Flexibility
29. PYTHON CONCEPTS :
1. LIST
2. TUPLE
3. SET
4. DICTIONARY
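A short sketch of the four containers listed above. The variable names and values are made up for illustration.

```python
# LIST: ordered, mutable, allows duplicates.
scores = [70, 85, 85, 90]
scores.append(95)

# TUPLE: ordered, immutable (cannot be changed after creation).
point = (3, 4)

# SET: unordered collection of unique items.
unique_scores = set(scores)

# DICTIONARY: maps keys to values.
student = {"name": "Asha", "score": 85}
student["passed"] = student["score"] >= 50

print(scores, point, unique_scores, student)
```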
LIST FOR PRACTICE :
https://drive.google.com/drive/folders/1dllsH2PLBR9Cn3z0yDDxxa44tUgMlzj?usp=sharing
31. Here are some of the most commonly used basic Python libraries in AI and ML:
•Pandas for general-purpose data analysis
•NumPy for high-performance scientific calculation and data analysis
•TensorFlow for high-performance numerical computation
•SciPy for advanced computation
•Scikit-learn for handling ML algorithms (like clustering, decision trees, linear and
logistic regressions, and classification)
•Keras for building and designing a neural network
•Matplotlib for data visualization (i.e. developing 2D plots, histograms, charts, and
other forms of visualization)
•Natural Language Toolkit (NLTK) for working with computational linguistics and
natural language recognition & processing
•Scikit-image for image processing & analysis
•PyBrain for implementing machine learning algorithms and architectures in
areas such as supervised learning and reinforcement learning
•StatsModels for data exploration, statistical model estimation, and performing
statistical tests
32. PRACTICE PRACTICE PRACTICE !!!!
LINK FOR PRACTICE :
https://drive.google.com/drive/folders/1h4-SVu9WLFfOWL8N4X2tZ_mEy9c5wIbw?usp=sharing
34. WHAT IS DATA PREPROCESSING ?
Data preprocessing is the process of preparing raw data and making it
suitable for a machine learning model. It is the first and a crucial step in
creating a machine learning model.
When working on a machine learning project, we do not always
come across clean, formatted data. Before doing any operation
with data, it must be cleaned and put into a consistent format. For
this, we use data preprocessing.
40. LINEAR REGRESSION :
The linear regression algorithm models a linear relationship between a dependent
variable (y) and one or more independent variables (x), hence the name linear
regression. Because the relationship is linear, the model
shows how the value of the dependent variable changes
with the value of the independent variable.
41. Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
Here,
y = Dependent variable (target variable)
x = Independent variable (predictor variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = Random error
The values of the x and y variables form the training dataset for the Linear Regression
model.
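As a minimal sketch of how a0 and a1 can be estimated, the example below uses the standard least-squares formulas with NumPy. The data is synthetic and hypothetical, generated from y = 2 + 3x plus random noise, so the recovered values should land close to 2 and 3.

```python
import numpy as np

# Hypothetical training data: y = 2 + 3x + random error.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Least-squares estimates of the slope (a1) and intercept (a0).
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()

# The residuals are the part of y the fitted line does not explain.
residuals = y - (a0 + a1 * x)

print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")
```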
42. Types of Linear Regression
Linear regression can be further divided into two types of the
algorithm:
1. SIMPLE LINEAR REGRESSION :
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression
algorithm is called Simple Linear Regression.
2. MULTIPLE LINEAR REGRESSION :
If more than one independent variable is used to predict the value
of a numerical dependent variable, then such a Linear Regression
algorithm is called Multiple Linear Regression.
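The difference between the two types can be illustrated with scikit-learn's `LinearRegression`. This is a sketch on hypothetical synthetic data in which the target depends on two independent variables; the simple model sees only the first one, while the multiple model sees both.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y depends on two independent variables.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Simple linear regression: one independent variable.
simple = LinearRegression().fit(X[:, [0]], y)
r2_simple = simple.score(X[:, [0]], y)

# Multiple linear regression: more than one independent variable.
multiple = LinearRegression().fit(X, y)
r2_multiple = multiple.score(X, y)

print("simple   R^2:", round(r2_simple, 3))
print("multiple R^2:", round(r2_multiple, 3))
print("multiple coefficients:", multiple.coef_, "intercept:", multiple.intercept_)
```

Here the multiple model explains the target far better because the second variable carries real information. That will not be true of every dataset.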
49. WHAT IS DECISION TREE ALGORITHM ?
A decision tree builds regression or classification models in the
form of a tree structure. It breaks down a dataset into smaller and smaller
subsets while, at the same time, an associated decision tree is
incrementally developed. The final result is a tree with decision
nodes and leaf nodes. A decision node (e.g., Outlook) has two or more
branches (e.g., Sunny, Overcast and Rainy), each representing a value of
the attribute tested. A leaf node (e.g., Hours Played) represents a decision
on the numerical target. The topmost decision node in a tree, which
corresponds to the best predictor, is called the root node. Decision trees can
handle both categorical and numerical data.
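A small sketch of decision tree regression with scikit-learn's `DecisionTreeRegressor`. The data is hypothetical: a single numeric feature whose target jumps from 10 to 20 at x = 5, so one decision node and two leaf nodes are enough to describe it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical data: the target is 10 for x < 5 and 20 otherwise.
X = np.arange(0, 10, 0.5).reshape(-1, 1)
y = np.where(X.ravel() < 5, 10.0, 20.0)

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# Print the learned tree: the root (decision) node and its leaf nodes.
print(export_text(tree, feature_names=["x"]))

pred = tree.predict([[2.0], [8.0]])
print(pred)
```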
53. Support Vector Machines
● A Support Vector Machine (SVM) is a classifier that
tries to maximize the margin between training data
and the classification boundary (the plane defined by
𝑋𝛽 = 0)
54. Support Vector Machines
● The idea is that maximizing the margin maximizes
the chance that classification will be correct on
new data. We assume the new data of each class is
near the training data of that type.
55. SVM Training
SVMs can be trained using SGD. Recall that the Logistic
gradient was (this time assuming $y_i \in \{-1, +1\}$):
$$\frac{dA}{d\beta} = \sum_{i=1}^{N} y_i \, p_i (1 - p_i) \, X_i$$
The SVM gradient can be defined as (here $p_i = X_i \beta$):
$$\frac{dA}{d\beta} = \sum_{i=1}^{N} \begin{cases} y_i X_i & \text{if } p_i y_i < 1 \\ 0 & \text{otherwise} \end{cases}$$
The expression $p_i y_i < 1$ tests whether the point $X_i$ is in
the margin, and if so adds it with sign $y_i$. It ignores
other points.
Both methods weight points “near the middle” with
sign $y_i$.
56. SVM Training
This SGD training method (called Pegasos) is much faster
and competitive with Logistic Regression.
It's also capable of training in less than one pass over a
dataset.
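A minimal sketch of Pegasos-style SGD training in NumPy, using the SVM gradient above (a point contributes $y_i X_i$ only when $p_i y_i < 1$). The regularization strength `lam`, the step size $1/(\lambda t)$, the number of epochs and the synthetic data are assumptions of this sketch and are not given in the slides. The optional projection step of the published algorithm is left out.

```python
import numpy as np

def pegasos_train(X, y, lam=0.01, epochs=20, seed=0):
    """Train a linear SVM with Pegasos-style SGD. Labels y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    beta = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)            # decreasing step size
            margin = y[i] * (X[i] @ beta)    # p_i * y_i, with p_i = X_i beta
            beta *= (1.0 - eta * lam)        # shrink (regularization term)
            if margin < 1:                   # point is inside the margin
                beta += eta * y[i] * X[i]    # add it with sign y_i
    return beta

# Hypothetical, well-separated two-class data with a constant bias column.
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X = np.hstack([np.vstack([pos, neg]), np.ones((100, 1))])
y = np.array([1] * 50 + [-1] * 50)

beta = pegasos_train(X, y)
accuracy = np.mean(np.sign(X @ beta) == y)
print("training accuracy:", accuracy)
```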
58. Logistic Regression
● Logistic regression is probably the most widely used
general-purpose classifier.
● It's very scalable and can be very fast to train. It’s
used for
○ Spam filtering
○ News message classification
○ Web site classification
○ Product classification
○ Most classification problems with large, sparse
feature sets.
● The only caveat is that it can overfit on very sparse
data, so it's often used with regularization.
59. Logistic Regression
● NOTE : Regression means predicting a real value, whereas
classification means predicting a discrete value.
● Logistic regression is designed as a binary classifier
(output say {0,1}) but actually outputs the probability
that the input instance is in the “1” class.
● A logistic classifier has the form:
$$p(X) = \frac{1}{1 + \exp(-X\beta)}$$
where $X = (X_1, \ldots, X_n)$ is a vector of features.
60. Logistic Regression
● Logistic regression maps the “regression” value $X\beta$
in $(-\infty, \infty)$ to the range [0,1] using a “logistic” function:
$$p(X) = \frac{1}{1 + \exp(-X\beta)}$$
● i.e. the logistic function maps any value on the real
line to a probability in the range [0,1]
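The logistic function above is short enough to write directly in NumPy. In this sketch the weight vector `beta` and the feature rows in `X` are hypothetical values chosen only to show the mapping from $X\beta$ to a probability and then to a class label.

```python
import numpy as np

def logistic(z):
    """Map any real value z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

beta = np.array([0.5, -1.0])       # hypothetical learned weights
X = np.array([[4.0, 1.0],          # X @ beta =  1.0
              [0.0, 0.0],          # X @ beta =  0.0
              [-4.0, 3.0]])        # X @ beta = -5.0

p = logistic(X @ beta)             # probability of the "1" class
labels = (p >= 0.5).astype(int)    # threshold at 0.5 for a binary decision

print(p)
print(labels)
```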
63. DAY 10
INTRODUCTION TO CLASSIFICATION AND DIFFERENT
TYPES OF CLASSIFICATION ALGORITHMS AND
MODEL SELECTION
64. INTRODUCTION TO CLASSIFICATION :
Classification may be defined as the process of predicting a
class or category from observed values or given data points. The
categorized output can take forms such as “Black” or “White”,
or “spam” or “no spam”.
Mathematically, classification is the task of
approximating a mapping function (f) from input variables (X) to
output variables (Y). It belongs to supervised
machine learning, in which targets are provided along with the
input data set.
65. TYPES OF LEARNERS IN CLASSIFICATION :
We have two types of learners with respect to classification
problems:
1. Lazy Learners
As the name suggests, these learners store the training data and wait for the
testing data to appear. Classification is done only after
the testing data arrives. They spend less time on training but more time on
predicting. Examples of lazy learners are K-nearest neighbor and case-based
reasoning.
2. Eager Learners
In contrast to lazy learners, eager learners construct a classification model
from the training data without waiting for the testing data to appear.
They spend more time on training but less time on predicting. Examples
of eager learners are Decision Trees, Naïve Bayes and Artificial Neural
Networks (ANN).
69. •The Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes' theorem and used for solving classification problems.
•It is mainly used in text classification, which involves high-dimensional
training datasets.
•It is a probabilistic classifier, which means it predicts on the basis of
the probability of an object belonging to a class.
•Naïve: It is called Naïve because it assumes that the occurrence of a
certain feature is independent of the occurrence of other features. For
example, if a fruit is identified on the basis of color, shape, and taste, then
a red, spherical, and sweet fruit is recognized as an apple. Each
feature individually contributes to identifying it as an apple, without
depending on the others.
•Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.
70. Bayes’ Theorem
P(A|B) = probability of A given that B is true.
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In practice we are most interested in dealing with events e
and data D.
e = “I have a cold”
D = “runny nose,” “watery eyes,” “coughing”
$$P(e \mid D) = \frac{P(D \mid e)\,P(e)}{P(D)}$$
So Bayes’ theorem is “diagnostic”.
72. Bayes’ Theorem
D = Data, e = some event
$$P(e \mid D) = \frac{P(D \mid e)\,P(e)}{P(D)}$$
P(e) is called the prior probability of e. It's what we know (or
think we know) about e with no other evidence.
P(D|e) is the conditional probability of D given that e
happened, or just the likelihood of D. This can often be
measured or computed precisely, as it follows from your
model assumptions.
P(e|D) is the posterior probability of e given D. It’s the
answer we want, or the way we choose a best answer.
You can see that the posterior is heavily colored by the prior,
so Bayes’ has a GIGO liability, e.g. it's not used to test
hypotheses.
73. Naïve Bayes Classifier
Let’s assume we have an instance (e.g. a document d) with
a set of features $X_1, \ldots, X_k$ and a set of classes $\{c_j\}$ to
which the document might belong.
We want to find the most likely class that the document
belongs to, given its features.
The joint probability of the class and features is:
$$\Pr(X_1, \ldots, X_k, c_j)$$
74. Naïve Bayes Classifier
Key Assumption: (Naïve) the features are generated
independently given $c_j$. Then the joint probability factors:
$$\Pr(X, c_j) = \Pr(X_1, \ldots, X_k \mid c_j)\,\Pr(c_j) = \Pr(c_j) \prod_{i=1}^{k} \Pr(X_i \mid c_j)$$
We would like to figure out the most likely class for (i.e. to
classify) the document, which is the $c_j$ which maximizes:
$$\Pr(c_j \mid X_1, \ldots, X_k)$$
75. Naïve Bayes Classifier
Now from Bayes we know that:
$$\Pr(c_j \mid X_1, \ldots, X_k) = \frac{\Pr(X_1, \ldots, X_k \mid c_j)\,\Pr(c_j)}{\Pr(X_1, \ldots, X_k)}$$
But to choose the best $c_j$, we can ignore $\Pr(X_1, \ldots, X_k)$ since
it’s the same for every class. So we just have to maximize:
$$\Pr(X_1, \ldots, X_k \mid c_j)\,\Pr(c_j)$$
So finally we pick the category $c_j$ that maximizes:
$$\Pr(X_1, \ldots, X_k \mid c_j)\,\Pr(c_j) = \Pr(c_j) \prod_{i=1}^{k} \Pr(X_i \mid c_j)$$
77. Data for Naïve Bayes
In order to find the best class, we need two pieces of data:
• $\Pr(c_j)$, the prior probability for the class $c_j$.
• $\Pr(X_i \mid c_j)$, the conditional probability of the feature $X_i$
given the class $c_j$.
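Both quantities can be estimated by counting. The sketch below does this for a tiny, hypothetical set of labelled documents and then picks the class that maximizes the prior times the product of conditional probabilities, working in log space to avoid numerical underflow. Laplace (add-one) smoothing is an addition of this sketch, not something covered in the slides. It keeps unseen words from giving a probability of zero.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: (words in the document, class label).
train = [
    (["win", "money", "now"], "spam"),
    (["cheap", "money", "offer"], "spam"),
    (["meeting", "schedule", "now"], "ham"),
    (["project", "meeting", "notes"], "ham"),
    (["lunch", "schedule"], "ham"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
for words, label in train:
    word_counts[label].update(words)
vocab = {w for words, _ in train for w in words}

def log_prior(c):
    """log Pr(c): fraction of training documents in class c."""
    return math.log(class_counts[c] / len(train))

def log_likelihood(word, c, alpha=1.0):
    """log Pr(word | c) with add-alpha smoothing (an assumption of this sketch)."""
    total = sum(word_counts[c].values())
    return math.log((word_counts[c][word] + alpha) / (total + alpha * len(vocab)))

def classify(words):
    """Return the class maximizing log Pr(c) + sum of log Pr(word | c)."""
    scores = {
        c: log_prior(c) + sum(log_likelihood(w, c) for w in words if w in vocab)
        for c in class_counts
    }
    return max(scores, key=scores.get)

print(classify(["cheap", "money"]))
print(classify(["meeting", "notes"]))
```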
78. Advantage and Disadvantage of NB Classifiers
● Simple and fast. Depend only on term frequency data
for the classes. One shot, no iteration.
● Very well-behaved numerically. Term weight
depends only on frequency of that term. Decoupled
from other terms.
● Can work very well with sparse data, where
combinations of dependent terms are rare.
● Subject to error and bias when term probabilities are
not independent (e.g. URL prefixes).
● Can’t model patterns in the data.
● Typically not as accurate as other methods.
81. •K-Nearest Neighbour is one of the simplest Machine Learning
algorithms based on the Supervised Learning technique.
•The K-NN algorithm assumes similarity between the new
case/data and the available cases, and puts the new case into the
category that is most similar to the available categories.
•The K-NN algorithm stores all the available data and classifies a new
data point based on similarity. This means that when new data
appears, it can easily be classified into a well-suited category
using the K-NN algorithm.
•K-NN is a non-parametric algorithm, which means it does not
make any assumption about the underlying data.
•It is also called a lazy learner algorithm because it does not
learn from the training set immediately; instead it stores the
dataset and, at the time of classification, performs an action on
the dataset.
82. •Example: Suppose we have an image of a creature
that looks similar to both a cat and a dog, and we want to know
whether it is a cat or a dog. For this identification, we can
use the KNN algorithm, as it works on a similarity
measure. Our KNN model will find the features of
the new image that are similar to the cat and dog images and,
based on the most similar features, will put it in either
the cat or the dog category.
84. The working of K-NN can be explained on the basis of the
below algorithm:
•Step-1: Select the number K of neighbors.
•Step-2: Calculate the Euclidean distance from the new data point
to each training data point.
•Step-3: Take the K nearest neighbors as per the
calculated Euclidean distance.
•Step-4: Among these K neighbors, count the number of
data points in each category.
•Step-5: Assign the new data point to the category for
which the number of neighbors is maximum.
•Step-6: Our model is ready.
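The steps above can be sketched in a few lines of NumPy. The training points, the category names "A" and "B" and the helper name `knn_predict` are hypothetical and serve only to illustrate the procedure.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 3: indices of the k nearest neighbors.
    nearest = np.argsort(distances)[:k]
    # Step 4: count how many neighbors fall in each category.
    votes = Counter(y_train[i] for i in nearest)
    # Step 5: return the category with the most neighbors.
    return votes.most_common(1)[0][0]

# Hypothetical training data with two categories.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [7.0, 7.5], [6.5, 8.0]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([2.0, 2.0]), k=3))
print(knn_predict(X_train, y_train, np.array([6.0, 7.0]), k=3))
```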
86. How to select the value of K in the K-NN Algorithm?
Below are some points to remember while selecting the value of K in the
K-NN algorithm:
•There is no particular way to determine the best value for "K", so we
need to try some values to find the best among them. A commonly used
starting value for K is 5.
•A very low value for K, such as K=1 or K=2, can be noisy and make
the model sensitive to outliers.
•Larger values for K smooth out noise, but a value that is too large can
blur the boundaries between categories.
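One common way to "try some values" is cross-validation. The sketch below assumes scikit-learn and uses its built-in Iris dataset as a stand-in for your own data. The list of candidate K values is arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9, 11]   # arbitrary values to try
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds.
    mean_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best K among the candidates:", best_k)
```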
88. •Decision Tree is a Supervised learning technique that can be
used for both classification and Regression problems, but mostly
it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the
features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
•It is a graphical representation for getting all the possible
solutions to a problem/decision based on given conditions.
89. Example: Suppose there is a candidate who has a job offer and
wants to decide whether to accept the offer or not. To
solve this problem, the decision tree starts with the root node
(the Salary attribute, chosen by the attribute selection measure, ASM).
The root node splits further into the next
decision node (distance from the office) and one leaf node based
on the corresponding labels. The next decision node further
splits into one decision node (Cab facility) and one leaf node. Finally,
that decision node splits into two leaf nodes (Accepted offer and
Declined offer). Consider the below diagram:
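The tree described in this example can also be read as a chain of nested conditions. The sketch below is an interpretation rather than the slide's exact tree: the text does not say which branch leads to which leaf, so it is assumed here that a "no" at any decision node leads to a declined offer. The function and parameter names are hypothetical.

```python
def decide_offer(salary_ok: bool, office_near: bool, cab_facility: bool) -> str:
    """Walk the job-offer decision tree from the root node to a leaf node."""
    # Root node: Salary attribute.
    if not salary_ok:
        return "Declined offer"   # leaf node (assumed)
    # Decision node: distance from the office.
    if not office_near:
        return "Declined offer"   # leaf node (assumed)
    # Decision node: cab facility, which splits into the two final leaf nodes.
    if cab_facility:
        return "Accepted offer"
    return "Declined offer"

print(decide_offer(salary_ok=True, office_near=True, cab_facility=True))
print(decide_offer(salary_ok=True, office_near=False, cab_facility=True))
```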
93. Support Vector Machine or SVM is one of the most
popular Supervised Learning algorithms, which is
used for Classification as well as Regression
problems. However, primarily, it is used for
Classification problems in Machine Learning.
“The goal of the SVM algorithm is to create the best
line or decision boundary that can segregate n-
dimensional space into classes so that we can
easily put the new data point in the correct category
in the future. This best decision boundary is called a
hyperplane.”
95. Example: SVM can be understood with the example that
we used for the KNN classifier. Suppose we see a
strange cat that also has some features of dogs, and we
want a model that can accurately identify whether it is a cat
or a dog. Such a model can be created using the SVM
algorithm. We first train our model with lots of images of
cats and dogs so that it can learn the different features of
cats and dogs, and then we test it with this strange
creature. The SVM creates a decision boundary
between these two classes (cat and dog) and chooses the extreme
cases (support vectors), so it will look at the extreme cases of cat
and dog. On the basis of the support vectors, it will classify
the creature as a cat. Consider the below diagram:
97. Types of SVM
SVM can be of two types:
•Linear SVM: Linear SVM is used for linearly separable data. If
a dataset can be classified into two classes by using a single
straight line, then such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
•Non-linear SVM: Non-linear SVM is used for non-linearly separable
data. If a dataset cannot be classified by using a straight
line, then such data is termed non-linear data, and the classifier used is
called a Non-linear SVM classifier.
“The SVM algorithm can be used for face detection, image classification,
text categorization, etc.”
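The difference between the two types shows up clearly on data that no straight line can separate. This sketch uses scikit-learn's `SVC` on two concentric rings of synthetic points from `make_circles`, with a linear kernel and an RBF kernel standing in for the linear and non-linear cases. The dataset parameters are arbitrary choices.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # linear SVM
rbf_svm = SVC(kernel="rbf").fit(X, y)         # non-linear SVM (RBF kernel)

linear_acc = linear_svm.score(X, y)
rbf_acc = rbf_svm.score(X, y)
print("linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:   ", rbf_acc)
```

The scores are training accuracy, used here only to show whether each decision boundary can fit the shape of the data.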
100. Clustering in Machine Learning :
Clustering or cluster analysis is a machine learning technique, which groups the
unlabelled dataset. It can be defined as "A way of grouping the data points into
different clusters, consisting of similar data points. The objects with the
possible similarities remain in a group that has less or no similarities with
another group."
101. What is K-Means Algorithm?
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled
dataset into different clusters. Here K defines the number of pre-defined clusters
that need to be created in the process: if K=2, there will be two clusters,
for K=3, there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into
K different clusters in such a way that each data point belongs to only one
group of points with similar properties.
It is a centroid-based algorithm, where each cluster is associated
with a centroid. The main aim of this algorithm is to minimize the
sum of distances between the data points and their corresponding cluster centroids.
102. The k-means clustering algorithm mainly performs two tasks:
•Determines the best value for K center points or centroids by an iterative process.
•Assigns each data point to its closest k-center. Those data points which are near
to the particular k-center, create a cluster.
103. How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids. (They need not be points from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, which means reassign each data point to the new closest centroid.
Step-6: If any reassignment occurs, then go to Step-4, else go to FINISH.
Step-7: The model is ready.
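The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a production implementation: the function name `k_means`, the tiny dataset and the choice to draw the initial centroids from the data points are assumptions of this sketch, and each new centroid is placed at the mean of its cluster.

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k groups; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick K random data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign each data point to its closest centroid.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 6: stop when no data point changes cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Hypothetical data: two visibly separate groups of points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
labels, centroids = k_means(X, k=2)
print(labels)
print(centroids)
```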
106. Hierarchical clustering is another unsupervised machine learning algorithm.
It is used to group unlabeled data into clusters and is also known
as hierarchical cluster analysis or HCA.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and
this tree-shaped structure is known as the dendrogram.
The hierarchical clustering technique has two approaches:
1.Agglomerative: Agglomerative is a bottom-up approach, in which the
algorithm starts with taking all data points as single clusters and merging them
until one cluster is left.
2.Divisive: Divisive algorithm is the reverse of the agglomerative algorithm as it
is a top-down approach.
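The agglomerative (bottom-up) approach can be sketched with SciPy's `linkage` and `fcluster`. The six data points are hypothetical, and the "ward" linkage method and the cut into two clusters are choices made for this sketch only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: two visibly separate groups of points.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])

# Agglomerative clustering: every point starts as its own cluster and
# the two closest clusters are merged at each step until one is left.
# Each row of Z records one merge: the two clusters joined, their
# distance, and the size of the new cluster.
Z = linkage(X, method="ward")

# Cut the hierarchy so that at most two clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")

print(Z)
print(labels)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` (with matplotlib installed) draws the tree-shaped dendrogram mentioned above.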