www.SunilOS.com 1
Unsupervised Machine Learning
www.sunilos.com
www.raystec.com
Unsupervised Learning
❑ Unsupervised machine learning is a type of machine learning in which we have only data points and no labels.
❑ The algorithm groups data points based on how similar they are to one another.
o For example, when we arrange a bookshelf that holds different kinds of books, we group the books by subject. Similarly, in unsupervised learning we iterate through the data and group together points that share similar characteristics.
❑ Clustering is the most common form of unsupervised learning.
Types of Clustering:
❑ Partition Based
o Partition-based clustering methods include K-Means, K-Medoids, CLARANS, etc.
❑ Hierarchical Based
o Hierarchical clustering methods include BIRCH and Chameleon.
❑ Density Based
o DBSCAN and OPTICS are the most popular density-based clustering methods.
K-means Clustering Algorithm
❑ K-means is a partition-based clustering method.
❑ It is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem.
❑ It follows a simple procedure to partition a given data set into a fixed number of clusters (assume k clusters) chosen in advance.
How K-Means Clustering Works:
❑ K-means is iterative: it repeatedly recalculates the cluster centroids, refining them until they no longer change much.
❑ The algorithm takes as input a dataset of ‘n’ points together with an integer parameter ‘k’ (supplied by the programmer) that specifies how many clusters to create.
❑ The output is a set of ‘k’ cluster centroids and a labeling of the dataset that maps each data point to exactly one cluster.
Steps of K-means Clustering:
❑ 1. Choose the number of clusters k.
❑ 2. Select k random points from the data as the initial centroids.
❑ 3. Assign every point to its closest cluster centroid.
❑ 4. Recompute the centroid of each newly formed cluster as the mean of its points.
❑ 5. Repeat steps 3 and 4 until a stopping condition is met.
When to Stop Iterating?
❑ The centroids of the newly formed clusters do not change.
❑ Points remain in the same cluster.
❑ The maximum number of iterations is reached.
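The steps and stopping rules above can be sketched from scratch in NumPy. This is a minimal illustration under stated assumptions, not the scikit-learn implementation: the function name `kmeans` and its parameters `max_iters` and `seed` are hypothetical, it uses Euclidean distance (the usual choice for k-means), and it stops when no point changes cluster or the iteration limit is reached. The six sample points are the same ones used in the scikit-learn example later in this deck.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=0):
    """Minimal k-means sketch (hypothetical helper, Euclidean distance)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iters):  # stopping rule: maximum number of iterations
        # Step 3: assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stopping rule: points remain in the same cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centroid to the mean of its cluster's points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

For these six points every choice of initial centroids ends in the same two groups: the three points near (1, 1.5) and the three points near (7.3, 9).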
Working of K-means
❑Sample Dataset:
Objects X Y Z
OB-1 1 4 1
OB-2 1 2 2
OB-3 1 4 2
OB-4 2 1 2
OB-5 1 1 1
OB-6 2 4 2
OB-7 1 1 2
OB-8 2 1 1
❑ We have a total of 8 data points, which we will divide into 2 clusters, so K = 2.
❑ Since K = 2, the algorithm initially takes any two data points as centroids.
❑ After the centroids (say C1 and C2) are chosen, each data point (a set of coordinates here) is assigned to one of the clusters.
❑ Assume that the algorithm chose OB-2 (1,2,2) and OB-6 (2,4,2) as the initial centroids C1 and C2 of cluster 1 and cluster 2.
❑ To measure distances, we use the following distance function (also called a similarity measure), known as the Manhattan distance:
❑ $d = |x_2 - x_1| + |y_2 - y_1| + |z_2 - z_1|$
Calculation of Distances
Objects X Y Z Distance from C1(1,2,2) Distance from C2(2,4,2)
OB-1 1 4 1 3 2
OB-2 1 2 2 0 3
OB-3 1 4 2 2 1
OB-4 2 1 2 2 3
OB-5 1 1 1 2 5
OB-6 2 4 2 3 0
OB-7 1 1 2 1 4
OB-8 2 1 1 3 4
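The distance table above can be reproduced with a few lines of NumPy. This sketch assumes the Manhattan distance and the initial centroids OB-2 and OB-6 from the previous slide; the variable names and the `manhattan` helper are illustrative.

```python
import numpy as np

# Sample dataset from the slide (columns X, Y, Z)
points = {
    "OB-1": (1, 4, 1), "OB-2": (1, 2, 2), "OB-3": (1, 4, 2), "OB-4": (2, 1, 2),
    "OB-5": (1, 1, 1), "OB-6": (2, 4, 2), "OB-7": (1, 1, 2), "OB-8": (2, 1, 1),
}
names = list(points)
X = np.array([points[n] for n in names], dtype=float)

# Initial centroids chosen on the slide: C1 = OB-2, C2 = OB-6
centroids = np.array([points["OB-2"], points["OB-6"]], dtype=float)

def manhattan(a, b):
    """d = |x2 - x1| + |y2 - y1| + |z2 - z1| along the last axis."""
    return np.abs(a - b).sum(axis=-1)

# distances[i, j] = distance of point i from centroid j
distances = manhattan(X[:, None, :], centroids[None, :, :])
assignment = distances.argmin(axis=1)  # 0 -> cluster 1, 1 -> cluster 2

for name, d, c in zip(names, distances, assignment):
    print(f"{name}: C1={d[0]:g}  C2={d[1]:g}  -> cluster {c + 1}")
```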
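Cluster Formation
❑ After the first pass, each object joins the cluster whose centroid is nearer (the smaller distance in the table), giving:
Cluster 1: OB-2, OB-4, OB-5, OB-7, OB-8
Cluster 2: OB-1, OB-3, OB-6
❑ The centroids are then recomputed as the mean of each cluster's points:
C1 = ((1+2+1+1+2)/5, (2+1+1+1+1)/5, (2+2+1+2+1)/5) = (1.4, 1.2, 1.6)
C2 = ((1+1+2)/3, (4+4+4)/3, (1+2+2)/3) ≈ (1.33, 4, 1.67)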
Distance from New Centroids
Objects X Y Z Distance from C1(1.4,1.2,1.6) Distance from C2(1.33,4,1.67)
OB-1 1 4 1 3.8 1
OB-2 1 2 2 1.6 2.67
OB-3 1 4 2 3.6 0.67
OB-4 2 1 2 1.2 4
OB-5 1 1 1 1.2 4
OB-6 2 4 2 3.8 1
OB-7 1 1 2 1 3.67
OB-8 2 1 1 1.4 4.33
Updated Clusters
❑ With respect to the updated centroids, the objects are assigned as follows:
Cluster 1: OB-2, OB-4, OB-5, OB-7, OB-8
Cluster 2: OB-1, OB-3, OB-6
❑ The algorithm ends here because no object changed its cluster.
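A short NumPy check of the second pass, assuming the first-pass assignment from the Cluster Formation step: it recomputes the centroids as cluster means, reassigns each object with the same Manhattan distance, and confirms that no object moves. The variable names are illustrative.

```python
import numpy as np

points = np.array([
    [1, 4, 1], [1, 2, 2], [1, 4, 2], [2, 1, 2],   # OB-1 .. OB-4
    [1, 1, 1], [2, 4, 2], [1, 1, 2], [2, 1, 1],   # OB-5 .. OB-8
], dtype=float)
# Assignment after the first pass: 0 = cluster 1, 1 = cluster 2
first_pass = np.array([1, 0, 1, 0, 0, 1, 0, 0])

# Recompute each centroid as the mean of its member points
new_centroids = np.array([points[first_pass == j].mean(axis=0) for j in range(2)])

# Reassign every object with the Manhattan distance used on the slides
distances = np.abs(points[:, None, :] - new_centroids[None, :, :]).sum(axis=2)
second_pass = distances.argmin(axis=1)

print(np.round(new_centroids, 2))
print(np.round(distances, 2))
print("converged:", np.array_equal(first_pass, second_pass))
```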
Code Implementation of K-means
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

style.use('ggplot')

# Six 2-D sample points
X = np.array([[1, 2],
              [1.5, 1.8],
              [5, 8],
              [8, 8],
              [1, 0.6],
              [9, 11]])

# Plot the raw, unlabeled points
plt.scatter(X[:, 0], X[:, 1], s=150)
plt.show()
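Code Implementation of K-means (cont.)
from sklearn.cluster import KMeans

# Cluster the records into 2 groups
# (n_init and random_state are set explicitly for reproducible results)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# Train the model
kmeans.fit(X)

# Predict the cluster of a new point
labels = kmeans.predict([[20, 8]])
print(labels)

# Coordinates of the learned cluster centers
centroids = kmeans.cluster_centers_
print(centroids)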
Reinforcement Learning
Reinforcement Learning
❑ Much of what we learn, we learn by interacting with our environment.
❑ Whether we are learning to drive a car or learning to walk, the learning is based on interaction with the environment.
❑ Learning from interaction is a foundational idea underlying many theories of learning and intelligence.
❑ Reinforcement Learning is goal-oriented learning based on interaction with the environment. Many researchers regard it as a promising path toward more general artificial intelligence.
Problem Statement
❑ How does a child learn to walk?
Formalizing the Problem
❑ The child is an agent trying to manipulate the environment (the surface on which it walks) by taking actions (walking), moving from one state (each step he/she takes) to another.
❑ The child gets a reward (let's say chocolate) when he/she accomplishes part of the task (taking a couple of steps) and receives no chocolate (a negative reward) when he/she is not able to walk.
❑ This is a simplified description of a reinforcement learning problem.
Basis of Reinforcement Learning
Difference Between Different Kinds of Machine Learning:
Aspect | Supervised | Unsupervised | Reinforcement
Definition | Learns from labeled data | Learns from unlabeled data | Learns by interacting with the environment through actions, discovering errors and rewards
Types of Problems | Regression and classification | Association and clustering | Reward based
Data | Labeled | Unlabeled | No predefined data
Training | External supervision | No supervision | No supervision
Approach | Maps labeled input to known output | Searches for patterns and discovers the output | Follows a trial-and-error method
Algorithms | SVM, KNN, Linear Regression, etc. | K-means, C-means | Q-Learning, SARSA
Terminology of Reinforcement Learning:
❑ Agent: An entity (a computer program) that learns from the environment based on feedback.
❑ Action: A step taken by the agent according to the situation (environment).
❑ Environment: The surroundings in which the agent is present and acts. The environment may be deterministic or random (stochastic).
❑ State: The current situation of the agent; the environment returns a new state after each action of the agent.
❑ Reward: Feedback, positive or negative, that the environment gives based on the agent's action.
❑ Policy: The strategy the agent uses to choose its next action based on the current state.
❑ Value: The expected long-term return, as opposed to the short-term (immediate) reward.
❑ Q-value: Like value, but it takes the current action as an additional parameter, written Q(s, a).
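To make these terms concrete, here is a minimal agent-environment loop in plain Python. `CorridorEnv` is a hypothetical toy environment (a corridor of five cells with the goal at the right end), not part of any library; the reward values are arbitrary assumptions, and the policy simply picks a random action.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment: cells 0..4, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state                      # initial state

    def step(self, action):
        # action: -1 = move left, +1 = move right (walls clamp the position)
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1         # assumed: small penalty per step
        return self.state, reward, done

def random_policy(state):
    """Policy: maps the current state to an action (this one ignores the state)."""
    return random.choice([-1, 1])

random.seed(0)
env = CorridorEnv()
state = env.reset()
total_reward, steps, done = 0.0, 0, False
while not done and steps < 100:
    action = random_policy(state)              # the agent chooses an action
    state, reward, done = env.step(action)     # the environment returns state and reward
    total_reward += reward
    steps += 1
print(f"steps={steps}, return={total_reward:.1f}")
```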
Key Points of Reinforcement Learning:
❑ It is based on a trial-and-error method.
❑ The agent is not told about the environment or which step to take next.
❑ The agent takes its next action based on previous feedback.
❑ Rewards and penalties may be delayed.
❑ The environment is often random (stochastic), and the agent has to reach the goal while collecting the maximum reward.
How to Implement RL in Machine Learning:
❑ There are three approaches to implementing RL:
❑ Model-based learning
o In this approach a model (prototype) of the environment is created, and the agent explores this model. A different model is needed for each environment.
❑ Policy-based learning
o This approach tries to find the optimal policy (strategy) that obtains the maximum future reward, without relying on a value function. There can be two types of policy:
o Deterministic: the policy always produces the same action in a given state.
o Stochastic: the policy chooses the action according to a probability distribution.
❑ Value-based learning
o In this approach the agent tries to find the optimal value function, i.e., the maximum value achievable at each state under any policy.
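When Not to Use RL?
❑ When you already have enough data to solve the problem with supervised learning.
❑ When time or computing resources are limited, because training an RL agent is time-consuming.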
Why Use Reinforcement Learning?
❑ To let a reward-based system learn.
❑ When agents need to learn from their own actions.
❑ It helps you discover which actions yield the highest reward over a longer period.
❑ When we want to find the best method for obtaining large rewards.
Learning Models of Reinforcement Learning
❑ Markov Decision Process
❑ Q-learning
❑ SARSA (State Action Reward State Action)
❑ Deep Q-Network (DQN)
Q-Learning
❑ In Q-learning, Q stands for quality.
❑ It is a value-based learning method.
❑ The agent learns a Q-value for each state-action pair, which tells it how good each action is in each state, and therefore which action is best to take.
❑ When the agent performs action a in state s, it receives a reward R(s, a) and ends up in a new state s′, so the Q-value equation is:
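In its standard form, assuming deterministic transitions and a discount factor γ between 0 and 1 (γ is not defined on the slide), this is the Bellman equation for Q-values. Q-learning turns it into an update rule by moving the current estimate toward the right-hand side with an assumed learning rate α:

```latex
\begin{aligned}
Q(s, a) &= R(s, a) + \gamma \max_{a'} Q(s', a') \\
Q(s, a) &\leftarrow Q(s, a) + \alpha \left[ R(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
\end{aligned}
```

In a stochastic environment the first equation becomes an expectation over the next state s′.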
Q-Learning process
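A minimal tabular Q-learning sketch, assuming a hypothetical five-cell corridor (start at cell 0, goal at cell 4, reward 1 for reaching the goal and 0 otherwise) and assumed hyperparameters α = 0.5, γ = 0.9 and ε = 0.1 for ε-greedy exploration. After every step it applies the Q-learning update rule to a Q-table.

```python
import random

N_STATES, GOAL = 5, 4                   # hypothetical corridor: cells 0..4, goal at 4
ACTIONS = [-1, +1]                      # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed learning rate, discount, exploration

def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: Q[state][action_index]

for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection (ties broken randomly)
        if random.random() < EPSILON or Q[state][0] == Q[state][1]:
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda i: Q[state][i])
        nxt, reward, done = step(state, ACTIONS[a])
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = reward if done else reward + GAMMA * max(Q[nxt])
        Q[state][a] += ALPHA * (target - Q[state][a])
        state = nxt

greedy = ["left" if q[0] > q[1] else "right" for q in Q[:GOAL]]
print(greedy)
```

After training, the greedy action in every non-goal cell is "right", the shortest path to the reward, and the learned values shrink by roughly a factor of γ for each extra step away from the goal.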
Application of RL
Gym Environment for Reinforcement Learning:
❑ Gym is a Python library that provides standard environments for developing and comparing reinforcement learning algorithms.
❑ We can install gym using the following command:
❑ pip install gym
The env object provides the following main functions:
❑ The step() function takes an action as an argument and returns four objects:
o observation: An environment-specific object representing the observation of the environment.
o reward: A float indicating the gain (or loss) from the previous action.
o done: A Boolean value indicating whether the episode is finished.
o info: A dictionary with diagnostic information.
❑ The render() function creates a visual representation of the environment.
❑ The reset() function resets the environment to its initial state and returns the first observation.
❑ In gym 0.26 and later, and in its successor Gymnasium, step() returns five values (done is split into terminated and truncated) and reset() returns an (observation, info) pair.
Implementation
❑ A popular starter environment is CartPole.
❑ In this game a pole is attached to a cart, and we have to keep the pole balanced.
❑ The episode ends (the pole is considered fallen) if the pole tilts more than 12 degrees from vertical or the cart moves more than 2.4 units from the center.
❑ It is one of the simplest environments for learning the basics.
❑ The game has only four observations and two actions.
o The actions push the cart to the left or to the right (encoded as 0 and 1).
o The observations are the position of the cart, the velocity of the cart, the angle of the pole, and the rotation rate of the pole.
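Getting the CartPole Environment
# Uses the classic gym API (gym < 0.26): reset() returns the observation
# and step() returns (observation, reward, done, info)
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(1000):
        env.render()                          # draw the current frame
        print(observation)
        action = env.action_space.sample()    # take a random action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()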
Data Preprocessing
Why Data Preprocessing?
❑ Real-world data is rarely ready for learning.
❑ It is often noisy, dirty, and incomplete.
❑ Without quality data, there are no quality results.
Types of Data:
Types of Data (cont.)
❑ Nominal Data: Categorical values without any order, e.g., car colors: black, white, red, blue.
❑ Ordinal Data: Categorical data with a natural order, e.g., clothing sizes: small, medium, large, extra large. However, differences between categories are not meaningful; for example, "large - medium = small" makes no sense.
❑ Interval Data: Numeric values with a defined unit of measurement and meaningful differences but no true zero, so ratios are not meaningful, e.g., temperature in degrees Celsius, dates.
❑ Ratio Data: Numeric values with a defined unit of measurement and a true zero, so both differences and ratios are meaningful, e.g., count, age, mass, length.
❑ Time Series Data: A series of data points indexed in time order, most commonly taken at successive, equally spaced points in time, e.g., daily temperature readings used in weather forecasting.
❑ Text Data: Unstructured data. Text data usually consists of documents, which can represent words, sentences, or even paragraphs of free-flowing text.
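The difference between nominal and ordinal data can be expressed in pandas, which supports both unordered and ordered categorical types. A small sketch with made-up values:

```python
import pandas as pd

# Nominal: categories with no order
colors = pd.Series(["black", "white", "red", "blue"], dtype="category")

# Ordinal: categories with a natural order
sizes = pd.Series(
    pd.Categorical(["medium", "small", "extra large", "large"],
                   categories=["small", "medium", "large", "extra large"],
                   ordered=True)
)

print(sizes.sort_values().tolist())  # sorted by size order, not alphabetically
print((sizes > "medium").tolist())   # comparisons respect the order
print(sizes.cat.codes.tolist())      # integer codes preserve the order
```

The integer codes keep the order, but as the slide notes, arithmetic on them (such as subtracting two sizes) is still not meaningful.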
Data Processing Steps:
❑ A dataset is viewed as a collection of data objects.
❑ Data objects have many features.
❑ Features are the characteristics of a data object, for example the color, speed, and mileage of a car.
❑ The basic steps in data processing are:
o Data Quality Assessment
o Feature Aggregation
o Feature Sampling
o Dimensionality Reduction
o Feature Encoding
Data Quality Assessment:
❑ Collected data may be incomplete and noisy.
❑ We cannot completely rely on data acquisition tools.
❑ There may be flaws in the data collection process.
❑ Raw data often contains missing, duplicate, and inconsistent values.
❑ We have to address all these problems before applying machine learning.
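A quick way to assess data quality with pandas is to count missing values, duplicate rows, and inconsistent spellings. The records below are hypothetical, invented to show each problem.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records with typical quality problems
raw = pd.DataFrame({
    "car":     ["A", "B", "B", "C", "D"],
    "color":   ["red", "Black", "Black", None, "black"],
    "mileage": [18.5, 21.0, 21.0, np.nan, 19.2],
})

print(raw.isna().sum())                          # missing values per column
print(raw.duplicated().sum())                    # number of fully duplicated rows
print(raw["color"].str.lower().value_counts())   # reveals inconsistent capitalization

clean = raw.drop_duplicates().copy()
clean["color"] = clean["color"].str.lower()      # make the categories consistent
print(clean)
```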
Feature Aggregation
❑ Data is often collected from different sources.
❑ Aggregation combines this data into a single, coarser unit.
❑ This reduces memory consumption.
❑ For example, if we collect daily sales records of a store from multiple places, we can aggregate them into monthly or yearly sales.
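Aggregating daily records into monthly totals is a short `groupby` in pandas. The dates, store names, and sales figures below are hypothetical.

```python
import pandas as pd

# Hypothetical daily sales from two stores
daily = pd.DataFrame({
    "date":  pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03",
                             "2024-01-05", "2024-02-10", "2024-02-28"]),
    "store": ["North", "North", "North", "South", "South", "South"],
    "sales": [100, 150, 120, 80, 90, 110],
})

# Aggregate to one row per month (all stores combined)
month = daily["date"].dt.to_period("M")
monthly = daily.groupby(month)["sales"].sum()
print(monthly)

# Or keep one row per store and month
per_store = daily.groupby(["store", month])["sales"].sum()
print(per_store)
```

Six daily rows become two monthly rows, which is the memory saving the slide describes.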
Feature Sampling:
❑ Datasets collected from different sources can be very large.
❑ We take a subset of the data for the machine learning model.
❑ Choose a sampling algorithm that divides the dataset into a representative working subset.
❑ Take care of imbalanced classes in the dataset.
❑ Some sampling algorithms:
o Simple random sampling
o Systematic sampling
o Stratified sampling
o Cluster sampling
o Convenience sampling
o Quota sampling
o Judgement (or purposive) sampling
o Snowball sampling
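Simple random sampling and stratified sampling can be compared on an imbalanced dataset. This sketch assumes a hypothetical 90/10 class split and uses scikit-learn's `train_test_split` with its `stratify` parameter, which keeps the class ratio in the sample.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 90 "no" rows and 10 "yes" rows
df = pd.DataFrame({"feature": range(100), "label": ["no"] * 90 + ["yes"] * 10})

# Simple random sampling: 20% of the rows, chosen uniformly at random
random_sample = df.sample(frac=0.2, random_state=42)

# Stratified sampling: 20% of the rows, keeping the 90/10 class ratio
_, stratified_sample = train_test_split(
    df, test_size=0.2, stratify=df["label"], random_state=42
)

print(random_sample["label"].value_counts())
print(stratified_sample["label"].value_counts())
```

The random sample's class counts depend on chance, while the stratified sample contains exactly 18 "no" and 2 "yes" rows.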
Dimensionality Reduction:
❑ Datasets often have many features, i.e., they live in a high-dimensional space.
❑ We cannot easily visualize data in more than three dimensions.
❑ Dimensionality reduction reduces the number of dimensions of a dataset.
❑ It maps the higher-dimensional space (n dimensions) to a lower-dimensional space (e.g., 2D plots).
❑ A lower-dimensional space is easier to process and visualize.
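Principal Component Analysis (PCA) is one common dimensionality reduction technique; the slide does not name a specific one. This sketch uses scikit-learn's `PCA` on synthetic 5-dimensional data, generated on purpose so that most of its variance lies in two directions, and maps it to 2D.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 samples with 5 features, driven mostly by 2 hidden factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(100, 5))

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                  # map 5-D points to 2-D

print(X_2d.shape)
print(pca.explained_variance_ratio_.sum())   # share of the variance kept in 2-D
```

Because the synthetic data has only two real factors plus small noise, the two components keep nearly all of the variance; on real data the retained share is usually lower and guides how many components to keep.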
Feature Encoding:
❑ Machines cannot understand raw data the way humans do; most models require numeric input.
❑ We have to convert the dataset into a machine-readable form.
❑ Feature encoding techniques differ for different kinds of data.
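One-hot encoding is a common encoding for nominal data: each category becomes its own 0/1 column, so no artificial order is introduced. A sketch with pandas `get_dummies` on hypothetical car data (scikit-learn's `OneHotEncoder` does the same job inside a model pipeline):

```python
import pandas as pd

# Hypothetical car data with one nominal feature
cars = pd.DataFrame({"color": ["black", "white", "red", "black"],
                     "mileage": [18.5, 21.0, 15.2, 19.8]})

# One 0/1 column per color; numeric columns are left unchanged
encoded = pd.get_dummies(cars, columns=["color"], dtype=int)
print(encoded)
```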
Data Preprocessing Libraries
# used for handling numbers
import numpy as np
# used for handling the dataset
import pandas as pd
# used for handling missing data
from sklearn.impute import SimpleImputer
# used for encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# used for splitting training and testing data
from sklearn.model_selection import train_test_split
# used for feature scaling
from sklearn.preprocessing import StandardScaler
Label Encoder for Categorical Data:
# Categorical feature
weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
           'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']

# Import LabelEncoder
from sklearn import preprocessing

# Create the label encoder
le = preprocessing.LabelEncoder()

# Convert string labels into numbers
# (classes are sorted alphabetically: Overcast=0, Rainy=1, Sunny=2)
weather_encoded = le.fit_transform(weather)
print(weather_encoded)
Dealing with Missing Values
import pandas as pd
import numpy as np

df = pd.DataFrame({"Age": [23, 70, 56, 24, np.nan],
                   "Salary": [30000, 30000, 50000, np.nan, 40000]})
print(df)

from sklearn.impute import SimpleImputer

# Replace each missing value with the most frequent value of its column
imp = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
X = imp.fit_transform(df)
df1 = pd.DataFrame(X, columns=["Age", "Salary"])
print(df1)
Scaling Data
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Standardize each column of df1 (from the previous example) to mean 0 and unit variance
sc = StandardScaler(with_mean=True)
X = sc.fit_transform(df1)
X_scaled = pd.DataFrame(X, columns=["Age", "Salary"])
print(X_scaled)
Disclaimer
❑This is an educational presentation to enhance the
skill of computer science students.
❑This presentation is available for free to computer
science students.
❑Some internet images from different URLs are used
in this presentation to simplify technical examples
and correlate examples with the real world.
❑We are grateful to owners of these URLs and
pictures.
Thank You!
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 

Machine learning (Part 3)

  • 1. www.SunilOS.com 1 Unsupervised Machine Learning www.sunilos.com www.raystec.com
  • 2. Unsupervised Learning ❑ Unsupervised machine learning is a type of machine learning in which we have only data points and no labels. ❑ We group the data points based on how similar they are to each other. o For example, think of arranging a bookshelf that holds different kinds of books: we group the books by subject. In the same way, unsupervised learning iterates through the data and groups together the points that share similar characteristics. ❑ Clustering is the most common type of unsupervised learning.
  • 3. Types of Clustering ❑ Partition-based o Partition-based clustering methods include K-Means, K-Medoids, CLARANS, etc. ❑ Hierarchical o Hierarchical clustering methods include BIRCH and Chameleon. ❑ Density-based o DBSCAN and OPTICS are the most popular density-based clustering methods.
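To make the three families concrete, here is a small sketch that runs one method from each (scikit-learn's `KMeans`, `Birch` and `DBSCAN`) on the same made-up 2-D points. The data and the parameter values (`n_clusters`, `eps`, `min_samples`) are illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN, Birch, KMeans

# Made-up toy data: two well-separated groups of 2-D points.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.1, 1.2],
    [8.0, 8.0], [8.2, 7.9], [7.8, 8.1], [8.1, 8.3],
])

# Partition-based: K-Means needs the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Hierarchical: BIRCH builds a tree of subclusters, then merges them into n_clusters.
birch_labels = Birch(n_clusters=2).fit_predict(X)

# Density-based: DBSCAN grows clusters from dense regions; eps is the neighbourhood
# radius and min_samples the number of points needed to form a dense region.
# Points that belong to no dense region would get the label -1 (noise).
dbscan_labels = DBSCAN(eps=1.0, min_samples=2).fit_predict(X)

print("K-Means:", kmeans_labels)
print("BIRCH:  ", birch_labels)
print("DBSCAN: ", dbscan_labels)
```

Note that DBSCAN never asks for the number of clusters; it discovers it from the density of the points, which is why it can also flag outliers.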
  • 4. K-means Clustering Algorithm ❑ K-means is a partition-based clustering method. ❑ It is one of the simplest unsupervised learning algorithms for solving the well-known clustering problem. ❑ It follows a simple procedure to divide a given dataset into a fixed number of clusters, k, chosen in advance.
  • 5. How K-Means Clustering Works ❑ K-means is iterative: it repeatedly recalculates the cluster centroids, refining them until they no longer change much. ❑ The algorithm takes as input a dataset of n points and an integer parameter k, supplied by the programmer, that specifies how many clusters to create. ❑ The output is a set of k cluster centroids and a labeling that maps each data point to exactly one cluster.
  • 6. Steps of K-means Clustering ❑ 1. Choose the number of clusters k. ❑ 2. Select k random points from the data as the initial centroids. ❑ 3. Assign every point to its closest cluster centroid. ❑ 4. Recompute the centroids of the newly formed clusters. ❑ 5. Repeat steps 3 and 4.
  • 7. When to Stop Iterating? ❑ The centroids of the newly formed clusters do not change. ❑ Points remain in the same cluster. ❑ The maximum number of iterations is reached.
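The steps and stopping rules above can be sketched in plain NumPy. This is a minimal, non-optimized version of the standard algorithm (Lloyd's algorithm) assuming squared Euclidean distance; the function name `kmeans` and its `max_iter`/`seed` parameters are hypothetical choices for illustration. The sample points are the ones used in the scikit-learn code later in this deck.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means with squared Euclidean distance; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):  # stop rule: maximum number of iterations
        # Step 3: assign every point to its closest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop rule: no point changed cluster.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its points
        # (keep the old centroid if a cluster happens to be empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop rule: centroids no longer move.
        if np.allclose(new_centroids, centroids):
            centroids = new_centroids
            break
        centroids = new_centroids
    return centroids, labels


X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

Because the initial centroids are random, k-means can end in a different (sometimes worse) clustering on other data; libraries usually rerun it from several starting points and keep the best result.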
  • 8. Working of K-means ❑ Sample dataset:
    Object | X | Y | Z
    OB-1   | 1 | 4 | 1
    OB-2   | 1 | 2 | 2
    OB-3   | 1 | 4 | 2
    OB-4   | 2 | 1 | 2
    OB-5   | 1 | 1 | 1
    OB-6   | 2 | 4 | 2
    OB-7   | 1 | 1 | 2
    OB-8   | 2 | 1 | 1
  • 9. ❑ We have 8 data points in total and will divide them into 2 clusters, so K = 2 in k-means. ❑ Initially, the algorithm takes any two data points as centroids (because K = 2, there are two centroids). ❑ After the centroids are chosen (say C1 and C2), each data point is assigned to one of the clusters. ❑ Assume that the algorithm chose OB-2 (1,2,2) and OB-6 (2,4,2) as the centroids C1 and C2 of cluster 1 and cluster 2. ❑ To measure distances, we use the following distance function (also called a similarity measurement function), the Manhattan distance: ❑ d = |x2 - x1| + |y2 - y1| + |z2 - z1| ❑ Note: standard k-means uses (squared) Euclidean distance; the Manhattan distance is used here to keep the hand calculation simple.
  • 10. Calculation of Distances
    Object | X | Y | Z | Distance from C1 (1,2,2) | Distance from C2 (2,4,2)
    OB-1   | 1 | 4 | 1 | 3 | 2
    OB-2   | 1 | 2 | 2 | 0 | 3
    OB-3   | 1 | 4 | 2 | 2 | 1
    OB-4   | 2 | 1 | 2 | 2 | 3
    OB-5   | 1 | 1 | 1 | 2 | 5
    OB-6   | 2 | 4 | 2 | 3 | 0
    OB-7   | 1 | 1 | 2 | 1 | 4
    OB-8   | 2 | 1 | 1 | 3 | 4
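The distance table can be reproduced with a few lines of NumPy. This sketch hard-codes the sample dataset and the initial centroids OB-2 and OB-6 from the slides; the variable names are illustrative.

```python
import numpy as np

# The eight objects from the sample dataset (columns X, Y, Z).
objects = np.array([
    [1, 4, 1],  # OB-1
    [1, 2, 2],  # OB-2
    [1, 4, 2],  # OB-3
    [2, 1, 2],  # OB-4
    [1, 1, 1],  # OB-5
    [2, 4, 2],  # OB-6
    [1, 1, 2],  # OB-7
    [2, 1, 1],  # OB-8
])

# Initial centroids: C1 = OB-2, C2 = OB-6.
centroids = np.array([[1, 2, 2], [2, 4, 2]])

# Manhattan distance d = |x2-x1| + |y2-y1| + |z2-z1| from every object to every centroid.
distances = np.abs(objects[:, None, :] - centroids[None, :, :]).sum(axis=2)

# Each object joins the cluster with the closer centroid (0 -> C1, 1 -> C2).
assignment = distances.argmin(axis=1)

for i, (d, c) in enumerate(zip(distances, assignment), start=1):
    print(f"OB-{i}: d(C1)={d[0]}, d(C2)={d[1]} -> cluster {c + 1}")
```

Broadcasting `objects[:, None, :]` against `centroids[None, :, :]` yields an 8 x 2 x 3 array of coordinate differences, so one `sum(axis=2)` produces the whole table at once.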
  • 11. Cluster Formation ❑ After the initial pass of clustering, each object joins the cluster with the closer centroid: ❑ Cluster 1: OB-2, OB-4, OB-5, OB-7, OB-8 ❑ Cluster 2: OB-1, OB-3, OB-6
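The slide that computed the updated centroids is not part of this transcript. Assuming the standard k-means update (each centroid becomes the mean of its cluster's members), the new centroids used in the next table work out as:

$$C_1 = \frac{1}{5}\big[(1,2,2)+(2,1,2)+(1,1,1)+(1,1,2)+(2,1,1)\big] = \left(\tfrac{7}{5},\ \tfrac{6}{5},\ \tfrac{8}{5}\right) = (1.4,\ 1.2,\ 1.6)$$

$$C_2 = \frac{1}{3}\big[(1,4,1)+(1,4,2)+(2,4,2)\big] = \left(\tfrac{4}{3},\ 4,\ \tfrac{5}{3}\right) \approx (1.33,\ 4,\ 1.67)$$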
  • 13. Distance from New Centroids (values rounded to two decimals)
    Object | X | Y | Z | Distance from C1 (1.4, 1.2, 1.6) | Distance from C2 (1.33, 4, 1.67)
    OB-1   | 1 | 4 | 1 | 3.8 | 1
    OB-2   | 1 | 2 | 2 | 1.6 | 2.67
    OB-3   | 1 | 4 | 2 | 3.6 | 0.67
    OB-4   | 2 | 1 | 2 | 1.2 | 4
    OB-5   | 1 | 1 | 1 | 1.2 | 4
    OB-6   | 2 | 4 | 2 | 3.8 | 1
    OB-7   | 1 | 1 | 2 | 1   | 3.67
    OB-8   | 2 | 1 | 1 | 1.4 | 4.33
  • 14. Updated Clusters ❑ With respect to the updated centroids, the objects are assigned as follows: ❑ Cluster 1: OB-2, OB-4, OB-5, OB-7, OB-8 ❑ Cluster 2: OB-1, OB-3, OB-6 ❑ The algorithm ends here because no object changed its cluster.
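A short sketch can check the second iteration of the walkthrough: it recomputes the centroids from the first-pass clusters, reassigns every object with the same Manhattan distance, and confirms that no object moves, which is the stopping condition. The data are copied from the slides; the variable names are illustrative.

```python
import numpy as np

objects = np.array([
    [1, 4, 1], [1, 2, 2], [1, 4, 2], [2, 1, 2],
    [1, 1, 1], [2, 4, 2], [1, 1, 2], [2, 1, 1],
])  # OB-1 ... OB-8

# Assignment after the first pass: 0 = Cluster 1, 1 = Cluster 2.
first_pass = np.array([1, 0, 1, 0, 0, 1, 0, 0])

# Update step: each centroid becomes the mean of its cluster's members.
centroids = np.array([objects[first_pass == j].mean(axis=0) for j in range(2)])

# Reassign every object using Manhattan distance to the updated centroids.
distances = np.abs(objects[:, None, :] - centroids[None, :, :]).sum(axis=2)
second_pass = distances.argmin(axis=1)

print(np.round(centroids, 2))
print(np.round(distances, 2))
print("Assignments unchanged:", np.array_equal(first_pass, second_pass))
```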
  • 15. Code Implementation of K-means
        # Plot six 2-D sample points before clustering them
        import matplotlib.pyplot as plt
        from matplotlib import style
        import numpy as np
        style.use('ggplot')
        X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])
        plt.scatter(X[:, 0], X[:, 1], s=150)
        plt.show()
  • 16. Code Implementation of K-means (cont.)
        from sklearn.cluster import KMeans
        # Cluster the records of X (from the previous slide) into 2 groups
        kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
        # Train the model
        kmeans.fit(X)
        # Predict the cluster of a new, unseen point
        labels = kmeans.predict([[20, 8]])
        print(labels)
        # Coordinates of the two learned centroids
        centroids = kmeans.cluster_centers_
        print(centroids)
  • 18. Reinforcement Learning ❑ We first learn by interacting with the environment. ❑ Whether we are learning to drive a car or learning to walk, the learning is based on interaction with the environment. ❑ Learning from interaction is a foundational idea underlying many theories of learning and intelligence. ❑ Reinforcement learning is goal-oriented learning based on interaction with the environment. It is often described as a promising path toward true artificial intelligence.
  • 19. Problem Statement ❑ How does a child learn to walk?
  • 20. Formalizing the Problem ❑ The child is an agent trying to manipulate the environment (the surface on which he/she walks) by taking actions (walking), moving from one state (each step he/she takes) to another. ❑ The child gets a reward (let's say chocolate) when he/she accomplishes a sub-task (taking a couple of steps) and receives no chocolate (a zero or negative reward) when he/she is not able to walk. ❑ This is a simplified description of a reinforcement learning problem.
  • 21. Basis of Reinforcement Learning
  • 22. Difference Between the Kinds of Machine Learning
    Aspect            | Supervised                         | Unsupervised                               | Reinforcement
    Definition        | Learns from labeled data           | Learns from unlabeled data                 | Learns by interacting with the environment through actions, discovering errors and rewards
    Types of problems | Regression and classification      | Association and clustering                 | Reward-based
    Data              | Labeled                            | Unlabeled                                  | No predefined data
    Training          | External supervision               | No supervision                             | No supervision
    Approach          | Maps labeled input to known output | Searches for patterns and discovers output | Follows a trial-and-error method
    Algorithms        | SVM, KNN, Linear Regression        | K-means, C-means                           | Q-Learning, SARSA, etc.
  • 23. Terminology of Reinforcement Learning ❑ Agent: an entity (a computer program) that learns from the environment based on feedback. ❑ Action: a step taken by the agent according to the situation (environment). ❑ Environment: the surroundings in which the agent is present and acts. The environment is often random (stochastic) in nature. ❑ State: the situation returned by the environment after each action of the agent. ❑ Reward: feedback, positive or negative, based on the action of the agent. ❑ Policy: the strategy the agent applies to choose the next action based on the current state. ❑ Value: the expected long-term (discounted) return, as opposed to the short-term reward. ❑ Q-value: similar to value, but with the current action as an additional parameter, i.e. the value of taking action a in state s.
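The difference between reward and value can be made concrete with a discounted return, the usual way long-term value is measured. In this sketch the discount factor `gamma` (0.9 here) and the two reward sequences are assumptions for illustration; for a deterministic sequence the "expected" return is just this sum.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward t steps in the future is weighted by gamma**t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


path_a = [1, 0, 0, 0]   # small reward now, nothing later
path_b = [0, 0, 0, 10]  # nothing now, large reward later

print(path_a[0], path_b[0])        # immediate rewards favour path A
print(discounted_return(path_a))   # long-term value of path A
print(discounted_return(path_b))   # long-term value of path B (about 7.29)
```

An agent that only looks at the immediate reward picks path A; an agent that maximizes value picks path B.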
  • 24. Key Points of Reinforcement Learning ❑ It is based on the trial-and-error method. ❑ The agent is not told about the environment or which step to take next. ❑ The agent takes the next action based on the previous feedback. ❑ Rewards and penalties may be delayed. ❑ The environment the agent interacts with is often random, and the agent has to reach the goal while collecting the maximum reward.
  • 25. How to Implement RL in Machine Learning ❑ There are three approaches to implementing RL: ❑ Model-based learning o A virtual model of the environment is created, and the agent explores this model to learn. A different model is needed for each environment. ❑ Policy-based learning o This approach looks for the optimal policy, the one that maximizes future reward, without relying on any value function. There are two types of policy: deterministic (the policy always produces the same action in a given state) and stochastic (the action is chosen according to a probability distribution). ❑ Value-based learning o The agent tries to find the optimal value function, i.e. the maximum value achievable at any state under any policy.
  • 26. When Not to Use RL? ❑ When you already have enough labeled data to train the model with supervised learning. ❑ When training time is limited: RL can be a time-consuming process.
  • 27. Why Use Reinforcement Learning? ❑ To let a reward-based system learn. ❑ When an agent needs to learn from its own actions. ❑ It helps you discover which action yields the highest reward over a longer period. ❑ When we want to find the best method for obtaining large rewards.
  • 28. Learning Models of Reinforcement Learning ❑ Markov Decision Process (MDP) ❑ Q-learning ❑ SARSA (State-Action-Reward-State-Action) ❑ Deep Q-Network (DQN)
  • 29. Q-Learning ❑ In Q-learning, Q stands for quality. ❑ It is a value-based learning method. ❑ A value (the Q-value) tells the agent which action is best to take. ❑ When the agent performs action a in state s, it receives a reward R(s, a) and ends up in a new state s', so the Q-value equation will be:
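The slide's own equation is not reproduced in this transcript. As a hedged sketch, the standard tabular Q-learning update is Q(s, a) <- Q(s, a) + alpha * [R(s, a) + gamma * max over a' of Q(s', a') - Q(s, a)]. The code below applies it to a hypothetical five-state corridor (start at state 0, reward 1 for reaching state 4); the environment, the learning rate `alpha`, the discount `gamma` and the exploration rate `epsilon` are all illustrative assumptions.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal
ACTIONS = [0, 1]    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy corridor: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes, otherwise take the best known action
            # (ties are broken randomly so the agent does not get stuck early on).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: reward plus discounted best Q-value of the next state
            # (no future value once the goal is reached).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)  # greedy action in each non-goal state
```

After training, the greedy policy moves right in every state, and Q-values shrink by a factor of gamma for each extra step away from the goal.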
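  • 32. Gym Environment for Reinforcement Learning ❑ Gym is a Python library for developing and comparing reinforcement learning algorithms. ❑ We can install gym using the following command: ❑ pip install gym ❑ Gym is no longer actively maintained; its maintained successor, Gymnasium (pip install gymnasium), follows the same API as gym 0.26.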
  • 33. Main Functions of the env Object ❑ The step() function takes an action as an argument and, in gym versions before 0.26, returns four objects: ❑ observation: an environment-specific object representing the observation of the environment. ❑ reward: a float indicating the gain (or loss) from the previous action. ❑ done: a Boolean value indicating whether the episode is finished. ❑ info: a dictionary of diagnostic information useful for debugging. ❑ From gym 0.26 onward (and in Gymnasium), step() returns five objects instead: observation, reward, terminated, truncated and info. ❑ The render() function creates a visual representation of the environment. ❑ The reset() function resets the environment to its initial state and returns the first observation (together with an info dictionary in gym 0.26 and later).
  • 34. Implementation ❑ One of the most popular environments is CartPole. ❑ In this game a pole is attached to a cart, and we have to keep the pole balanced. ❑ The episode ends when the pole tilts more than 12 degrees from vertical or the cart moves more than 2.4 units from the center. ❑ It is one of the simplest environments for learning the basics. ❑ It has only four observations and two actions. o The actions push the cart to the left or to the right. o The observations are the position of the cart, the velocity of the cart, the angle of the pole, and the rotation rate of the pole.
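  • 35. Getting the CartPole Environment (gym 0.26+ / Gymnasium API)
        import gym
        # render_mode must be chosen when the environment is created; with 'human'
        # the window is redrawn automatically on every step, so env.render() is not needed
        env = gym.make('CartPole-v0', render_mode='human')
        for i_episode in range(20):
            observation, info = env.reset()
            for t in range(1000):
                print(observation)
                action = env.action_space.sample()  # random action: 0 (left) or 1 (right)
                observation, reward, terminated, truncated, info = env.step(action)
                if terminated or truncated:
                    print("Episode finished after {} timesteps".format(t + 1))
                    break
        env.close()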
  • 37. Why Data Preprocessing? ❑ Real-world data is not ready for learning as it is. ❑ It is noisy, dirty and incomplete. ❑ Without quality data, there are no quality results.
  • 39. Types of Data (cont.) ❑ Nominal data: categorical values without any order, e.g. car colors: black, white, red, blue. ❑ Ordinal data: categorical values with a natural order, e.g. clothing sizes: small, medium, large, extra large. However, the differences between values are not meaningful: "large - medium = small" makes no sense. ❑ Interval data: numeric values with a defined unit of measurement where differences are meaningful but there is no true zero, so ratios are not, e.g. temperature in degrees Celsius, dates. ❑ Ratio data: numeric values with a defined unit of measurement where both differences and ratios are meaningful, e.g. count, age, mass, length. ❑ Time series data: a series of data points indexed in time order, most commonly taken at successive, equally spaced points in time, e.g. the weather readings used in weather forecasting. ❑ Text data: unstructured data. Text data usually consists of documents that can contain words, sentences or even paragraphs of free-flowing text.
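In pandas, the nominal/ordinal distinction maps onto unordered versus ordered categorical types. This sketch reuses the slide's car-colour and clothing-size examples; the particular values in each Series are made up.

```python
import pandas as pd

# Nominal: categories with no order (car colours).
colors = pd.Series(["black", "white", "red", "blue", "red"], dtype="category")

# Ordinal: categories with a natural order (clothing sizes).
size_type = pd.CategoricalDtype(
    categories=["small", "medium", "large", "extra large"], ordered=True
)
sizes = pd.Series(["medium", "small", "extra large", "large"], dtype=size_type)

print(colors.cat.ordered)           # False: comparing colours is meaningless
print((sizes > "medium").tolist())  # order comparisons are allowed for ordinal data
print(sizes.cat.codes.tolist())     # ranks only; the gaps between codes carry no meaning
```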
  • 40. Data Processing Steps ❑ A dataset is viewed as a collection of data objects. ❑ Each data object has many features. ❑ Features are the characteristics of a data object, for example the color, speed and mileage of a car. ❑ The basic steps in data processing are: o Data quality assessment o Feature aggregation o Feature sampling o Dimensionality reduction o Feature encoding
  • 41. Data Quality Assessment ❑ Collected data may be incomplete and noisy. ❑ We cannot completely rely on data acquisition tools. ❑ There may be flaws in the data collection process. ❑ Raw data can contain missing values, duplicate values, and inconsistent values. ❑ We have to tackle all these problems before applying machine learning.
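A first quality check in pandas might look like the sketch below. The DataFrame is hypothetical raw data containing one missing value per column, one fully duplicated row and an inconsistent spelling ("delhi " versus "Delhi").

```python
import numpy as np
import pandas as pd

# Hypothetical raw records with typical quality problems.
raw = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Mumbai", "delhi ", None],
    "sales": [120.0, 95.0, 95.0, 80.0, np.nan],
})

print(raw.isna().sum())        # missing values per column
print(raw.duplicated().sum())  # number of fully duplicated rows
print(raw["city"].unique())    # reveals inconsistent spellings

# Drop exact duplicates, then normalize the spelling of city names.
clean = raw.drop_duplicates().copy()
clean["city"] = clean["city"].str.strip().str.title()  # "delhi " -> "Delhi"
print(clean)
```

The remaining missing values would then be handled with an imputation step such as the `SimpleImputer` example later in this deck.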
  • 42. Feature Aggregation ❑ After collecting data from different sources, aggregate it into a single unit. ❑ Aggregation reduces memory consumption. ❑ For example, if we collect the daily sales records of a store from multiple places, we can aggregate them into monthly or yearly sales.
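The slide's sales example can be sketched with a pandas `groupby`. The two stores, the date range and the constant daily sales figures are made-up assumptions chosen so the totals are easy to check by hand.

```python
import pandas as pd

# Hypothetical daily sales from two store locations, 1 Jan to 31 Mar 2024 (91 days).
days = pd.date_range("2024-01-01", periods=91, freq="D")
daily = pd.DataFrame({
    "date": list(days) * 2,
    "store": ["A"] * len(days) + ["B"] * len(days),
    "sales": [10] * len(days) + [5] * len(days),
})

# Aggregate 182 daily rows into one total per month ...
monthly = daily.groupby(daily["date"].dt.to_period("M"))["sales"].sum()
# ... or one total per year.
yearly = daily.groupby(daily["date"].dt.year)["sales"].sum()

print(monthly)
print(yearly)
```

The aggregated table has 3 monthly rows (or a single yearly row) instead of 182 daily rows, which is where the memory saving comes from.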
  • 43. Feature Sampling ❑ Large datasets are often collected from different sources. ❑ Take a subset of the data for the machine learning model. ❑ Choose a sampling algorithm that properly divides the dataset into a representative working subset. ❑ Take care of imbalanced classes in the dataset. ❑ Some sampling algorithms: o Simple random sampling o Systematic sampling o Stratified sampling o Cluster sampling o Convenience sampling o Quota sampling o Judgement (or purposive) sampling o Snowball sampling
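Three of the listed techniques are sketched below on a hypothetical imbalanced dataset (90 rows of class 0, 10 rows of class 1). Stratified sampling uses the `stratify` argument of scikit-learn's `train_test_split` so that the subset keeps the 90/10 class ratio.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 90 records of class 0, 10 of class 1.
df = pd.DataFrame({"feature": range(100), "label": [0] * 90 + [1] * 10})

# Simple random sampling: every row has the same chance of being picked.
random_sample = df.sample(n=20, random_state=42)

# Stratified sampling: keep the 90/10 class ratio in a 20-row subset.
_, stratified_sample = train_test_split(
    df, test_size=20, stratify=df["label"], random_state=42
)

# Systematic sampling: take every 5th row.
systematic_sample = df.iloc[::5]

print(random_sample["label"].value_counts())
print(stratified_sample["label"].value_counts())
```

A simple random sample of 20 rows can, by chance, contain very few class-1 rows; the stratified sample is guaranteed to contain 18 rows of class 0 and 2 of class 1.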
  • 44. Dimensionality Reduction ❑ Datasets often have many features, so they are represented in high-dimensional spaces (more than the three dimensions of a 3D graph). ❑ We cannot easily visualize data in higher dimensions. ❑ Dimensionality reduction reduces the number of dimensions of a dataset. ❑ It maps a higher-dimensional space (n dimensions) to a lower-dimensional space (for example, a 2D plot). ❑ A lower-dimensional space is easier to process and visualize.
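The slide does not name a specific method; principal component analysis (PCA) is one common choice, swapped in here as an illustration. The sketch maps scikit-learn's bundled Iris dataset from 4 dimensions down to 2 so it can be drawn as a 2D plot.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data       # 150 flowers x 4 features (a 4-D space)
pca = PCA(n_components=2)  # map to a 2-D space that can be plotted
X_2d = pca.fit_transform(X)

print(X.shape, "->", X_2d.shape)
print(pca.explained_variance_ratio_.sum())  # share of the variance kept in 2-D
```

The explained-variance ratio shows how much information survives the reduction; for this dataset, two components keep most of it.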
  • 45. Feature Encoding ❑ Machines cannot understand data the way humans do. ❑ We have to convert the dataset into a machine-readable (numeric) form. ❑ Feature encoding techniques differ for different kinds of data.
  • 46. Data Preprocessing Libraries
        # used for handling numbers
        import numpy as np
        # used for handling the dataset
        import pandas as pd
        # used for handling missing data
        from sklearn.impute import SimpleImputer
        # used for encoding categorical data
        from sklearn.preprocessing import LabelEncoder, OneHotEncoder
        # used for splitting training and testing data
        from sklearn.model_selection import train_test_split
        # used for feature scaling
        from sklearn.preprocessing import StandardScaler
  • 47. Label Encoder for Categorical Data
        # Categorical feature
        weather = ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast',
                   'Sunny', 'Sunny', 'Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
        # Import LabelEncoder
        from sklearn import preprocessing
        # Create the LabelEncoder
        le = preprocessing.LabelEncoder()
        # Convert string labels into numbers (classes are sorted: Overcast=0, Rainy=1, Sunny=2)
        weather_encoded = le.fit_transform(weather)
        print(weather_encoded)  # [2 2 0 1 1 1 0 2 2 1 2 0 0 1]
  • 48. Dealing with Missing Values
        import pandas as pd
        import numpy as np
        from sklearn.impute import SimpleImputer
        df = pd.DataFrame({"Age": [23, 70, 56, 24, np.nan],
                           "Salary": [30000, 30000, 50000, np.nan, 40000]})
        print(df)
        # Replace each NaN with the most frequent value of its column
        # (if several values tie, SimpleImputer uses the smallest one)
        imp = SimpleImputer(missing_values=np.nan, strategy="most_frequent")
        X = imp.fit_transform(df)
        df1 = pd.DataFrame(X, columns=["Age", "Salary"])
        print(df1)
  • 49. Scaling Data
        from sklearn.preprocessing import StandardScaler
        # Standardize each column of df1 (from the previous slide) to mean 0 and standard deviation 1
        sc = StandardScaler(with_mean=True)
        X = sc.fit_transform(df1)
        X_scaled = pd.DataFrame(X, columns=["Age", "Salary"])
        print(X_scaled)
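To see what `StandardScaler` actually computes, the sketch below compares it with the z-score formula z = (x - mean) / std applied by hand. The DataFrame holds the values that the most_frequent imputation on the missing-value slide produces (Age 23 and Salary 30000 fill the gaps); note that `StandardScaler` uses the population standard deviation (`ddof=0`).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Imputed data from the missing-value example.
df1 = pd.DataFrame({"Age": [23, 70, 56, 24, 23],
                    "Salary": [30000, 30000, 50000, 30000, 40000]})

scaled = StandardScaler().fit_transform(df1)

# Same result by hand: z = (x - mean) / std, with the population std (ddof=0).
manual = ((df1 - df1.mean()) / df1.std(ddof=0)).to_numpy()

print(np.allclose(scaled, manual))  # the two approaches agree
print(scaled.mean(axis=0))          # each column now has mean (close to) 0
print(scaled.std(axis=0))           # ... and standard deviation 1
```

After scaling, Age (tens) and Salary (tens of thousands) are on the same scale, so distance-based methods such as k-means are no longer dominated by the Salary column.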
  • 50. Disclaimer ❑ This is an educational presentation to enhance the skills of computer science students. ❑ This presentation is available for free to computer science students. ❑ Some internet images from different URLs are used in this presentation to simplify technical examples and correlate examples with the real world. ❑ We are grateful to the owners of these URLs and pictures.