Chapter – 05 Instance based Classification: kNN
Instance based methods are non-parametric classification techniques. We make numerous
decisions every day, and more often than not they are based on the particular problem and
observations in front of us. Even though we might hold an aggregate opinion, a particular
decision is based on the data we observe in the moment, without challenging long-held
beliefs. The immediate percept dominates the decision-making process. The instance based
approach is non-parametric because we make no assumptions about the data or its distribution;
there are no population parameters that we try to infer using a sample statistic. We do not
build a model; instead we draw upon past experience under similar situations and try to
repeat what we did before. Jiawei Han et al. characterize this style of learning as lazy
learning, in contrast to the other models we have studied, where the learner constructs a
model and evaluates new scenarios against it; those are called eager learners. All parametric
models are by design eager learners, but not all instance based (non-parametric) learners are
lazy. We will study a few eager non-parametric learners later in our journey.
The Nearest Neighbor technique is one such non-parametric lazy learner. The intuition behind NN
is best illustrated with a simple heuristic. Imagine yourself in Yosemite Valley. You hear people
having an animated conversation. You generally avoid trouble: are you likely to assume that the
people you heard were troublemakers, or passionate nature lovers like yourself? Most people
would disregard the acrimonious debate and assume them to be fellow hikers. We have all heard the
old adages "birds of a feather flock together" and "tell me who your friends are and I will tell
you who you are." Our brain can effortlessly separate like things from unlike things.
Let us take a trip to Africa (photographs from the internet).
Animals of a kind hang out together; this is the natural order. Assume we are on a trek on a
distant hill, and someone in the group asks, "Hey, what is that animal over there?" Look at the
pictures above, where I have blued out a few animals. Can you guess what each hidden animal
might be? If you tell me two or three animals around it, I can guess what animal it is. This is
the essential idea behind nearest neighbor algorithms, designated kNN. Even a grade-school human
brain will score high on this test. Why is it innate? How does the brain know what is similar
and what is not? How does it settle on a particular k before making a determination? (The k
specifies the number of neighbors the algorithm considers when assigning a class to the unknown
observation.) How does the brain perform this task? Algorithms implement it using standard
similarity measures. Please refer to any basic introduction to similarity; I recommend
https://aiaspirant.com/distance-similarity-measures-in-machine-learning/ for easy reading.
How does kNN work?
kNN computes the distance between the given instance and every instance seen in the past (the
dataset). It then finds the k nearest neighbors under that distance metric and assigns the
majority class (the most frequent class among the k nearest neighbors) to the given instance.
k is usually 3, 5, or some other odd number, to avoid ties. Thus, by definition, there is no
training, learning, or induction phase in kNN. Given any new instance, kNN iterates over all
known instances to compute the distances, finds the k nearest neighbors, and assigns the
majority class to the new instance. The keyword here is distance. What is the distance between
a giraffe and an elephant? Without any formal education, how does the human brain compute this
distance? All mammalian brains appear to compute it, if we judge by how animals tend to remain
in groups (aggregation) or pick out a target (discrimination). Our challenge, then, is to impart
this innate technique to an automaton, without knowing how the brain does what it does.
I prefer the concise definition of the kNN classifier from Ethem Alpaydin: "The k-nn classifier
assigns the input to the class having most examples among the k neighbors of the input – all
neighbors have equal vote and the class having the maximum number of voters among the k-neighbors
is chosen." In short, the k-nearest neighbor classifier assigns an instance to the class most
heavily represented among its neighbors. The two boundary conditions of k are interesting:
k = 1, where the single nearest neighbor has the deciding vote, and k = n, where every instance
in the known dataset votes. What impact do they have on variance and bias? With k = 1 the
classifier follows the training data exactly (low bias, high variance); with k = n it always
predicts the overall majority class (high bias, low variance).
Central to kNN is the notion of distance. How do we determine whether two distinct things are
similar or dissimilar? Recommendation systems and clustering (unsupervised learning), besides
classification algorithms, use this idea of distance. When the variates are orthogonal (a change
in one dimension has no impact on any other dimension) and on identical scales, an appropriate
version of the Minkowski distance may be used. Euclidean distance (order p = 2) and the
so-called Manhattan distance (order p = 1) are special cases of the Minkowski distance.
When orthogonality and scale equivalence cannot be assumed, one should use the Mahalanobis
distance, a multivariate formulation that incorporates the covariance matrix of the features.
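To make these metrics concrete, here is a minimal sketch in Python with NumPy (the chapter's companion implementation is in R; the function names `minkowski` and `mahalanobis` here are my own, for illustration):

```python
import numpy as np

def minkowski(a, b, p):
    """General Minkowski distance of order p between two vectors."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def mahalanobis(a, b, cov):
    """Mahalanobis distance, incorporating the covariance matrix of the features."""
    diff = a - b
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

print(minkowski(a, b, 1))  # Manhattan (p = 1): |1-4| + |2-6| = 7.0
print(minkowski(a, b, 2))  # Euclidean (p = 2): sqrt(9 + 16) = 5.0
```

Note that with an identity covariance matrix, the Mahalanobis distance reduces to the Euclidean distance, which is one way to see it as the scale-aware generalization.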
Lazy Learners and Induction Phase
Given a new (never-before-seen) observation o, kNN:
1. computes the distance between o and each instance d_i in the dataset,
2. sorts the distances and selects the first k instances,
3. determines the majority class among those k nearest neighbors, and
4. assigns that majority class to the new observation o.
As described, there is no transfer of knowledge from the given dataset or from prior distance
calculations. Each new observation is processed exactly as outlined.
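The steps above can be sketched directly. This is a bare-bones Python illustration (not the RMD implementation), using Euclidean distance and a hypothetical `knn_classify` helper:

```python
import numpy as np
from collections import Counter

def knn_classify(o, X, y, k=3):
    """Classify observation o by majority vote among its k nearest neighbors.
    X holds the known observations (n x d); y holds their class labels."""
    # 1. distance between o and every known instance (Euclidean here)
    dists = np.sqrt(((X - o) ** 2).sum(axis=1))
    # 2. sort and keep the indices of the k nearest
    nearest = np.argsort(dists)[:k]
    # 3-4. majority class among those neighbors is assigned to o
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# toy data: two well-separated groups
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(np.array([0.2, 0.3]), X, y, k=3))  # -> a
```

Notice there is nothing to fit: the function receives the entire dataset on every call, which is exactly the "lazy" character described above.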
In comparison, parametric methods require parameters to classify new observations, and those
parameters are estimated during the learning phase. kNN, by contrast, must carry all the known
observations in order to classify a new one: everything happens at classification time.
In the RMD we present an implementation and a comparison with a standard implementation of kNN.
There is no need to split the data into training/testing sets, because we are not estimating
parameters. In practice I do split it, so that I can measure the performance of the algorithm
when presented with new data; for kNN, the test set serves this purpose. Before the business
generates a new observation, we can validate the algorithm's performance.
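A sketch of that evaluation idea, in Python rather than the RMD's R, on synthetic data (the split fractions and data are placeholders, not the chapter's actual dataset):

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# synthetic two-class data standing in for a real dataset
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# hold out 30% as a test set; no parameters are estimated --
# the split exists only to measure performance on unseen data
idx = rng.permutation(len(X))
cut = int(0.7 * len(X))
train, test = idx[:cut], idx[cut:]

def knn_classify(o, X, y, k=5):
    dists = np.sqrt(((X - o) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

# classify each held-out observation using only the training instances
preds = [knn_classify(o, X[train], y[train], k=5) for o in X[test]]
accuracy = np.mean(np.array(preds) == y[test])
print(round(float(accuracy), 2))
```

The accuracy on the held-out points is our estimate of how the classifier will perform when the business generates a genuinely new observation.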
