K nearest neighbor
 Classification is done by relating the unknown to the known according to some distance/similarity function
 The classifier stores all available cases and classifies new cases based on a similarity measure
 Different names
 Memory-based reasoning
 Example-based reasoning
 Instance-based reasoning
 Case-based reasoning
 Lazy learning
 kNN determines the decision boundary locally. For example, 1NN assigns each document to the class of its single closest neighbor
 kNN assigns each document to the majority class of its k closest neighbors, where k is a parameter
 The rationale of kNN classification is the contiguity hypothesis: we expect a test document to have the same label as the training documents located in the local region surrounding it
 A Voronoi tessellation of a set of objects decomposes space into Voronoi cells, where each object’s cell consists of all points that are closer to that object than to any other object
 It partitions the plane into convex polygons, each containing its corresponding document
Let k = 3. For the star (the test point):
P(circle class | star) = 1/3
P(X class | star) = 2/3
P(diamond class | star) = 0
The 3NN estimate is P(circle class | star) = 1/3, while the 1NN estimate is P(circle class | star) = 1 – so 3NN prefers the X class and 1NN prefers the circle class.
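These estimates are just the class fractions among the k nearest labels; a minimal sketch (the helper name is ours):

```python
from collections import Counter

def knn_class_probs(neighbor_labels):
    """Empirical class probabilities from the labels of the k nearest neighbors."""
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    return {cls: n / k for cls, n in counts.items()}

# The slide's 3NN example: the star's three nearest neighbors are X, X, circle.
print(knn_class_probs(["X", "X", "circle"]))  # X: 2/3, circle: 1/3
print(knn_class_probs(["circle"]))            # 1NN case: circle: 1.0
```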
 Advantages
 Non-parametric architecture
 Simple
 Powerful
 Requires no training time
 Disadvantages
 Memory intensive
 Classification/estimation is slow
 The distance is calculated using Euclidean distance:
D = √((x1 − x2)² + (y1 − y2)²)
 Attributes are rescaled to a common range with min–max normalization:
s = (X − Min) / (Max − Min)
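Both formulas translate directly into code (a minimal sketch; the function names are ours):

```python
import math

def euclidean(p, q):
    """D = sqrt((x1 - x2)^2 + (y1 - y2)^2), generalized to any dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def minmax_scale(values):
    """s = (X - Min) / (Max - Min): map each value into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(euclidean((0, 0), (3, 4)))   # 5.0
print(minmax_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

Rescaling matters for kNN because the Euclidean distance is dominated by whichever attribute happens to have the largest numeric range.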
 If k = 1, select the nearest neighbor
 If k > 1
 For classification, select the most frequent class among the k neighbors
 For regression, calculate the average of the k neighbors
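The regression case, averaging the targets of the k nearest examples, might look like this (toy data invented for the example):

```python
import math

def knn_regress(query, examples, k=3):
    """Predict the average target value of the k nearest examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    return sum(y for _, y in nearest) / k

# 1-D toy data: (feature,) -> target.
train = [((1,), 10.0), ((2,), 12.0), ((3,), 14.0), ((10,), 50.0)]
print(knn_regress((2.5,), train, k=3))  # average of 10, 12, 14 -> 12.0
```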
 An inductive learning task – use particular facts to make more generalized conclusions
 A predictive model based on a branching series of Boolean tests – each of these Boolean tests is less complex than a one-stage classifier
 It learns from class-labeled tuples
 Can be used as a visual aid to structure and solve sequential problems
 An internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node holds a class label
If we leave at 10 AM and there are no cars stalled on the road, what will our commute time be?
[Decision tree: root “Leave At” with branches 8 AM → Long; 9 AM → “Accident?” (Yes → Long, No → Medium); 10 AM → “Stall?” (Yes → Long, No → Short)]
 In this decision tree, we made a series of Boolean decisions and followed the corresponding branch –
 Did we leave at 10 AM?
 Did the car stall on the road?
 Is there an accident on the road?
 By answering each of these questions as yes or no, we can come to a conclusion about how long our commute might take
 We do not have to represent this tree graphically
 We can represent it as a set of rules; however, the rules may be harder to read
if hour == 8am
    commute time = long
else if hour == 9am
    if accident == yes
        commute time = long
    else
        commute time = medium
else if hour == 10am
    if stall == yes
        commute time = long
    else
        commute time = short
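For completeness, the same rule set as a runnable Python function (the string encoding of hours and outcomes is our own choice):

```python
def commute_time(hour, accident=False, stall=False):
    """The rule set above as a plain function; hour is one of "8am", "9am", "10am"."""
    if hour == "8am":
        return "long"
    elif hour == "9am":
        return "long" if accident else "medium"
    elif hour == "10am":
        return "long" if stall else "short"
    raise ValueError(f"unexpected hour: {hour}")

print(commute_time("10am", stall=False))   # short
print(commute_time("9am", accident=True))  # long
```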
 The algorithm is called with three parameters – the data partition, the attribute list, and the attribute selection method
 The data partition D is a set of tuples and their associated class labels
 The attribute list is a list of attributes describing the tuples
 The attribute selection method specifies a heuristic procedure for selecting the attribute that best discriminates among the tuples
 The tree starts at node N. If all the tuples in D are of the same class, then node N becomes a leaf and is labelled with that class
 Otherwise, the attribute selection method is used to determine the splitting criterion
 Node N is labelled with the splitting criterion, which serves as a test at the node
 The previous experience decision table showed 4 attributes – hour, weather, accident and stall
 But the decision tree used only three attributes – hour, accident and stall
 So which attributes should be kept and which should be removed?
 Attribute selection methods show that weather is not a discriminating attribute
 Occam’s razor – given a number of competing hypotheses, the simplest one is preferable
 We will focus on the ID3 algorithm
 Basic idea
 Choose the best attribute to split the remaining instances and make that attribute a decision node
 Repeat this process recursively for each child
 Stop when
 All instances have the same target attribute value
 There are no more attributes
 There are no more instances
 ID3 splits on attributes based on their entropy
 Entropy is a measure of disorder (uncertainty)
 Entropy is minimized when all values of target attribute are the same
 If we know that the commute time will be short, the entropy=0
 Entropy is maximized when there is an equal chance of values for the target
attribute (i.e. result is random)
 If commute time = short in 3 instances, medium in 3 instances and long in 3 instances,
entropy is maximized
 Calculation of entropy
Entropy(S) = −∑(i=1 to l) (|Si|/|S|) · log2(|Si|/|S|)
 S = the set of examples
 Si = the subset of S with value vi for the target attribute
 l = the number of distinct values of the target attribute
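The formula translates almost directly into code (a minimal sketch; the function name is ours):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i (|Si|/|S|) * log2(|Si|/|S|) over the target values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

print(entropy(["short"] * 6) == 0.0)             # True: all values identical -> minimal
print(entropy(["short", "medium", "long"] * 3))  # ~1.585 = log2(3): uniform -> maximal
```

This reproduces the two boundary cases from the slides: entropy 0 when the outcome is certain, and the maximum log2(l) when the l outcomes are equally likely.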
 If we break down the leaving time to the minute, we might get something like this
[Tree residue: one branch per exact minute – 8:02 AM, 8:03 AM, 9:05 AM, 9:07 AM, 9:09 AM, 10:02 AM – each ending in a single-example leaf labelled Long, Medium or Short]
 Since the entropy is very low for each branch and we have n branches with n leaves, this would not be helpful for predictive modelling
 We use a technique called discretization: we choose cut points, such as 9 AM, for splitting continuous attributes
 Consider the attribute commute time
 When we split the attribute, we increase the entropy, so we don’t get a decision tree with the same number of cut points as leaves
8:00 (L), 8:02 (L), 8:07 (M), 9:00 (S), 9:20 (S), 9:25 (S), 10:00 (S), 10:02 (M)
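Discretization itself is simple: map each continuous value to a bin index given the chosen cut points (a sketch, with times encoded as minutes past midnight; the helper name is ours):

```python
def discretize(value, cut_points):
    """Assign a continuous value to a bin index given sorted cut points."""
    return sum(value >= c for c in cut_points)

# Leaving times as minutes past midnight, with a single cut at 9:00 AM (540).
times = [480, 482, 487, 540, 560, 565, 600, 602]
bins = [discretize(t, [540]) for t in times]
print(bins)  # [0, 0, 0, 1, 1, 1, 1, 1]
```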
 Binary decision trees
 Classification of an input vector is done by traversing the tree beginning at the root node
and ending at the leaf
 Each node of the tree computes an inequality
 Each leaf is assigned to a particular class
 Each split is based on a single input variable
 Each node draws a boundary that can be geometrically interpreted as a hyperplane perpendicular to an axis
[Diagram: a binary decision tree whose root tests the inequality BMI < 24; its Yes/No branches lead to further tests B and C, each with Yes/No branches to leaves]
 Linear decision trees are similar to binary decision trees
 The inequality computed at each node takes a linear form and may depend on multiple variables
[Diagram: a tree whose root tests a linear combination aX1 + bX2, with Yes/No branches leading to further tests and leaves]
 Chi-squared Automatic Interaction Detection (CHAID)
 Non-binary decision tree
 The decision made at each node is based on a single variable, but can result in multiple branches
 Continuous variables are grouped into a finite number of bins to create categories
 Equal-population bins are created for CHAID
 Classification and Regression Trees (CART) are binary decision trees that split on a single variable at each node
 The CART algorithm goes through an exhaustive search of all variables and split values
to find the optimal splitting rule for each node.
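CART’s exhaustive search over split values can be illustrated in miniature. This sketch handles a single variable and uses Gini impurity, the criterion commonly associated with CART; the data and function names are invented for the example:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(xs, ys):
    """Exhaustively try every threshold on the variable `xs` and return
    (score, cut) minimizing the weighted impurity of the two branches."""
    best = (float("inf"), None)
    for cut in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x < cut]
        right = [y for x, y in zip(xs, ys) if x >= cut]
        if not left or not right:
            continue  # degenerate split: everything on one side
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        best = min(best, (score, cut))
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (0.0, 10) -- a perfect split at x < 10
```

A full CART implementation repeats this search over every variable at every node; this sketch only shows the inner loop for one variable.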
 There is another technique for reducing the number of attributes used in a tree – pruning
 Two types of pruning
 Pre-pruning (forward pruning)
 Post-pruning (backward pruning)
 Pre-pruning
 We decide during the building process when to stop adding attributes (possibly based on their information gain)
 However, this may be problematic – why?
 Sometimes attributes individually contribute little to a decision, but combined they may have a significant impact
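A classic illustration of this failure mode is XOR-style data: each attribute alone has zero information gain, yet together the two attributes determine the class perfectly, so gain-based pre-pruning would stop before either is used (sketch; helper names and data are ours):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy reduction from splitting the rows on one attribute."""
    gain = entropy(labels)
    for v in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == v]
        gain -= len(sub) / len(labels) * entropy(sub)
    return gain

# XOR: class is "yes" exactly when a != b.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = ["no", "yes", "yes", "no"]
print(info_gain(rows, labels, "a"))  # 0.0
print(info_gain(rows, labels, "b"))  # 0.0
```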
 Post-pruning waits until the full decision tree has been built and then prunes the attributes
 Two techniques:
 Subtree replacement
 Subtree raising
 Subtree replacement
[Diagram: root A with child B; B has child C (with leaves 1, 2 and 3) and leaves 4 and 5]
 Node 6 replaced the subtree rooted at C
 This may increase accuracy
[Diagram: root A with child B; B now has leaves 6, 4 and 5]
 The entire subtree is raised onto another node
[Diagram: before – root A with child B; B has child C (leaves 1, 2, 3) and leaves 4 and 5. After – C is raised to replace B, so root A has child C with leaves 1, 2 and 3]
 While a decision tree classifies quickly, the time taken to build the tree may be higher than for any other type of classifier
 Decision trees suffer from the problem of error propagation throughout the tree
 Since decision trees work by a series of local decisions, what happens if one of these decisions is wrong?
 Every decision from that point on may be wrong
 We may never return to the correct path of the tree