Nearest Neighbor &
Bayesian Classifiers
Machine Learning and Data Mining (Unit 13)




Prof. Pier Luca Lanzi
References                                    2

  Jiawei Han and Micheline Kamber, "Data Mining: Concepts
  and Techniques", The Morgan Kaufmann Series in Data
  Management Systems, Second Edition.
  Tom M. Mitchell, "Machine Learning", McGraw Hill, 1997.
  Pang-Ning Tan, Michael Steinbach, Vipin Kumar,
  "Introduction to Data Mining", Addison Wesley.
  Ian H. Witten, Eibe Frank, "Data Mining: Practical Machine
  Learning Tools and Techniques with Java Implementations",
  2nd Edition.
Apples again!                                3




                                        Is this an apple?




A Simple Way to Classify a New Fruit           4



  To decide whether an unseen fruit is an apple, consider
  the k fruits that are most similar to the unknown fruit
  Then, classify the unknown fruit using the most
  frequent class among them




A Simple Way to Classify a New Fruit             5




  If k=5, these 5 fruits are the most similar ones
  to the unclassified example
  Since the majority of the fruits are apples, we decide that
  the unknown fruit is an apple

Instance-Based Classifiers                        6



  Store the training records
  Use the training records to predict
  the class label of unseen cases

  (Figure: a training set table with attributes Att1, …, Attn and a
  Class column; the unseen case has values for Att1, …, Attn but no
  class label)
Instance-based methods                           7



  Instance-based methods are the simplest form of learning

  The training instances are searched for the one that most
  closely resembles the new instance

  The instances themselves represent the knowledge

  The similarity (distance) function defines what is "learned"

  Evaluation is "lazy": it is deferred until a new instance
  must be classified

  Instance-based learning is therefore also called lazy learning




Instance Based Classifiers                        8



  Rote-learner
     Memorizes entire training data
     Performs classification only if the attributes of the record
     match one of the training examples exactly

  Nearest neighbor
    Uses k “closest” points (nearest neighbors)
    for performing classification




Rote learning                                 9



  Rote learning is basically memorization
     Saving knowledge so it can be used again
     Retrieval is the only problem
     No repeated computation, inference, or query is necessary

  A simple example of rote learning is caching
     Store computed values (or large pieces of data)
     Recall this information when required by a computation
     Significant time savings can be achieved
     Many AI programs (as well as more general ones) have
     used caching very effectively




Nearest Neighbor Classifiers                             10



    Basic idea:
       If it walks like a duck, quacks like a duck, then it's
       probably a duck

    (Diagram: compute the distance between the test record and the
    training records, then choose the k "nearest" records)



Nearest-Neighbor Classifiers                              11



  Requires three things
     The set of stored records
     A distance metric to compute the distance between records
     The value of k, the number of nearest neighbors to retrieve

  To classify an unknown record:
     Compute the distance to the training records
     Identify the k nearest neighbors
     Use the class labels of the nearest neighbors to determine
     the class label of the unknown record (e.g., by taking a
     majority vote)



Definition of Nearest Neighbor                              12




  (Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor,
  (c) 3-nearest neighbor)

     The k-nearest neighbors of a record x are the data
     points that have the k smallest distances to x

1 nearest-neighbor                        13




(Figure: the decision regions of the 1-nearest-neighbor
classifier form a Voronoi diagram of the training records)




What distance?                                     14



  To compute the similarity between two points, use for instance the
  Euclidean distance:

     d(p, q) = sqrt( Σ_i (p_i − q_i)² )

  Determine the class from the nearest neighbor list
     take the majority vote of class labels among the k-nearest
     neighbors
     or weigh the vote according to distance, e.g., with
     the weight factor w = 1/d²

  Taking the square root is not required when comparing distances
  Another popular metric is the city-block (Manhattan) metric, where
  the distance is the sum of the absolute differences



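
As an illustration (not part of the original slides), here is a minimal Python sketch of a k-nearest-neighbor classifier with the Euclidean distance and an optional distance-weighted vote; the tiny training set is made up:

import math
from collections import defaultdict

def euclidean(p, q):
    # Square root of the sum of squared attribute differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(train, query, k=5, weighted=False):
    # train is a list of (attribute_vector, class_label) pairs
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = defaultdict(float)
    for x, label in neighbors:
        d = euclidean(x, query)
        # Weight factor w = 1/d^2; the epsilon avoids division by zero
        votes[label] += 1.0 / (d * d + 1e-12) if weighted else 1.0
    return max(votes, key=votes.get)

train = [((1.0, 1.0), "apple"), ((1.2, 0.9), "apple"), ((3.0, 3.5), "pear")]
print(knn_predict(train, (1.1, 1.0), k=3))  # prints "apple"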
Normalization and other issues                               15



   Different attributes are measured on different scales
   Attributes need to be normalized:

      a_i = (v_i − min v_i) / (max v_i − min v_i)    or    a_i = (v_i − avg(v_i)) / stdev(v_i)

   where v_i is the actual value of attribute i

   Nominal attributes: distance is either 0 or 1

   Common policy for missing values: assumed to be maximally
   distant (given normalized attributes)




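
As a small sketch (not from the slides) of the two normalization formulas above, applied to hypothetical height values:

def min_max(values):
    # a_i = (v_i - min v_i) / (max v_i - min v_i), maps onto [0, 1]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    # a_i = (v_i - avg(v_i)) / stdev(v_i): zero mean, unit variance
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / std for v in values]

heights = [1.5, 1.6, 1.7, 1.8]
print(min_max(heights))  # [0.0, 0.333..., 0.666..., 1.0]
print(z_score(heights))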
Nearest Neighbor Classification…                   16



  Choosing the value of k:
     If k is too small, sensitive to noise points
     If k is too large, neighborhood may include points from other
     classes




Nearest Neighbor Classification: Scaling Issues       17

  Attributes may have to be scaled to prevent distance
  measures from being dominated by one of the attributes
  Example:
     height of a person may vary from 1.5m to 1.8m
     weight of a person may vary from 90lb to 300lb
     income of a person may vary from $10K to $1M




Nearest Neighbor Classification                       18



  Problem with the Euclidean measure:
     For high-dimensional data (curse of dimensionality),
     it can produce counter-intuitive results

     d(111111111110, 011111111111) = 1.4142
     d(100000000000, 000000000001) = 1.4142

     Solution: normalize the vectors to unit length

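
The following sketch (not from the slides) reproduces the two distances above and shows the effect of unit-length normalization: the first pair of vectors is nearly parallel, the second pair is orthogonal:

import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def unit(v):
    # Divide each component by the vector's Euclidean norm
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

a = [int(c) for c in "111111111110"]
b = [int(c) for c in "011111111111"]
c = [int(c) for c in "100000000000"]
d = [int(c) for c in "000000000001"]

print(euclidean(a, b), euclidean(c, d))  # both 1.4142...
print(euclidean(unit(a), unit(b)))       # ~0.43, nearly parallel
print(euclidean(unit(c), unit(d)))       # 1.4142..., orthogonal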
Nearest neighbor Classification                19



  k-NN classifiers are lazy learners
     They do not build models explicitly,
     unlike eager learners such as decision tree induction and
     rule-based systems
     Classifying unknown records is relatively expensive




The distance function                            20



  Simplest case: one numeric attribute
     Distance is the difference between the two attribute
     values involved (or a function thereof)

  Several numeric attributes: normally, Euclidean distance is
  used and attributes are normalized

  Nominal attributes: distance is set to 1 if values are
  different, 0 if they are equal

  Are all attributes equally important?
     Weighting the attributes might be necessary




Discussion of Nearest-neighbor                       21



  Often very accurate
  But slow, since the simple version scans the entire training data to
  derive a prediction
  Assumes all attributes are equally important, thus may need
  attribute selection or weights
  Possible remedies against noisy instances:
      Take a majority vote over the k nearest neighbors
      Remove noisy instances from the dataset (difficult!)
  Statisticians have used k-NN since the early 1950s
      If n → ∞ and k/n → 0, the error approaches the theoretical
      minimum




Case-Based Reasoning (CBR)                     22



  Also uses lazy evaluation and analyzes similar instances

  But instances are not "points in a Euclidean space"

  Methodology
    Instances are represented by rich symbolic descriptions (e.g.,
    function graphs)
    Multiple retrieved cases may be combined
    Tight coupling between case retrieval,
    knowledge-based reasoning, and problem solving

  Research issues
     Indexing based on syntactic similarity measures and,
     when this fails, backtracking and adapting to
     additional cases

Lazy Learning vs. Eager Learning              23



  Instance-based learning is based on lazy evaluation
  Decision-tree and Bayesian classification are
  based on eager evaluation
  Key differences
     A lazy method may consider the query instance xq when
     deciding how to generalize beyond the training data D
     An eager method cannot, since it has already chosen its
     global approximation by the time it sees the query




Lazy Learning vs. Eager Learning              24



  Efficiency
      Lazy learning needs less time for training
      but more time for prediction
  Accuracy
      Lazy methods effectively use a richer hypothesis space,
      since they use many local linear functions to form an
      implicit global approximation to the target function
      Eager methods must commit to a single hypothesis that
      covers the entire instance space




Example: PEBLS                               25



  PEBLS: Parallel Exemplar-Based Learning System
  (Cost & Salzberg)
  Works with both continuous and nominal features
  For nominal features, the distance between two nominal values
  is computed using the modified value difference metric (MVDM)
  Each record is assigned a weight factor
  Number of nearest neighbors: k = 1




Example: PEBLS                                                                        26



      Tid   Refund   Marital Status   Taxable Income   Cheat
       1    Yes      Single           125K             No
       2    No       Married          100K             No
       3    No       Single           70K              No
       4    Yes      Married          120K             No
       5    No       Divorced         95K              Yes
       6    No       Married          60K              No
       7    Yes      Divorced         220K             No
       8    No       Single           85K              Yes
       9    No       Married          75K              No
      10    No       Single           90K              Yes

      Class counts:

                  Marital Status                        Refund
      Class   Single  Married  Divorced      Class    Yes    No
      Yes        2       0        1          Yes       0      3
      No         2       4        1          No        3      4

      Distance between nominal attribute values:

         d(V1, V2) = Σ_i | n1i/n1 − n2i/n2 |

      where n1i (n2i) is the number of records with value V1 (V2) and
      class i, and n1 (n2) is the total number of records with value
      V1 (V2)

         d(Single, Married)  = |2/4 − 0/4| + |2/4 − 4/4| = 1
         d(Single, Divorced) = |2/4 − 1/2| + |2/4 − 1/2| = 0
         d(Married, Divorced) = |0/4 − 1/2| + |4/4 − 1/2| = 1
         d(Refund=Yes, Refund=No) = |0/3 − 3/7| + |3/3 − 4/7| = 6/7
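
A minimal sketch of the modified value difference metric, with the class counts for Marital Status hard-coded from the table above:

# Class counts (Yes/No) for each value of Marital Status, from the table
counts = {
    "Single":   {"Yes": 2, "No": 2},
    "Married":  {"Yes": 0, "No": 4},
    "Divorced": {"Yes": 1, "No": 1},
}

def mvdm(v1, v2):
    # d(V1, V2) = sum over classes i of | n1i/n1 - n2i/n2 |
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    return sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2)
               for c in ("Yes", "No"))

print(mvdm("Single", "Married"))   # 1.0
print(mvdm("Single", "Divorced"))  # 0.0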
Example: PEBLS                                                      27



      Tid   Refund   Marital Status   Taxable Income   Cheat
       X    Yes      Single           125K             No
       Y    No       Married          100K             No

Distance between record X and record Y:

   Δ(X, Y) = w_X · w_Y · Σ_{i=1..d} d(X_i, Y_i)²

where:

   w_X = (number of times X is used for prediction) /
         (number of times X predicts correctly)

   w_X ≅ 1 if X makes accurate predictions most of the time
   w_X > 1 if X is not reliable for making predictions
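
As a worked example, assume (hypothetically) that both records are fully reliable, so w_X = w_Y = 1, and consider only the two nominal attributes, since the slides do not show how Taxable Income enters the sum: Δ(X, Y) = 1 · 1 · [d(Refund=Yes, Refund=No)² + d(Single, Married)²] = (6/7)² + 1² ≈ 1.73.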
Bayes Classifier                                 28



  A probabilistic framework for solving classification problems

  Conditional probability:

     P(C|A) = P(A, C) / P(A)
     P(A|C) = P(A, C) / P(C)

  Bayes theorem:

     P(C|A) = P(A|C) P(C) / P(A)

  A priori probability of C: P(C)
  Probability of the event before evidence is seen
  A posteriori probability of C: P(C|A)
  Probability of the event after evidence is seen


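
A small worked example with hypothetical numbers: if P(A|C) = 0.8, P(C) = 0.1, and P(A) = 0.2, then P(C|A) = 0.8 × 0.1 / 0.2 = 0.4, so observing the evidence A raises the probability of C from 0.1 (a priori) to 0.4 (a posteriori).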
Naïve Bayes for classification                       29



   What’s the probability of the class given an instance?

   Evidence E = instance, represented as a tuple of attributes
   <e1, …, en>

   Event H = class value for instance

   We are looking for the class value with
   the highest probability for E

   I.e., we are looking for the hypothesis that has the highest
   probability of explaining the evidence E




Naïve Bayes for classification                       30



   Given the hypothesis H and the example E described
   by n attributes, Bayes theorem says that

      P(H|E) = P(E|H) P(H) / P(E)

   Naïve assumption: attributes are statistically independent,
   so the evidence splits into parts that are independent:

      P(H|E) = P(e1|H) P(e2|H) … P(en|H) P(H) / P(E)

Naïve Bayes for classification                  31



  Training: estimate the class probability P(H) and the conditional
  probability P(ei|H) for each attribute value ei and
  each class value H

  Testing: given an instance E, the class is computed as

     class = argmax_H  P(H) Π_i P(ei|H)
Bayesian (Statistical) modeling                32



  “Opposite” of OneRule: use all the attributes
  Two assumptions
      Attributes are equally important
      Attributes are statistically independent
  I.e., knowing the value of one attribute says nothing about
  the value of another (if the class is known)
  The independence assumption is almost never correct!
  But this scheme works well in practice




Using the Probabilities for Classification                                      33



            Outlook           Temperature          Humidity           Windy         Play
            Yes   No          Yes   No             Yes   No          Yes   No     Yes   No
  Sunny      2     3   Hot     2     2   High       3     4   False   6     2      9     5
  Overcast   4     0   Mild    4     2   Normal     6     1   True    3     3
  Rainy      3     2   Cool    3     1
  Sunny     2/9   3/5  Hot    2/9   2/5  High      3/9   4/5  False  6/9   2/5   9/14  5/14
  Overcast  4/9   0/5  Mild   4/9   2/5  Normal    6/9   1/5  True   3/9   3/5
  Rainy     3/9   2/5  Cool   3/9   1/5

  A new day:

     Outlook   Temp.   Humidity   Windy   Play
     Sunny     Cool    High       True    ?

  What is the value of "play"?

Using the Probabilities for Classification                        34



     Outlook   Temp.   Humidity   Windy   Play
     Sunny     Cool    High       True    ?

  Likelihood of "yes" = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
  Likelihood of "no" = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

  Conversion into a probability by normalization:

  P("yes") = 0.0053 / (0.0053 + 0.0206) = 20.5%
  P("no") = 0.0206 / (0.0053 + 0.0206) = 79.5%
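
A minimal Python sketch of the whole computation, with the prior and conditional probabilities hard-coded from the weather table above:

# P(H) and P(e|H) read off the weather table
prior = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"Outlook=Sunny": 2 / 9, "Temp=Cool": 3 / 9,
            "Humidity=High": 3 / 9, "Windy=True": 3 / 9},
    "no":  {"Outlook=Sunny": 3 / 5, "Temp=Cool": 1 / 5,
            "Humidity=High": 4 / 5, "Windy=True": 3 / 5},
}

def classify(evidence):
    # Likelihood of each class: P(H) times the product of P(e|H),
    # then normalize so the two values sum to one
    likelihood = {h: prior[h] for h in prior}
    for h in prior:
        for e in evidence:
            likelihood[h] *= cond[h][e]
    total = sum(likelihood.values())
    return {h: p / total for h, p in likelihood.items()}

print(classify(["Outlook=Sunny", "Temp=Cool",
                "Humidity=High", "Windy=True"]))
# {'yes': ~0.205, 'no': ~0.795}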
The “zero-frequency problem”                   35



  What if an attribute value does not occur
  with every class value?

  For instance, suppose "Humidity = High" never occurred
  for class "yes"
     Its conditional probability would be zero!
     The a posteriori probability would then also be zero!
     (No matter how likely the other values are!)

  Remedy: add 1 to the count for every attribute value-class
  combination (Laplace estimator)
  Result: probabilities will never be zero!
  (This also stabilizes probability estimates)



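
For example, if "Humidity = High" had never been observed with class "yes", the Laplace estimator would give P(High|yes) = (0 + 1) / (9 + 2) = 1/11 instead of 0, since Humidity takes two values; the actual count of 3 would become (3 + 1) / (9 + 2) = 4/11 instead of 3/9.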
M-Estimate of Conditional Probability              36



   Let n be the number of examples with class H
   Let nc be the number of examples with class H and attribute value e
   The m-estimate for P(e|H) is computed as

      P(e|H) = (nc + m·p) / (n + m)

   where m is a parameter known as the equivalent sample size and
   p is a user-specified parameter
         If n is zero, then P(e|H) = p, thus p is
         the prior probability of observing e
         p = 1/k given k values of the attribute considered




Missing values                                               37



  Training: instance is not included in frequency count for
  attribute value-class combination
  Testing: attribute will be omitted from calculation
  Example:

            Outlook   Temp.       Humidity    Windy   Play
               ?       Cool          High     True     ?



      Likelihood of “yes” = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
      Likelihood of “no” = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
      P(“yes”) = 0.0238 / (0.0238 + 0.0343) = 41%
      P(“no”) = 0.0343 / (0.0238 + 0.0343) = 59%




Numeric attributes                           38



  Usual assumption: attributes have a normal or Gaussian
  probability distribution (given the class)




                                           (Portrait: Carl Friedrich Gauss, 1777-1855,
                                           great German mathematician)
Numeric attributes                               39



  Usual assumption: attributes have a normal or Gaussian
  probability distribution (given the class)
  The probability density function for the normal distribution is
  defined by two parameters:

     Sample mean: μ = (1/n) Σ_i x_i

     Standard deviation: σ = sqrt( (1/(n−1)) Σ_i (x_i − μ)² )

     Then the density function f(x) is

        f(x) = (1 / (√(2π) σ)) · e^( −(x − μ)² / (2σ²) )
Statistics for Weather Data with Numeric Temperature                          40

            Outlook        Temperature             Humidity            Windy        Play
            Yes   No       Yes       No         Yes        No        Yes   No    Yes   No
  Sunny      2     3    64, 68,   65, 71,    65, 70,   70, 85,  False  6     2     9     5
  Overcast   4     0    69, 70,   72, 80,    70, 75,   90, 91,  True   3     3
  Rainy      3     2    72, …     85, …      80, …     95, …
                        μ = 73    μ = 75     μ = 79    μ = 86
  Sunny     2/9   3/5   σ = 6.2   σ = 7.9    σ = 10.2  σ = 9.7  False 6/9   2/5  9/14  5/14
  Overcast  4/9   0/5                                           True  3/9   3/5
  Rainy     3/9   2/5

  Example density value:

     f(temperature = 66 | yes) = (1 / (√(2π) · 6.2)) · e^( −(66 − 73)² / (2 · 6.2²) ) = 0.0340
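
A short sketch of the density computation, reproducing the example value above:

import math

def gaussian(x, mu, sigma):
    # Normal density f(x), used as the probability of a numeric
    # attribute value given the class
    return (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
            / (math.sqrt(2 * math.pi) * sigma))

print(gaussian(66, 73, 6.2))  # f(temperature = 66 | yes) ~ 0.0340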
Classifying a new day                                        41



  A new day,

            Outlook    Temp.      Humidity    Windy   Play

             Sunny       66           90      true     ?



  Missing values during training are not included in calculation
  of mean and standard deviation

   Likelihood of "yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
   Likelihood of "no" = 3/5 × 0.0291 × 0.0380 × 3/5 × 5/14 = 0.000136
   P("yes") = 0.000036 / (0.000036 + 0.000136) = 20.9%
   P("no") = 0.000136 / (0.000036 + 0.000136) = 79.1%




Naïve Bayes: discussion                       42



  Naïve Bayes works surprisingly well
  (even if independence assumption is clearly violated)
  Why? Because classification doesn't require accurate
  probability estimates as long as the maximum probability is
  assigned to the correct class
  However, adding too many redundant attributes will cause
  problems (e.g., identical attributes)
  Also, many numeric attributes are not normally distributed




Bayesian Belief Networks                         43



  The conditional independence assumption,
     makes computation possible
     yields optimal classifiers when satisfied
     but is seldom satisfied in practice, as attributes
     (variables) are often correlated

  Bayesian Belief Networks (BBN) allow us to specify which
  pairs of attributes are conditionally independent

  They provide a graphical representation of probabilistic
  relationships among a set of random variables




Bayesian Belief Networks                       44



  Describe the probability distribution governing a set of
  variables by specifying
     Conditional independence assumptions that apply on
     subsets of the variables
     A set of conditional probabilities

  Two key elements
    A directed acyclic graph, encoding the dependence
    relationships among the variables
    A probability table associating each node with its
    immediate parents




Examples of Bayesian Belief Networks          45




a) Variables A and B are independent, but each of them has
   an influence on the variable C
b) A is conditionally independent of both B and D, given C
c) The configuration of the typical naïve Bayes classifier




Probability Tables                              46



  The network topology imposes conditions regarding the
  variables' conditional independence
  Each node is associated with a probability table
     If a node X does not have any parents, then
     the table contains only the prior probability P(X)
     If a node X has only one parent Y, then
     the table contains only the conditional probability P(X|Y)
     If a node X has multiple parents Y1, …, Yk, then
     the table contains the conditional probability P(X|Y1…Yk)




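
As an illustration, the three cases might be stored as follows for a small hypothetical network A → C ← B (all variables and numbers are made up):

# P(A) and P(B): no parents, so only priors; P(C | A, B): two parents
tables = {
    "A": 0.3,                  # prior P(A = true)
    "B": 0.6,                  # prior P(B = true)
    "C": {                     # conditional P(C = true | A, B)
        (True, True): 0.9,
        (True, False): 0.5,
        (False, True): 0.4,
        (False, False): 0.1,
    },
}

print(tables["C"][(True, False)])  # P(C = true | A = true, B = false)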
Model Building                                  49



  Let T = {X1, X2, …, Xd} denote a total
  order of the variables
  for j = 1 to d do
      Let XT(j) denote the jth highest-order variable in T
      Let π(XT(j)) = {XT(1), …, XT(j−1)} denote the variables
      preceding XT(j)
      Remove from π(XT(j)) the variables that do not affect XT(j)
      Create an arc between XT(j) and each remaining variable in
      π(XT(j))
  done

  This guarantees a topology that does not contain cycles
  Different orderings generate different networks
  In principle, there are d! possible networks to test


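
A Python sketch of this construction; the test that decides whether a candidate parent actually affects a variable (for example, a conditional-independence test on the data) is left as a hypothetical callback:

def build_network(order, affects):
    # order: the total order T = [X1, ..., Xd]
    # affects(y, x): hypothetical test, True if y influences x
    arcs = []
    for j, x in enumerate(order):
        candidates = order[:j]  # variables preceding X_T(j) in T
        parents = [y for y in candidates if affects(y, x)]
        arcs.extend((y, x) for y in parents)  # arc y -> x
    return arcs  # acyclic: arcs only point forward in the order

print(build_network(["A", "B", "C"], lambda y, x: x == "C"))
# [('A', 'C'), ('B', 'C')]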
Bayesian Belief Networks                      50




  (Figure: a belief network in which some variables are observed
  evidence, some are unobserved, and one is to be predicted)

  In general, inference is NP-complete, but there are
  approximate methods, e.g., Monte Carlo simulation


Bayesian Belief Networks                       51



  A Bayesian belief network allows a subset of the variables
  to be conditionally independent
  It is a graphical model of causal relationships
  Several cases of learning Bayesian belief networks:
     Given both the network structure and
     all the variables: easy
     Given the network structure but only some variables
     When the network structure is not known in advance




Summary of Bayesian Belief Networks          52



  BBNs provide an approach for capturing the prior knowledge of a
  particular domain using a graphical model
  Building the network is time consuming, but
  adding variables to a network is straightforward
  They can encode causal dependencies among variables
  They are well suited to dealing with incomplete data
  They are robust to overfitting





Más contenido relacionado

La actualidad más candente

Support vector machine
Support vector machineSupport vector machine
Support vector machineRishabh Gupta
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with PythonDavis David
 
Preparing your data for Machine Learning with Feature Scaling
Preparing your data for  Machine Learning with Feature ScalingPreparing your data for  Machine Learning with Feature Scaling
Preparing your data for Machine Learning with Feature ScalingRahul K Chauhan
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature EngineeringAlice Zheng
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfAmmarAhmedSiddiqui2
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)Sharayu Patil
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERKnoldus Inc.
 
Lecture 01: Machine Learning for Language Technology - Introduction
 Lecture 01: Machine Learning for Language Technology - Introduction Lecture 01: Machine Learning for Language Technology - Introduction
Lecture 01: Machine Learning for Language Technology - IntroductionMarina Santini
 
Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)Rehan Guha
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoostJoonyoung Yi
 
An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms Hakky St
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lectureShreyas S K
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization Sourabh Sahu
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learningParas Kohli
 
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Simplilearn
 

La actualidad más candente (20)

Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Exploratory data analysis with Python
Exploratory data analysis with PythonExploratory data analysis with Python
Exploratory data analysis with Python
 
Preparing your data for Machine Learning with Feature Scaling
Preparing your data for  Machine Learning with Feature ScalingPreparing your data for  Machine Learning with Feature Scaling
Preparing your data for Machine Learning with Feature Scaling
 
The How and Why of Feature Engineering
The How and Why of Feature EngineeringThe How and Why of Feature Engineering
The How and Why of Feature Engineering
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdf
 
Support vector machines (svm)
Support vector machines (svm)Support vector machines (svm)
Support vector machines (svm)
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIER
 
Lecture 01: Machine Learning for Language Technology - Introduction
 Lecture 01: Machine Learning for Language Technology - Introduction Lecture 01: Machine Learning for Language Technology - Introduction
Lecture 01: Machine Learning for Language Technology - Introduction
 
Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)Parametric & Non-Parametric Machine Learning (Supervised ML)
Parametric & Non-Parametric Machine Learning (Supervised ML)
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms
 
Scaling and Normalization
Scaling and NormalizationScaling and Normalization
Scaling and Normalization
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lecture
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 
Data Augmentation
Data AugmentationData Augmentation
Data Augmentation
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data mining
 
Python Seaborn Data Visualization
Python Seaborn Data Visualization Python Seaborn Data Visualization
Python Seaborn Data Visualization
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Supervised and unsupervised learning
Supervised and unsupervised learningSupervised and unsupervised learning
Supervised and unsupervised learning
 
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
Naive Bayes Classifier | Naive Bayes Algorithm | Naive Bayes Classifier With ...
 

Similar a Naive Bayes and KNN Classifiers Explained

09_dm1_knn_2022_23.pdf
09_dm1_knn_2022_23.pdf09_dm1_knn_2022_23.pdf
09_dm1_knn_2022_23.pdfArafathJazeeb1
 
DMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringDMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringPier Luca Lanzi
 
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Marina Santini
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsPier Luca Lanzi
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptxdfgd7
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
 
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other MethodsDMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other MethodsPier Luca Lanzi
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringPier Luca Lanzi
 
Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
instance bases k nearest neighbor algorithm.ppt
instance bases k nearest neighbor algorithm.pptinstance bases k nearest neighbor algorithm.ppt
instance bases k nearest neighbor algorithm.pptJohny139575
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierNeha Kulkarni
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesXavier Rafael Palou
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction CS, NcState
 

Similar a Naive Bayes and KNN Classifiers Explained (20)

09_dm1_knn_2022_23.pdf
09_dm1_knn_2022_23.pdf09_dm1_knn_2022_23.pdf
09_dm1_knn_2022_23.pdf
 
DMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringDMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to Clustering
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based Clustering
 
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
Lecture 02: Machine Learning for Language Technology - Decision Trees and Nea...
 
DMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethodsDMTM Lecture 09 Other classificationmethods
DMTM Lecture 09 Other classificationmethods
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptx
 
KNN.pptx
KNN.pptxKNN.pptx
KNN.pptx
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
 
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other MethodsDMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
DMTM 2015 - 13 Naive bayes, Nearest Neighbours and Other Methods
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical Clustering
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Knn 160904075605-converted
Knn 160904075605-convertedKnn 160904075605-converted
Knn 160904075605-converted
 
instance bases k nearest neighbor algorithm.ppt
instance bases k nearest neighbor algorithm.pptinstance bases k nearest neighbor algorithm.ppt
instance bases k nearest neighbor algorithm.ppt
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
K-Nearest Neighbor Classifier
K-Nearest Neighbor ClassifierK-Nearest Neighbor Classifier
K-Nearest Neighbor Classifier
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction
 
KNN presentation.pdf
KNN presentation.pdfKNN presentation.pdf
KNN presentation.pdf
 

Más de Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesPier Luca Lanzi
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesPier Luca Lanzi
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesPier Luca Lanzi
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationPier Luca Lanzi
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationPier Luca Lanzi
 
DMTM Lecture 04 Classification
DMTM Lecture 04 ClassificationDMTM Lecture 04 Classification
DMTM Lecture 04 ClassificationPier Luca Lanzi
 

Más de Pier Luca Lanzi (20)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
 
DMTM Lecture 16 Association rules
DMTM Lecture 16 Association rulesDMTM Lecture 16 Association rules
DMTM Lecture 16 Association rules
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
DMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rulesDMTM Lecture 08 Classification rules
DMTM Lecture 08 Classification rules
 
DMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision treesDMTM Lecture 07 Decision trees
DMTM Lecture 07 Decision trees
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representation
 
DMTM Lecture 04 Classification
DMTM Lecture 04 ClassificationDMTM Lecture 04 Classification
DMTM Lecture 04 Classification
 

Último

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Último (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Naive Bayes and KNN Classifiers Explained

  • 11. Nearest-Neighbor Classifiers. A nearest-neighbor classifier requires three things: the set of stored training records; a distance metric to compute the distance between records; and the value of k, the number of nearest neighbors to retrieve. To classify an unknown record: compute its distance to the training records, identify the k nearest neighbors, and use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote), as in the sketch below.
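To make the three steps concrete, here is a minimal sketch in Python (not from the original slides); the toy training set, the Euclidean metric, and the function name knn_classify are all illustrative choices:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (attribute_vector, class_label) pairs."""
    # 1. compute the distance from the query to every stored record
    dists = [(math.dist(x, query), label) for x, label in train]
    # 2. identify the k nearest neighbors
    neighbors = sorted(dists)[:k]
    # 3. majority vote over the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

training = [((1.0, 1.0), "apple"), ((1.2, 0.9), "apple"), ((3.0, 3.2), "pear")]
print(knn_classify(training, (1.1, 1.0), k=3))  # -> "apple"
```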
  • 12. Definition of Nearest Neighbor. (Figure: three panels showing (a) the 1-nearest, (b) the 2-nearest, and (c) the 3-nearest neighborhood of a point x.) The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
  • 13. 1-Nearest Neighbor. The decision regions induced by a 1-nearest-neighbor classifier form a Voronoi diagram.
  • 14. What distance? To compute the similarity between two points, use for instance the Euclidean distance. Determine the class from the nearest-neighbor list by taking the majority vote of class labels among the k nearest neighbors, or weigh each vote according to distance, e.g., with the weight factor w = 1/d². Taking the square root is not required when distances are only compared. Another popular metric is the city-block (Manhattan) metric, where the distance is the sum of absolute differences.
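A small sketch of these ideas, assuming numeric attribute vectors; the function names and the epsilon guard against zero distance are illustrative additions:

```python
from collections import defaultdict

def euclidean_sq(x, y):
    # squared Euclidean distance; skipping the square root is fine
    # when distances are only compared, never added
    return sum((a - b) ** 2 for a, b in zip(x, y))

def manhattan(x, y):
    # city-block metric: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(x, y))

def distance_weighted_vote(neighbors):
    """neighbors: list of (distance, label); each vote has weight w = 1/d^2."""
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / (d * d + 1e-12)  # epsilon guards against d = 0
    return max(scores, key=scores.get)

print(manhattan((1, 2), (4, 0)))                                                # 5
print(distance_weighted_vote([(0.5, "apple"), (1.0, "pear"), (1.1, "apple")]))  # apple
```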
  • 15. Normalization and other issues. Different attributes are measured on different scales, so attributes need to be normalized, e.g., a_i = (v_i − min v_i) / (max v_i − min v_i) or a_i = (v_i − avg(v_i)) / stdev(v_i), where v_i is the actual value of attribute i. Nominal attributes: the distance is either 0 or 1. A common policy for missing values: assume they are maximally distant (given normalized attributes).
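Both rescaling formulas as short helpers (an illustrative sketch; the population standard deviation is assumed):

```python
def min_max(values):
    # rescale to [0, 1]: (v - min) / (max - min)
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    # standardize: (v - mean) / stdev
    n = len(values)
    mean = sum(values) / n
    stdev = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / stdev for v in values]

heights = [1.5, 1.6, 1.8]
print(min_max(heights))  # [0.0, ~0.333, 1.0]
```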
  • 16. Nearest Neighbor Classification: choosing the value of k. If k is too small, the classifier is sensitive to noise points; if k is too large, the neighborhood may include points from other classes.
  • 17. Nearest Neighbor Classification: scaling issues. Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes. Example: the height of a person may vary from 1.5 m to 1.8 m, the weight from 90 lb to 300 lb, and the income from $10K to $1M.
  • 18. Nearest Neighbor Classification: problems with the Euclidean measure. In high-dimensional data (the curse of dimensionality) it can produce counter-intuitive results: 111111111110 vs 011111111111 gives d = 1.4142, and so does 100000000000 vs 000000000001, even though the first pair agrees on ten of twelve attributes and the second pair shares no 1s at all. Solution: normalize the vectors to unit length (checked in the sketch below).
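A quick numeric check of this example (the unit function is an illustrative helper):

```python
import math

def unit(v):
    # scale a vector to unit length
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = [int(ch) for ch in "111111111110"]
b = [int(ch) for ch in "011111111111"]
c = [int(ch) for ch in "100000000000"]
d = [int(ch) for ch in "000000000001"]

print(math.dist(a, b), math.dist(c, d))  # ~1.4142 for both pairs
print(math.dist(unit(a), unit(b)))       # ~0.43: now recognized as similar
print(math.dist(unit(c), unit(d)))       # ~1.41: still far apart
```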
  • 19. Nearest Neighbor Classification. k-NN classifiers are lazy learners: they do not build models explicitly, unlike eager learners such as decision-tree induction and rule-based systems. As a consequence, classifying unknown records is relatively expensive.
  • 20. The distance function. Simplest case: one numeric attribute, where the distance is the difference between the two attribute values involved (or a function thereof). Several numeric attributes: normally the Euclidean distance is used and the attributes are normalized. Nominal attributes: the distance is set to 1 if the values are different, 0 if they are equal. Are all attributes equally important? Weighting the attributes might be necessary.
  • 21. Discussion of Nearest-Neighbor Classifiers. Often very accurate, but slow, since the simple version scans the entire training data to derive each prediction. Assumes all attributes are equally important, so attribute selection or attribute weights may be needed. Possible remedies against noisy instances: take a majority vote over the k nearest neighbors, or remove noisy instances from the dataset (difficult!). Statisticians have used k-NN since the early 1950s; if n → ∞ and k/n → 0, the error approaches the minimum.
  • 22. Case-Based Reasoning (CBR). Also uses lazy evaluation and analyzes similar instances, but the instances are not "points in a Euclidean space". Methodology: instances are represented by rich symbolic descriptions (e.g., function graphs); multiple retrieved cases may be combined; tight coupling between case retrieval, knowledge-based reasoning, and problem solving. Research issues: indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases.
  • 23. Lazy Learning vs. Eager Learning. Instance-based learning is based on lazy evaluation; decision-tree and Bayesian classification are based on eager evaluation. Key difference: a lazy method may consider the query instance xq when deciding how to generalize beyond the training data D; an eager method cannot, since it has already committed to a global approximation before seeing the query.
  • 24. Lazy Learning vs. Eager Learning. Efficiency: lazy learning needs less time for training but more time for predicting. Accuracy: lazy methods effectively use a richer hypothesis space, since they use many local linear functions to form an implicit global approximation to the target function; eager methods must commit to a single hypothesis that covers the entire instance space.
  • 25. Example: PEBLS. PEBLS, the Parallel Exemplar-Based Learning System (Cost & Salzberg), works with both continuous and nominal features. For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM). Each record is assigned a weight factor, and the number of nearest neighbors is k = 1.
  • 26. Example: PEBLS. Training data (Tid, Refund, Marital Status, Taxable Income, Cheat):
    1  Yes  Single    125K  No
    2  No   Married   100K  No
    3  No   Single     70K  No
    4  Yes  Married   120K  No
    5  No   Divorced   95K  Yes
    6  No   Married    60K  No
    7  Yes  Divorced  220K  No
    8  No   Single     85K  Yes
    9  No   Married    75K  No
    10 No   Single     90K  Yes
  Class counts per value — Marital Status: Single (Yes 2, No 2), Married (Yes 0, No 4), Divorced (Yes 1, No 1); Refund: Yes (Yes 0, No 3), No (Yes 3, No 4).
  The distance between nominal attribute values is d(V1, V2) = Σ_i | n1i/n1 − n2i/n2 |, where n1i is the number of examples with value V1 and class i, and n1 is the total number of examples with value V1 (similarly for V2). For example (verified in the sketch below):
    d(Single, Married)         = |2/4 − 0/4| + |2/4 − 4/4| = 1
    d(Single, Divorced)        = |2/4 − 1/2| + |2/4 − 1/2| = 0
    d(Married, Divorced)       = |0/4 − 1/2| + |4/4 − 1/2| = 1
    d(Refund=Yes, Refund=No)   = |0/3 − 3/7| + |3/3 − 4/7| = 6/7
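The value-difference computations can be checked directly from the class counts above; this sketch (names illustrative) reproduces the four distances:

```python
# value -> (count of Cheat=Yes, count of Cheat=No), from the ten records above
marital = {"Single": (2, 2), "Married": (0, 4), "Divorced": (1, 1)}
refund = {"Yes": (0, 3), "No": (3, 4)}

def mvdm(v1, v2, table):
    """Modified value difference: sum over classes of |n1i/n1 - n2i/n2|."""
    n1, n2 = sum(table[v1]), sum(table[v2])
    return sum(abs(c1 / n1 - c2 / n2) for c1, c2 in zip(table[v1], table[v2]))

print(mvdm("Single", "Married", marital))    # 1.0
print(mvdm("Single", "Divorced", marital))   # 0.0
print(mvdm("Married", "Divorced", marital))  # 1.0
print(mvdm("Yes", "No", refund))             # 6/7 ~ 0.857
```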
  • 27. Example: PEBLS. For two records X (Refund=Yes, Single, 125K, Cheat=No) and Y (Refund=No, Married, 100K, Cheat=No), the distance between the records is Δ(X, Y) = w_X · w_Y · Σ_{i=1..d} d(X_i, Y_i)², where w_X = (number of times X is used for prediction) / (number of times X predicts correctly). Thus w_X ≅ 1 if X makes accurate predictions most of the time, and w_X > 1 if X is not reliable for making predictions.
  • 28. Bayes Classifier. A probabilistic framework for solving classification problems, built on conditional probability, P(C|A) = P(A, C) / P(A), and Bayes' theorem, P(C|A) = P(A|C) P(C) / P(A). The a priori probability of C, P(C), is the probability of the event before evidence is seen; the a posteriori probability of C, P(C|A), is the probability of the event after evidence is seen.
  • 29. Naïve Bayes for classification. What is the probability of the class given an instance? The evidence E is the instance, represented as a tuple of attribute values <e1, …, en>; the event H is a class value for the instance. We are looking for the class value with the highest probability for E, i.e., for the hypothesis that has the highest probability of explaining the evidence E.
  • 30. Naïve Bayes for classification. Given the hypothesis H and the example E described by n attributes, Bayes' theorem says that P(H|E) = P(e1, …, en | H) P(H) / P(E). The naïve assumption is that attributes are statistically independent, so the evidence splits into parts that are independent: P(H|E) = P(e1|H) × … × P(en|H) × P(H) / P(E).
  • 31. Naïve Bayes for classification. Training: estimate the class probability P(H) and the conditional probability P(ei|H) for each attribute value ei and each class value H. Testing: given an instance E, the predicted class is the H that maximizes P(H) × P(e1|H) × … × P(en|H).
  • 32. Bayesian (Statistical) modeling. The "opposite" of OneRule: use all the attributes. Two assumptions: attributes are equally important, and attributes are statistically independent, i.e., knowing the value of one attribute says nothing about the value of another (if the class is known). The independence assumption is almost never correct, but the scheme works well in practice.
  • 33. Using the Probabilities for Classification. Counts and relative frequencies from the weather data:
    Outlook:     Sunny Yes 2 (2/9), No 3 (3/5);  Overcast Yes 4 (4/9), No 0 (0/5);  Rainy Yes 3 (3/9), No 2 (2/5)
    Temperature: Hot Yes 2 (2/9), No 2 (2/5);  Mild Yes 4 (4/9), No 2 (2/5);  Cool Yes 3 (3/9), No 1 (1/5)
    Humidity:    High Yes 3 (3/9), No 4 (4/5);  Normal Yes 6 (6/9), No 1 (1/5)
    Windy:       False Yes 6 (6/9), No 2 (2/5);  True Yes 3 (3/9), No 3 (3/5)
    Play:        Yes 9 (9/14), No 5 (5/14)
  A new instance: Outlook = Sunny, Temperature = Cool, Humidity = High, Windy = True. What is the value of "play"?
  • 34. Using the Probabilities for Classification. For the instance (Sunny, Cool, High, True): likelihood of "yes" = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053; likelihood of "no" = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206. Conversion into a probability by normalization: P("yes") = 0.0053 / (0.0053 + 0.0206) = 20.5%, P("no") = 0.0206 / (0.0053 + 0.0206) = 79.5%.
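The same computation as a table-lookup sketch; the dictionary layout is an illustrative choice, and the probabilities are the relative frequencies from slide 33:

```python
counts = {
    "yes": {"outlook=sunny": 2/9, "temp=cool": 3/9, "humidity=high": 3/9,
            "windy=true": 3/9, "_prior": 9/14},
    "no":  {"outlook=sunny": 3/5, "temp=cool": 1/5, "humidity=high": 4/5,
            "windy=true": 3/5, "_prior": 5/14},
}
instance = ["outlook=sunny", "temp=cool", "humidity=high", "windy=true"]

likelihood = {}
for cls, table in counts.items():
    p = table["_prior"]
    for value in instance:
        p *= table[value]        # naive independence: multiply per-attribute probabilities
    likelihood[cls] = p

total = sum(likelihood.values())  # normalization step from the slide
for cls, p in likelihood.items():
    print(f"P({cls}) = {p/total:.1%}")  # yes ~ 20.5%, no ~ 79.5%
```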
  • 35. The "zero-frequency problem". What if an attribute value never occurs with some class value (for instance, "Humidity = high" never observed for class "yes")? The conditional probability will be zero, and the a posteriori probability will also be zero, no matter how likely the other values are! Remedy: add 1 to the count for every attribute value–class combination (the Laplace estimator). Result: probabilities will never be zero (this also stabilizes probability estimates).
  • 36. M-Estimate of Conditional Probability. Let n be the number of examples with class H and nc the number of examples with class H and attribute value e. The m-estimate for P(e|H) is computed as P(e|H) = (nc + m·p) / (n + m), where m is a parameter known as the equivalent sample size and p is a user-specified parameter. If n is zero, then P(e|H) = p, so p acts as the prior probability of observing e; p = 1/k given k values of the attribute considered.
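As a function (a minimal sketch), with a check that choosing m = k and p = 1/k reduces it to the Laplace estimator from the previous slide:

```python
def m_estimate(nc, n, p, m):
    """P(e|H) ~ (nc + m*p) / (n + m)."""
    return (nc + m * p) / (n + m)

# zero-frequency case: value never seen with this class (nc = 0),
# attribute with k = 3 possible values, 9 examples of the class
print(m_estimate(nc=0, n=9, p=1/3, m=3))  # 1/12 instead of 0
# with m = k and p = 1/k this equals (nc + 1) / (n + k): Laplace
```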
  • 37. Missing values. Training: the instance is not included in the frequency count for the attribute value–class combination. Testing: the attribute is omitted from the calculation. Example: Outlook = ?, Temperature = Cool, Humidity = High, Windy = True. Likelihood of "yes" = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238; likelihood of "no" = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343. P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%; P("no") = 0.0343 / (0.0238 + 0.0343) = 59%.
  • 38. Numeric attributes. Usual assumption: attributes have a normal (Gaussian) probability distribution, given the class. (Karl Gauss, 1777–1855, the great German mathematician.)
  • 39. Numeric attributes. Usual assumption: attributes have a normal (Gaussian) probability distribution, given the class. The probability density function for the normal distribution is defined by two parameters, the sample mean μ and the standard deviation σ; the density function is f(x) = 1/(√(2π)·σ) · exp(−(x − μ)² / (2σ²)).
  • 40. Statistics for the weather data with numeric temperature.
    Outlook:             Sunny Yes 2 (2/9), No 3 (3/5);  Overcast Yes 4 (4/9), No 0 (0/5);  Rainy Yes 3 (3/9), No 2 (2/5)
    Temperature (yes):   64, 68, 69, 70, 72, …  with μ = 73, σ = 6.2;   (no): 65, 71, 72, 80, 85, …  with μ = 75, σ = 7.9
    Humidity (yes):      65, 70, 70, 75, 80, …  with μ = 79, σ = 10.2;  (no): 70, 85, 90, 91, 95, …  with μ = 86, σ = 9.7
    Windy:               False Yes 6 (6/9), No 2 (2/5);  True Yes 3 (3/9), No 3 (3/5);  Play: Yes 9 (9/14), No 5 (5/14)
  Example density value: f(temperature = 66 | yes) = 1/(√(2π)·6.2) · exp(−(66 − 73)² / (2·6.2²)) ≈ 0.0340.
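The example density value can be verified with Python's standard library (statistics.NormalDist is used here merely as a convenient implementation of the formula on slide 39):

```python
from statistics import NormalDist

# density of temperature = 66 under class "yes" (mu = 73, sigma = 6.2)
f_yes = NormalDist(mu=73, sigma=6.2).pdf(66)
print(round(f_yes, 4))  # 0.034
```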
  • 41. Classifying a new day. A new day: Outlook = Sunny, Temperature = 66, Humidity = 90, Windy = true. (Missing values during training are not included in the calculation of the mean and standard deviation.) Likelihood of "yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036; likelihood of "no" = 3/5 × 0.0291 × 0.0380 × 3/5 × 5/14 = 0.000136. P("yes") = 0.000036 / (0.000036 + 0.000136) = 20.9%; P("no") = 0.000136 / (0.000036 + 0.000136) = 79.1%.
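Combining the pieces (a sketch; the density values are the slide's rounded figures, so the final percentages land close to, but not exactly at, the slide's 20.9% / 79.1%):

```python
like_yes = (2/9) * 0.0340 * 0.0221 * (3/9) * (9/14)  # ~0.000036
like_no  = (3/5) * 0.0291 * 0.0380 * (3/5) * (5/14)  # ~0.00014
total = like_yes + like_no
print(f"P(yes) = {like_yes/total:.1%}, P(no) = {like_no/total:.1%}")
```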
  • 42. Naïve Bayes: discussion. Naïve Bayes works surprisingly well, even when the independence assumption is clearly violated. Why? Because classification doesn't require accurate probability estimates as long as the maximum probability is assigned to the correct class. However, adding too many redundant attributes will cause problems (e.g., identical attributes), and many numeric attributes are not normally distributed.
  • 43. Bayesian Belief Networks. The conditional independence assumption makes computation feasible and yields optimal classifiers when it holds, but it is seldom satisfied in practice, as attributes (variables) are often correlated. Bayesian Belief Networks (BBNs) allow us to specify which pairs of attributes are conditionally independent; they provide a graphical representation of probabilistic relationships among a set of random variables.
  • 44. Bayesian Belief Networks. They describe the probability distribution governing a set of variables by specifying conditional independence assumptions that apply to subsets of the variables, together with a set of conditional probabilities. Two key elements: a directed acyclic graph encoding the dependence relationships among the variables, and a probability table associating each node with its immediate parent nodes.
  • 45. Examples of Bayesian Belief Networks. a) Variables A and B are independent, but each of them has an influence on the variable C. b) A is conditionally independent of both B and D, given C. c) The configuration of the typical naïve Bayes classifier.
  • 46. Probability Tables. The network topology imposes conditions on the conditional independence of the variables. Each node is associated with a probability table: if a node X does not have any parents, the table contains only the prior probability P(X); if X has a single parent Y, the table contains the conditional probability P(X|Y); if X has multiple parents Y1, …, Yk, the table contains the conditional probability P(X|Y1, …, Yk).
  • 48. Model Building. (Figure only.)
  • 49. Model Building. Let T = (X1, X2, …, Xd) denote a total order of the variables. For j = 1 to d: let X_T(j) denote the j-th variable in the order T; let π(X_T(j)) = {X_T(1), …, X_T(j−1)} denote the variables preceding X_T(j); remove from π(X_T(j)) the variables that do not affect X_T(j); create an arc between X_T(j) and each remaining variable in π(X_T(j)). This guarantees a topology that does not contain cycles. Different orderings generate different networks; in principle, there are d! possible orderings to test. (See the sketch below.)
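A sketch of this ordering-based construction; the affects callback is a hypothetical stand-in for whatever statistical test decides that a candidate parent still influences the current variable:

```python
def build_structure(order, affects):
    """order: variables in a chosen total order T.
    affects(parent, child, candidates) -> bool: hypothetical dependence test."""
    arcs = []
    for j, x in enumerate(order):
        candidates = list(order[:j])           # pi(X_T(j)): the predecessors of x
        parents = [p for p in candidates
                   if affects(p, x, candidates)]  # drop variables that do not affect x
        arcs.extend((p, x) for p in parents)   # arcs only point forward in T,
    return arcs                                # so no cycle can ever be created
```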
  • 50. Bayesian Belief Networks. (Figure: evidence nodes are observed; the unobserved nodes are to be predicted.) In general, inference in belief networks is NP-complete, but there are approximate methods, e.g., Monte Carlo sampling.
  • 51. Bayesian Belief Networks. A Bayesian belief network allows a subset of the variables to be conditionally independent; it is a graphical model of causal relationships. Several cases of learning Bayesian belief networks arise: given both the network structure and all the variables, learning is easy; given the network structure but only some of the variables, it is harder; hardest of all is when the network structure is not known in advance.
  • 52. Summary of Bayesian Belief Networks. BBNs provide an approach for capturing prior knowledge of a particular domain using a graphical model. Building the network is time-consuming, but adding variables to an existing network is straightforward. They can encode causal dependencies among variables, they are well suited to dealing with incomplete data, and they are robust to overfitting.