Introduction to Pattern Recognition
Luís Gustavo Martins - lmartins@porto.ucp.pt
EA-UCP, Porto, Portugal
http://artes.ucp.pt ; http://citar.ucp.pt




Human Perception

                • Humans have developed highly sophisticated skills for
                  sensing their environment and taking actions according to
                  what they observe, e.g.,
                     –      Recognizing a face.
                     –      Understanding spoken words.
                     –      Reading handwriting.
                     –      Distinguishing fresh from spoiled food by its smell.
                     –      ...




Human and Machine Perception
       • We are often influenced by the knowledge of how patterns
         are modeled and recognized in nature when we develop
         pattern recognition algorithms.

       • Research on machine perception also helps us gain deeper
         understanding and appreciation for pattern recognition
         systems in nature.

       • Yet, we also apply many techniques that are purely numerical
         and do not have any correspondence in natural systems.


Pattern Recognition (PR)

                • Pattern Recognition is the study of how machines can:
                     – observe the environment,
                     – learn to distinguish patterns of interest,
                     – make sound and reasonable decisions about the categories of the
                       patterns.




Pattern Recognition (PR)
                • What is a Pattern?
                     – is an abstraction, represented by a set of measurements
                       describing a “physical” object
                • Many types of patterns exist:
                     – visual, temporal, sonic, logical, ...




                  [Figure: Pattern Recognition Applications — fingerprint recognition
                  (Figure 3 from CS 551, Spring 2011, © 2011 Selim Aksoy, Bilkent University)]
Pattern Recognition (PR)

                • What is a Pattern Class (or category)?
                     – is a set of patterns sharing common attributes
                     – a collection of “similar”, not necessarily identical, objects
                     – During recognition, given objects are assigned to a prescribed class




Pattern Recognition (PR)

                • No single theory of Pattern Recognition can possibly
                  cope with such a broad range of problems...
                • However, there are several standard models,
                  including:
                     – Statistical or fuzzy pattern recognition (see Fukunaga)
                     – Syntactic or structural pattern recognition (see Schalkoff)
                     – Knowledge-based pattern recognition (see Stefik)




Pattern Recognition

                • Two-phase Process
                     1. Training/Learning
                            • Learning is hard and time-consuming
                            • System must be exposed to several examples of each class
                            • Creates a “model” for each class
                            • Once learned, classification becomes natural
                     2. Detecting/Classifying (a minimal sketch follows)
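
A minimal sketch of the two phases, using scikit-learn (the library choice and the toy shape data are assumptions, not from the slides):

```python
# Sketch of the two-phase process (scikit-learn and toy data assumed).
from sklearn.neighbors import KNeighborsClassifier

# Phase 1 - Training/Learning: expose the system to labeled examples
# of each class; the classifier builds a "model" from them.
X_train = [[3, 1.0], [3, 1.2], [0, 0.9], [0, 1.1]]  # [nr_of_sides, size]
y_train = ["triangle", "triangle", "circle", "circle"]
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Phase 2 - Detecting/Classifying: apply the learned model to new data.
print(clf.predict([[3, 0.8]]))  # -> ['triangle']
```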




Pattern Recognition
       • How can a machine learn the rule from data?
            – Supervised learning: a teacher provides a category label
              or cost for each pattern in the training set.
                  ➡Classification
            – Unsupervised learning: the system forms clusters or
              natural groupings of the input patterns (based on some
              similarity criteria).
                  ➡Clustering
             – Reinforcement learning: no desired category is given, but the
               teacher provides feedback to the system, such as whether the
               decision is right or wrong.
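
A short sketch contrasting the first two modes (scikit-learn and the toy data are assumptions):

```python
# Supervised (classification) vs. unsupervised (clustering) learning.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9]])

# Supervised: a "teacher" supplies a label per training pattern.
y = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[1.0, 0.9]]))  # -> [0]

# Unsupervised: no labels; the system forms natural groupings.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)  # e.g. [0 0 1 1] (cluster ids are arbitrary)
```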

Pattern Recognition

                  • Supervised Training/Learning
                               – a “teacher” provides labeled training sets, used to train a classifier


              [Diagram: (1) learn about shape from a labeled training set of triangles and
              circles, then (2) classify an unknown object — “It’s a Triangle!”; likewise,
              (1) learn about color from blue and yellow objects, then (2) classify —
              “It’s Yellow!”]

Pattern Recognition
                • Unsupervised Training/Learning
                     – No labeled training sets are provided
                     – The system applies a specified clustering/grouping criterion to an unlabeled dataset
                     – Clusters/groups together the “most similar” objects (according to the given criterion)



                  [Diagram: (1) apply a clustering criterion (some similarity measure) to an
                  unlabeled training set; (2) assign a new object “?” to the closest cluster]

Pattern Recognition Process

           • Data acquisition and sensing:
                – Measurements of physical variables.
                – Important issues: bandwidth, resolution, etc.
           • Pre-processing:
                – Removal of noise in data.
                – Isolation of patterns of interest from the background.
           • Feature extraction:
                – Finding a new representation in terms of features.
           • Classification:
                – Using features and learned models to assign a pattern to a category.
           • Post-processing:
                – Evaluation of confidence in decisions.

           [Pipeline diagram: input → Sensing → Segmentation → Feature Extraction →
           Classification → Post-processing → Decision]
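
A runnable toy sketch of this chain (every stage below is a made-up stand-in; only the stage order comes from the slide):

```python
# Toy end-to-end pipeline: sensing -> pre-processing -> features -> classification.
import numpy as np

def preprocess(raw):
    """Pre-processing: remove noise (here, a simple moving-average filter)."""
    return np.convolve(raw, np.ones(3) / 3, mode="same")

def extract_features(signal):
    """Feature extraction: represent the pattern by two summary features."""
    return np.array([signal.mean(), signal.std()])

def classify(x, class_means):
    """Classification: assign to the nearest learned class model (mean vector)."""
    return min(class_means, key=lambda c: np.linalg.norm(x - class_means[c]))

raw = np.random.default_rng(0).normal(loc=1.0, scale=0.2, size=32)     # sensing
class_means = {"A": np.array([1.0, 0.2]), "B": np.array([5.0, 1.0])}   # learned models (toy)
print(classify(extract_features(preprocess(raw)), class_means))        # -> 'A'
```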



Features
       • Features are properties of an object:
            –   Ideally representative of a specific type (i.e. class) of objects
            –   Compact (memory efficient)
            –   Computationally simple (CPU efficient)
            –   Perceptually relevant (if trying to implement a human-inspired classifier)
       • ⇒ They should pave the way to a good discrimination of different
         classes of objects!




Features
       • Take a group of graphical objects
            – Possible features:
                  •   Shape
                  •   Color
                  •   Size
                  •   ...
            – These allow us to group them into different classes:

              [Figure: the same objects grouped by SHAPE, by SIZE, and by COLOR]
Feature Vectors
       • Usually a single object can be represented using
         several features, e.g.
            –   x1 = shape (e.g. nr of sides)
            –   x2 = size (e.g. some numeric value)
            –   x3 = color (e.g. rgb values)
            –   ...
            –   xd = some other (numeric) feature.

       • x = (x1, x2, ..., xd) becomes a feature vector
            – x is a point in a d-dimensional feature space.
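
For instance, one graphical object might be encoded as follows (the exact encoding, e.g. flattening RGB into three components, is an assumption for illustration):

```python
# A feature vector for one object: shape, size, and color features.
import numpy as np

x = np.array([3,              # x1 = shape (nr of sides)
              2.5,            # x2 = size (some numeric value)
              255, 255, 0])   # x3..x5 = color (RGB -> yellow)
print(x.shape)  # (5,) -> a point in a 5-dimensional feature space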




Feature Vectors

                • Example of a 2D Feature Space
                     – x1 = shape (e.g. nr of sides)
                     – x2 = size (e.g. some numeric value)



                  [Figure: a 2D feature space with axes x1 (shape) and x2 (size);
                  each object maps to a point in the plane]


The Classical Model for PR
       1. A Feature Extractor extracts features from raw data (e.g. audio, image,
          weather data, etc.)
       2. A Classifier receives x and assigns it to one of c categories, Class 1,
          Class 2, ..., Class c (i.e. labels the raw data).




The Classical Model for PR
       • Example: classify graphic objects according to
         their shape
            – Feature extracted:
                  • nr. of sides (x1)
            – Classifier:
               • 0 sides = circle
               • 3 sides = triangle
               • (4 sides = rectangle)

              [Diagram: raw object → feature extractor → [x1] → classifier → “Triangle”]
            – How does the classifier know that a circle has no sides and that a triangle
              has 3 sides?! (a rule-based sketch follows)
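
Here the rules are simply hand-coded (a sketch; in a learned system these rules would instead come from the training phase):

```python
# Hand-coded rule-based classifier for the shape example above.
def classify_shape(nr_of_sides):
    rules = {0: "circle", 3: "triangle", 4: "rectangle"}
    return rules.get(nr_of_sides, "unknown")

print(classify_shape(3))  # -> 'triangle'
```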




A Case Study: Fish Classification

                • Problem:
                     – Sorting incoming fish on a conveyor belt according to species.
                     – Assume that we have only two kinds of fish (classes):
                            • Sea Bass and Salmon

                  [Figure 12: Picture taken from a camera — from CS 551, Spring 2011,
                  © 2011 Selim Aksoy (Bilkent University)]
A Case Study: Fish Classification
       • What kind of information can distinguish one species from the
         other?
            – length, width, weight, number and shape of fins, tail shape, etc.

       • What can cause problems during sensing?
            – lighting conditions, position of fish on the conveyor belt, camera noise, etc.

       • What are the steps in the process?
                  1. Capture image (sensing)
                  2. Isolate fish (pre-processing)
                  3. Take measurements (feature extraction)
                  4. Make decision (classification): Sea Bass or Salmon




         2011                         Luís Gustavo Martins - lmartins@porto.ucp.pt

Wednesday, March 16, 2011                                                                                                 20
A Case Study: Fish Classification
                • Selecting Features
                     – Assume a fisherman told us that a sea bass is generally longer than a salmon.
                     – We can use length as a feature and decide between sea bass and salmon
                       according to a threshold on length.
                     – How can we choose this threshold?

                  [Figure 13: Histograms of the length feature for the two types of fish in
                  training samples. How can we choose the threshold l* to make a reliable
                  decision? Even though “sea bass” is longer than “salmon” on average, there
                  are many examples of fish where this observation does not hold...]
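
One simple way to pick l* is to minimize the error on the training samples (a sketch; the lengths below are made-up toy values):

```python
# Single-feature threshold decision, with l* chosen by minimum training error.
import numpy as np

salmon_len  = np.array([20, 22, 25, 27, 30])
seabass_len = np.array([28, 31, 34, 36, 40])

def errors(threshold):
    # decide "sea bass" iff length > threshold
    return np.sum(seabass_len <= threshold) + np.sum(salmon_len > threshold)

candidates = np.unique(np.concatenate([salmon_len, seabass_len]))
l_star = min(candidates, key=errors)
print(l_star, errors(l_star))  # even the best threshold still misclassifies some fish
```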

A Case Study: Fish Classification
                • Selecting Features
                     – Let’s try another feature and see if we get better discrimination
                        ➡Average Lightness of the fish scales
                  [Figure 14: Histograms of the lightness feature for the two types of fish
                  in training samples. It looks easier to choose the threshold x* but we
                  still cannot make a perfect decision.]
A Case Study: Fish Classification
                • Multiple Features
                     – Single features might not yield the best performance.
                     – To improve recognition, we might have to use more than one feature at a time.
                     – Combinations of features might yield better performance.
                     – Assume we also observed that sea bass are typically wider than salmon.

                          x = [x1, x2],  where x1 = lightness and x2 = width

         Each fish image is now represented by a point in this 2D feature space.

         [Figure: Scatter plot of the lightness and width features for training samples.
         We can draw a decision boundary to divide the feature space into two regions.
         Does it look better than using only lightness?]
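
A linear decision boundary in this 2D space can be fit, for example, with logistic regression (a sketch; the library choice and toy feature values are assumptions, not from the slides):

```python
# Fit a linear boundary in the (lightness, width) feature space.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[2.0, 10.0], [3.0, 11.0],    # salmon: darker, narrower (toy values)
              [7.0, 14.0], [8.0, 15.0]])   # sea bass: lighter, wider (toy values)
y = np.array(["salmon", "salmon", "sea bass", "sea bass"])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[4.0, 12.0]]))  # which side of the boundary is this fish on?
```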




A Case Study: Fish Classification
       • Designing a Classifier
               • Can we do better with another decision rule?
               • More complex models result in more complex boundaries.
               • DANGER OF OVERFITTING: we may distinguish the training samples
                 perfectly, but such a classifier will fail to generalize to new data...

                   [Figure 16: We may distinguish training samples perfectly, but how can
                   we predict how well we can generalize to unknown samples?]
A Case Study: Fish Classification
       • Designing a Classifier
               • How can we manage the tradeoff between the complexity of decision rules
                 and their performance on unknown samples?

                  [Figure 17: Different criteria lead to different decision boundaries.]

Feature Extraction
       • Designing a Feature Extractor
            • Its design is problem-specific (e.g. features to extract from graphic objects may
              be quite different from those for sound events...)
            • The ideal feature extractor would produce the same feature vector x for all
              patterns in the same class, and different feature vectors for patterns in different
              classes.
            • In practice, different inputs to the feature extractor will always produce different
              feature vectors, but we hope that the within-class variability is small relative to
              the between-class variability.
       • Designing a good set of features is sometimes “more of an art than a
         science”...




Feature Extraction

                • Multiple Features
                     – Does adding more features always improve the results?
                        • No!! So we must:
                            – Avoid unreliable features.
                            – Be careful about correlations with existing features.
                            – Be careful about measurement costs.
                            – Be careful about noise in the measurements.
                     – Is there some curse for working in very high dimensions?
                        • YES THERE IS! == CURSE OF DIMENSIONALITY
                              ➡ rule of thumb: n = d(d−1)/2, where n = nr of examples in the
                                training dataset and d = nr of features (see the sketch below)
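
A tiny illustration of how the slide's rule of thumb grows with d:

```python
# The slide's rule of thumb n = d(d-1)/2: adding features quickly
# inflates the amount of training data needed.
def examples_needed(d):
    return d * (d - 1) // 2

for d in (2, 5, 10, 50, 100):
    print(f"d = {d:3d} features -> n = {examples_needed(d):5d} training examples")
```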




Feature Extraction
       • Problem: Inadequate Features
            – If the features simply do not contain the information needed to separate the
              classes, it doesn't matter how much effort you put into designing the classifier.
            – Solution: go back and design better features.

              [Figure: good features (classes well separated) vs. bad features (classes overlap)]




Feature Extraction
              • Problem: Correlated Features
                     – It often happens that two features that were meant to measure different
                       characteristics are influenced by some common mechanism and tend to vary together.
                            • E.g. the perimeter and the maximum width of a figure will both vary with scale; larger figures will have
                              both larger perimeters and larger maximum widths.
                    – This degrades the performance of a classifier based on Euclidean distance to a
                      template.
                            • A pattern at the extreme of one class can be closer to the template for another class than to its own
                              template. A similar problem occurs if features are badly scaled, for example, by measuring one feature in
                              microns and another in kilometers.
                     – Solution: use other metrics (e.g. Mahalanobis distance...) or extract features
                       known to be uncorrelated! (a sketch follows)
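
A sketch contrasting Euclidean and Mahalanobis distance on two correlated features, e.g. perimeter and maximum width (the toy data and the scipy helper are assumptions, not from the slides):

```python
# Euclidean vs. Mahalanobis distance on correlated features.
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
width = rng.normal(10, 2, 200)
perimeter = 3.5 * width + rng.normal(0, 1, 200)  # strongly correlated with width
X = np.column_stack([perimeter, width])

VI = np.linalg.inv(np.cov(X, rowvar=False))      # inverse covariance matrix
template = X.mean(axis=0)
sample = np.array([45.0, 13.0])

print(np.linalg.norm(sample - template))         # Euclidean: ignores the correlation
print(mahalanobis(sample, template, VI))         # Mahalanobis: accounts for it
```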




Designing a Classifier
       • Model selection:
            –   Domain dependence and prior information.
            –   Definition of design criteria.
            –   Parametric vs. non-parametric models.
            –   Handling of missing features.
            – Computational complexity.
            – Types of models: templates, decision-theoretic or statistical, syntactic or
              structural, neural, and hybrid.
            – How can we know how close we are to the true model underlying the patterns?




Designing a Classifier
       • Designing a Classifier
                  • How can we manage the tradeoff between the complexity of decision rules
                    and their performance on unknown samples?




       [Figure: Different criteria lead to different decision boundaries.]




Designing a Classifier
                • Problem: Curved Boundaries
                     – linear boundaries produced by a minimum-Euclidean-distance
                       classifier may not be flexible enough.
                            • For example, if x1 is the perimeter and x2 is the area of a figure, x1 will grow linearly with
                              scale, while x2 will grow quadratically. This will warp the feature space and prevent a linear
                              discriminant function from performing well.
                     – Solutions:
                             • Redesign the feature set (e.g., let x2 be the square root of the area — see the sketch after this list)
                            • Try using Mahalanobis distance, which can produce quadratic decision boundaries
                            • Try using a neural network (beyond the scope of these notes; see Haykin)
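
A tiny sketch of the first solution (the numbers are hypothetical):

```python
# Feature redesign: with x2 = sqrt(area), both features grow linearly
# with scale, so linear decision boundaries have a better chance.
import math

def features(perimeter, area):
    return (perimeter, math.sqrt(area))

print(features(12.0, 9.0))  # -> (12.0, 3.0)
```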




Designing a Classifier
       • Problem: Subclasses in the dataset
            – It frequently happens that the classes defined by the end user are
              not the natural classes...
            – Solution: use CLUSTERING.




Designing a Classifier
       • Problem: Complex Feature Space
            – Solution: use a different type of classifier...




Simple Classifiers
       • Minimum-distance Classifiers
            – based on some specified “metric” ||x − m||
            – e.g. Template Matching (a minimal sketch follows)
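
A minimal sketch of a minimum-distance classifier, where each class template m is the mean feature vector of its training samples (toy data assumed):

```python
# Minimum-distance classifier: assign x to the class whose template m
# minimizes ||x - m||.
import numpy as np

def fit_means(X, y):
    """Learn one template per class: the mean feature vector of its samples."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(x, means):
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y = np.array(["A", "A", "B", "B"])
print(classify(np.array([1.1, 1.2]), fit_means(X, y)))  # -> 'A'
```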




Simple Classifiers

       • Template Matching

              [Figure: class templates vs. noisy examples of characters]

            – To classify one of the noisy examples, simply compare it to the two templates. This can
              be done in a couple of equivalent ways:
                  1. Count the number of agreements. Pick the class that has the maximum number of agreements. This is a maximum
                     correlation approach.
                  2. Count the number of disagreements. Pick the class with the minimum number of disagreements. This is a minimum
                     error approach.

            • Works well when the variations within a class are due to additive noise, and there are no
              other distortions of the characters -- translation, rotation, shearing, warping, expansion,
              contraction or occlusion.
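
The agreement count is easy to write down for binary images (a sketch; the tiny 3×3 templates below are made-up):

```python
# Template matching on binary images: maximum agreements == minimum
# disagreements, so either rule picks the same class.
import numpy as np

def match(example, templates):
    return max(templates, key=lambda c: np.sum(example == templates[c]))

templates = {"I": np.array([[0, 1, 0]] * 3),
             "T": np.array([[1, 1, 1], [0, 1, 0], [0, 1, 0]])}
noisy = np.array([[1, 1, 1], [0, 1, 0], [0, 0, 0]])  # a corrupted "T"
print(match(noisy, templates))  # -> 'T'
```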




Simple Classifiers
       • Metrics
            – different ways of measuring distance:
                  • Euclidean metric:
                        – ||u|| = sqrt(u1^2 + u2^2 + ... + ud^2)
                  • Manhattan (or taxicab) metric:
                        – ||u|| = |u1| + |u2| + ... + |ud|
                  • Contours of constant...
                        – ... Euclidean distance are circles (or spheres)
                        – ... Manhattan distance are squares (or boxes)
                        – ... Mahalanobis distance are ellipses (or ellipsoids)
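
The two formulas, written out with numpy (equivalent to np.linalg.norm(u) and np.linalg.norm(u, ord=1)):

```python
# Euclidean and Manhattan norms of a feature-space difference vector.
import numpy as np

def euclidean(u):
    return np.sqrt(np.sum(np.asarray(u) ** 2))

def manhattan(u):
    return np.sum(np.abs(np.asarray(u)))

u = np.array([3.0, -4.0])
print(euclidean(u), manhattan(u))  # -> 5.0 7.0
```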




Classifiers: Neural Networks

                • A k-layer feedforward network: layer j has mj threshold logic units (TLUs);
                  the weighted sum input to the i-th unit in layer j is s_i^(j) = X^(j−1) · W_i^(j)
                  (the X vectors are “augmented” with a threshold component as appropriate).

                  [Figure 4.17: A k-layer network — from Nilsson,
                  http://robotics.stanford.edu/people/nilsson/MLBOOK.pdf]

                • The Backpropagation Method
                     – A gradient descent method, similar to that used in the Widrow-Hoff
                       method: define an error function on the final output of the network
                       (e.g. the squared error over the entire training set, given a desired
                       response d_i for each input vector X_i) and adjust each weight in the
                       network so as to minimize that error.
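
A minimal sketch of training such a feedforward network by backpropagation, using scikit-learn's MLPClassifier (the library choice and the XOR toy data are assumptions, not from the excerpt above):

```python
# A feedforward network trained by backpropagation (scikit-learn assumed).
from sklearn.neural_network import MLPClassifier

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR: not linearly separable, so a hidden layer is needed

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, y)                 # gradient descent on the training error
print(net.predict([[1, 0]]))  # -> [1] (typically; tiny nets can fail to converge)
```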
Gaussian Modeling

                  [Figure 5.1: The two-dimensional Gaussian distribution p(x1, x2) — from
                  Nilsson, http://robotics.stanford.edu/people/nilsson/MLBOOK.pdf]

                • Model each class by a multivariate Gaussian and decide category 1 iff:

                  (1 / ((2π)^(n/2) |Σ1|^(1/2))) e^(−1/2 (X−M1)^t Σ1^(−1) (X−M1))
                     ≥ (1 / ((2π)^(n/2) |Σ2|^(1/2))) e^(−1/2 (X−M2)^t Σ2^(−1) (X−M2))
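
A sketch of that decision rule using scipy's multivariate normal density (the class means and covariances below are toy values):

```python
# Gaussian decision rule: decide the class with the larger density at X.
import numpy as np
from scipy.stats import multivariate_normal

M1, S1 = np.array([0.0, 0.0]), np.array([[1.0, 0.3], [0.3, 1.0]])
M2, S2 = np.array([3.0, 3.0]), np.array([[1.5, 0.0], [0.0, 0.5]])

def decide(X):
    p1 = multivariate_normal.pdf(X, mean=M1, cov=S1)
    p2 = multivariate_normal.pdf(X, mean=M2, cov=S2)
    return 1 if p1 >= p2 else 2

print(decide(np.array([0.5, 0.2])))  # -> 1
```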
Gaussian Mixture Models

       • Use multiple Gaussians to model the data

                  [Figure 5.2: The sum of two Gaussian distributions p(x1, x2) — from
                  Nilsson, http://robotics.stanford.edu/people/nilsson/MLBOOK.pdf]

                  Decide category 1 iff:

                  (X − M1)^t Σ1^(−1) (X − M1) ≤ (X − M2)^t Σ2^(−1) (X − M2) + B
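
A sketch of fitting a two-component Gaussian mixture with scikit-learn (the library choice and the generated data are assumptions):

```python
# Fit a mixture of two Gaussians to unlabeled 2D data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),    # cluster around (0, 0)
               rng.normal(5, 1, (100, 2))])   # cluster around (5, 5)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_.round(1))         # the two recovered Gaussian means
print(gmm.predict([[0.2, -0.1]]))  # most likely component for a new point
```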
Classifiers: kNN

                • k-Nearest Neighbours Classifier
                     – Lazy Classifier
                            • no training is actually performed (hence, lazy ;-))
                     – An example of Instance-Based Learning
                     – If attribute ranges differ, each dimension j can be given a scale factor
                       a_j, so the distance between vectors x1 and x2 becomes
                       sqrt( Σ_j a_j^2 (x1j − x2j)^2 )

                  [Figure 5.3: An 8-nearest-neighbor decision — from Nilsson,
                  http://robotics.stanford.edu/people/nilsson/MLBOOK.pdf. With k = 8, the
                  neighbours of X (the pattern to be classified) include four patterns of
                  category 1, two of category 2 and two of category 3; the plurality are in
                  category 1, so decide X is in category 1.]

                • Nearest-neighbor methods are memory intensive because a large number of
                  training patterns must be stored to achieve good generalization.
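
The plurality vote in the figure is a few lines of code (a sketch with Euclidean distance; the data layout is assumed):

```python
# k-nearest-neighbour classification by plurality vote.
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=8):
    d = np.linalg.norm(X_train - x, axis=1)       # distance to every stored pattern
    nearest = y_train[np.argsort(d)[:k]]          # labels of the k closest ones
    return Counter(nearest).most_common(1)[0][0]  # plurality class

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([1, 1, 2, 2])
print(knn_classify(np.array([0.2, 0.1]), X_train, y_train, k=3))  # -> 1
```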
Decision Trees

                • Learn rules from data

                • Apply each rule at each node

                • Classification is at the leaves of the tree

                  [Figure 6.2: A decision tree implementing a DNF function,
                  f = x3x2 + x3x4x1 — from Nilsson,
                  http://robotics.stanford.edu/people/nilsson/MLBOOK.pdf]

                • Selecting the type of test: if the attributes are binary, the tests are
                  simply whether the attribute’s value is 0 or 1; if the attributes are
                  categorical but non-binary, the tests might be formed by dividing the
                  attribute values into mutually exclusive and exhaustive subsets; if the
                  attributes are numeric, the tests might involve thresholds.
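
A sketch of learning such rules from data with scikit-learn (the library choice and the toy binary attributes are assumptions):

```python
# Learn a small decision tree and print its rules.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]  # x1..x4 binary
y = [1, 1, 0, 0]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["x1", "x2", "x3", "x4"]))  # learned rules
print(tree.predict([[0, 0, 1, 0]]))  # classification happens at a leaf
```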
Clustering: k-means




                                 http://en.wikipedia.org/wiki/K-means_clustering
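
A compact sketch of the standard k-means (Lloyd's) iteration described at the link: alternate between assigning points to the nearest center and recomputing centers (the generated data are toy values):

```python
# Minimal k-means (Lloyd's algorithm); assumes no cluster goes empty.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # 1) assign each point to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
        # 2) move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(4, 0.5, (20, 2))])
labels, centers = kmeans(X, k=2)
print(centers.round(1))  # two centers, near (0, 0) and (4, 4)
```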




Evaluating a Classifier

                •   Training Set
                     – used for training the classifier
                •   Testing Set
                     – examples not used for training
                     – exposes overfitting to the training data
                     – tests the generalization abilities of the trained classifiers
                •   Data sets are usually hard to obtain...
                     – Labeling examples is costly in time and effort
                     – Large labeled datasets are usually not widely available
                     – The need for separate training and testing datasets makes
                       this scarcity even harder to cope with...
                     – Use Cross-Validation techniques! (a minimal sketch follows)
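A minimal k-fold cross-validation sketch, assuming scikit-learn; the Iris data and the k-NN classifier are stand-ins for whatever labeled dataset and classifier are at hand.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)        # stand-in labeled dataset
    clf = KNeighborsClassifier(n_neighbors=3)

    # Each of the 5 folds is held out for testing while the remaining folds
    # train the classifier, so every example serves both roles exactly once.
    scores = cross_val_score(clf, X, y, cv=5)
    print(scores.mean(), scores.std())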

         2011                         Luís Gustavo Martins - lmartins@porto.ucp.pt

Wednesday, March 16, 2011                                                              44
Evaluating a Classifier

                • Confusion Matrix




                                   http://en.wikipedia.org/wiki/Confusion_matrix
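A minimal sketch of building a confusion matrix for the fish example, assuming scikit-learn; the true and predicted labels below are invented.

    from sklearn.metrics import confusion_matrix

    y_true = ["salmon", "salmon", "sea bass", "sea bass", "salmon", "sea bass"]
    y_pred = ["salmon", "sea bass", "sea bass", "sea bass", "salmon", "salmon"]

    # Rows correspond to the true class, columns to the predicted class.
    cm = confusion_matrix(y_true, y_pred, labels=["salmon", "sea bass"])
    print(cm)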




         2011                   Luís Gustavo Martins - lmartins@porto.ucp.pt

Wednesday, March 16, 2011                                                          45
Evaluating a Classifier

                • Costs of Error
                    – We should also consider the costs of the different errors
                      we make in our decisions. For example, if the fish
                      packing company knows that:
                         • Customers who buy salmon will object vigorously if
                           they see sea bass in their cans.
                         • Customers who buy sea bass will not be unhappy if
                           they occasionally see some expensive salmon in their
                           cans.
                • How does this knowledge affect our decision? (see the sketch below)
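One hedged way to act on such knowledge (the numeric costs and posteriors below are invented): choose the class that minimizes expected cost rather than the class with the highest posterior probability.

    import numpy as np

    # cost[i, j] = cost of deciding class i when the true class is j
    # classes: 0 = salmon, 1 = sea bass (cost values are invented)
    cost = np.array([[0.0, 10.0],   # deciding "salmon" is very costly if it is really sea bass
                     [1.0,  0.0]])  # deciding "sea bass" is mild if it is really salmon

    posterior = np.array([0.6, 0.4])   # e.g. P(salmon | x), P(sea bass | x)
    expected_cost = cost @ posterior   # expected cost of each possible decision
    print(expected_cost)               # [4.0, 0.6]
    print(expected_cost.argmin())      # 1: decide "sea bass" although salmon is more probable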

         2011                          Luís Gustavo Martins - lmartins@porto.ucp.pt

Wednesday, March 16, 2011                                                             46
Pattern Recognition References
       • Introduction to Machine Learning - Draft of Incomplete Notes, by Nils J. Nilsson (http://
         robotics.stanford.edu/people/nilsson/MLBOOK.pdf)
       • Pattern Recognition general links (http://cgm.cs.mcgill.ca/~godfried/teaching/pr-web.html)
       • PR for HCI - Richard O. Duda notes (http://www.cs.princeton.edu/courses/archive/fall08/
         cos436/Duda/PR_home.htm)
       • Talal Alsubaie PR Slides (http://www.slideshare.net/t_subaie/pattern-recognition-presentation)
       • Y-H Pao, Adaptive Pattern Recognition and Neural Networks (Addison-Wesley Publishing
         Co., Reading, MA, 1989). Augments statistical procedures by including neural networks, fuzzy-
         set representations and self-organizing feature maps.
       • R. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches (John Wiley &
         Sons, New York, 1992). A clear presentation of the essential ideas in three important
         approaches to pattern recognition.
       • M. Stefik, Introduction to Knowledge Systems (Morgan Kaufmann, San Francisco, CA, 1995).
         Although this excellent book is not oriented towards pattern recognition per se, the
         methods it presents are the basis for knowledge-based pattern recognition.




         2011                        Luís Gustavo Martins - lmartins@porto.ucp.pt

Wednesday, March 16, 2011                                                                                 47
Pattern Recognition References
       • P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach (Prentice-Hall
         International, Englewood Cliffs, NJ, 1980). A very clear presentation of the mathematical
         foundations.
       • R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis (Wiley-Interscience,
         New York, 1973). Still in print after all these years.
       • K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd Ed. (Academic Press,
         New York, 1990). A standard, widely-used textbook.
       • D. J. Hand, Discrimination and Classification (John Wiley and Sons, Chichester, UK, 1981).
         A warmly recommended introductory book.
       • S. Haykin, Neural Networks (MacMillan, NY, 1993). There are dozens of interesting
         books on neural networks. Haykin's is an excellent, engineering-oriented textbook.
       • T. Masters, Advanced Algorithms for Neural Networks (Wiley, NY, 1995). Well described
         by the title, with a chapter devoted to the often overlooked issue of validation.
       • WEKA Data Mining Software in Java (http://www.cs.waikato.ac.nz/ml/weka/)




         2011                       Luís Gustavo Martins - lmartins@porto.ucp.pt

Wednesday, March 16, 2011                                                                             48
