SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
Enlarging the Margins in Perceptron
              Decision Trees
                     Kristin P. Bennett, Donghui Wu
                        Department of Mathematical Sciences

                           Rensselaer Polytechnic Institute

                     bennek@rpi.edu, wud2@rpi.edu

         Nello Cristianini                                John Shawe-Taylor
    Dept of Engineering Mathematics                       Dept of Computer Science

          University of Bristol                     Royal Holloway, University of London

nello.cristianini@bristol.ac.uk                         jst@dcs.rhbnc.ac.uk




                                        0-0
Perceptron Decision Trees

Definition 0.1 Perceptron Decision Trees (PDT), are decision
trees in which each internal node is associated with a hyperplane in
general position in the input space, i.e. the decision is constructed
using a linear combination of attributes, instead of one attribute.

                                          w1           x               x                       x         x
                                                           x               x           x
                                                                       x                       x             w3
                    w1                                                             x
                                                                                                             o
                                                   x                                               o     o       o
                                                                               o                       o o o
                                                               x                                           o
           w2                w3                x                                   o       o
                                                       x                                       o
                                                                   x           o
                                                                                           o       o     o
                                               x           x
       x        o        o        x                                x
                                                                               w2




                                      1
Which Decision Is Better?
               – Capacity Control of Linear Classifier




Construct a linear classifier: f (x) = wT x − b, i.e.
find a vector w ∈ Rn and a scalar b such that

               f (xi ) = wT xi − b       ≥1   if yi = 1,
               f (xi ) = wT xi − b ≤ −1 if yi = −1,
                                              i = 1, . . . ,


                                     2
Large Margin PDTs Generalize Better

Theorem 0.1 Suppose we are able to classify an m sample of
labeled examples using a perception decision tree and suppose that
the tree obtained contained K decision nodes with margins γi at
node i, then we can bound the generalization error with probability
greater than 1 − δ to be less than

        130R2                             (4m)K+1 2K
                                                   K
                 D log(4em) log(4m) + log
          m                                 (K + 1)δ

              K    1
where D =     i=1 γi .
                    2




                                 3
The Baseline Algorithms for Comparison
Algorithm 0.1 (Basic OC1 Algorithm )            Start with the root
node, while a node remains to split

 • Optimize the decision based on some splitting criterion.
    – Randomized search for local minimum.
    – Re-starts to jump out of local minimum.
 • Partition the node into two or more child nodes based on
   decision.

Prune the tree if necessary.
Splitting Criterion : Twoing Rule (goodness measure)
                                        k                   2
                       |TL | |TR |           |Li|   |Ri |
       T woingV alue =      ∗      ∗              −
                        n      n       i=1
                                             |TL | |TR |


                                4
Twoing Rule

Splitting criteria - Twoing Rule: (Breiman, et al. (1984))

                                           k                   2
                          |TL | |TR |           |Li|   |Ri |
        T woingV alue =        ∗      ∗              −
                            n     n       i=1
                                                |TL | |TR |

where
n = |TL | + |TR | - total number of instances at current node
k - number of classes, for two class problems
|TL | - number of instances on the left of the split, i.e. wT x − b >= 0
|TR | - number of instances on the right of the split i.e. wT x − b < 0
|Li | - number of instances in category i on the the left of the split
|Ri | - number of instances in category i on the the right of the split



                                   5
Enlarging the Margins of PDT

Algorithms of producing large margin PDTs:

 • Post-process existing trees (FAT).
   Find optimal separating hyperplane at each node of the
   existing trees.

 • Incorporate large Margin into splitting criteria (MOC1)
   max T woingV alue + C ∗ CurrentM argin

 • Incorporate large margin into goodness measure (MOC2)
   Modified Towing Rule.




                               6
The OC1-PDT and FAT-PDT
IDEA: Post-process the existing OC1-PDT, replace the the OC1
decision with optimal separating hyperlane at each decision node.


                x                       o           o       o                   o
                                                                                                            x
                                                        o   o       o       o
                                                                        o                       x       x
                    x           x                               o
                                                                                    x               x
                        x           x                                                       x
                                                                                        x
                                                                                                x       x
                    x           x
                                                                                        x
                        x                                                       x           x
                                                                    o
                x                                           o

                                            ;                                           x
                                                                    o
                                                                                x
                                                        o
                                        o                                                                       OC1
                        o                               o
                                                                                                                FAT
                            o
                                                o                               x
                        o


                                                                        7
The FAT Algorithm

1. Construct a decision tree using OC1, call it OC1-PDT.

2. Starting from root of OC1-PDT, traverses through all the non-leaf
   nodes. At each node,

    • Relabel the points at node with ω T x − b ≥ 0 as superclass right,
      the other points at this node as superclass left.
    • Find the perceptron (optimal separating hyperplane)
      f (x) = ω ∗ T x − b∗ , which separates superclasses right and left
      perfectly with maximal margin.
    • Replace the original perceptron with the new one.




                                    8
FAT Generalizes Better Than OC1
                                                     10−fold cross validation results: FAT vs OC1
                               100




                                95          significant
                                            x=y


                                90
  FAT 10−CV average accuracy




                                85




                                80




                                75




                                70




                                65
                                  65   70           75          80          85                90    95   100
                                                           OC1 10−CV average accuracy




                                                                      9
MOC1: Margin OC1

IDEA: Use Multi-objective splitting criterion to maximize both
TwoingValue and margin.

            max T woingV alue + C ∗ CurrentM argin


                                 x
                                                              o

                                     x            o

                             x                            o

                                                      o

                        x                     o           o

                             x                o




                    x                     o

                                              o




                                     10
MOC1 Generalizes Better Than OC1
                                                       10−fold cross validation results: MOC1 vs OC1
                                  100


                                               significant
                                               not significant
                                   95          x=y


                                   90
    MOC1 10−CV average accuracy




                                   85




                                   80




                                   75




                                   70




                                   65
                                     65   70           75             80          85            90     95   100
                                                                 OC1 10−CV average accuracy




                                                                        11
The MOC2 Algorithm
IDEA: Modify Twoing Criterion to allow “soft-margin”.
Want both high accuracy and strong separation/margin.

                                                     k                                            k
                |M TL | |M TR |                                |Li |   |Ri |                            |M Li |   |M Ri |
T woingV alue =        ∗        ∗                                    −       ∗                                  −
                  n       n                                    |TL |   |TR |                            |M TL |   |M TR |
                                                   i=1                                            i=1



                                   x               x o
                                       x       x                             o
                       x                                                          o
                                               x
                                                                             o    o
                                           x
                       x           x                    x      x                          o
                                       x                                     o    o
                           o                                                              o
                               x       x                       o             o    o
                                                        x                o            o

                                   x       x        o                        o
                       x                                                                      x
                                                                                  o
                                                                         o
                                           x                                  o
                                   x                                         o
                                                                     x

                                       wx=b+1               wx = b       wx=b-1




                                                         12
The Modified Twoing Rule

                                    k                       k
                |M TL | |M TR |           |Li |   |Ri |           |M Li |   |M Ri |
T woingV alue =        ∗        ∗               −       ∗                 −
                  n       n               |TL |   |TR |           |M TL |   |M TR |
                                    i=1                     i=1


where n = |TL | + |TR | - total number of instances at current node
k - number of classes, for two class problems
|TL | - number of instances on the left of the split, i.e. wT x − b >= 0
|TR | - number of instances on the right of the split i.e. wT x − b < 0
|Li | - number of instances in category i on the the left of the split
|Ri | - number of instances in category i on the the right of the split
|M TL | - number of instances on the left of the split, wT x − b >= 1
|M TR | - number of instances on the right of the split wT x − b <= −1
|M Li | - number of instances in category i with wT x − b >= 1
|M Ri | - number of instances in category i with wT x − b <= −1



                                     13
MOC2 Generalizes Better Than OC1
                                                      10−fold cross validation results: MOC2 vs OC1
                                  100


                                               significant
                                   95
                                               not significant
                                               x=y


                                   90
    MOC2 10−CV average accuracy




                                   85




                                   80




                                   75




                                   70




                                   65
                                     65   70          75          80          85               90     95   100
                                                             OC1 10−CV average accuracy




                                                                      14
Performances of OC1, FAT, MOC1 and MOC2
            OC1           FAT            MOC1          MOC2          Best
Dataset        x
               ¯    x (p value)
                    ¯              x (p value)
                                   ¯               x (p value)
                                                   ¯             classifier
Bright      98.46   98.62 (.05)    98.94 (.10)     98.82 (.10)    MOC1
Liver       65.22   66.09 (.10)    68.41 (.20)     70.14 (.04)    MOC2
Cancer      95.89   96.48 (.05)        95.60 (*)     95.89 (*)       FAT
Dim         94.82   94.92 (.20)    95.23 (.09)       94.90 (*)    MOC1
Heart       73.40   76.43 (.12)    75.76 (.21)     77.78 (.10)    MOC2
Housing     81.03   83.20 (.05)        82.02 (*)     80.23 (*)       FAT
Iris        95.33   96.00 (.17)        95.33 (*)     96.00 (*)       FAT
Diabetes    71.09   71.48 (.04)    73.18 (.08)     72.53 (.23)    MOC1
Prognosis   78.91    74.15 (**)        78.91 (*)    79.59 (*)     MOC2
Sonar       67.79   74.04 (.01)    72.12 (.19)     73.21 (.16)       FAT



                                  15
Conclusions


• Generalization error of PDT is bounded by function of
  margins, tree size, and training set size.

• Three algorithms to control capacity of PDT investigated:

  – Post-processing existing trees (FAT)
  – Incorporating margins into splitting criteria:
     ∗ Multicriteria splitting rule (MOC1)
     ∗ Soft-margin modified twoing-rule (MOC2).

• Theoretically and empirically enlarged margin PDT performed
  better.



                              16

Más contenido relacionado

La actualidad más candente

Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical MethodsChristian Robert
 
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...Xin-She Yang
 
Likelihood survey-nber-0713101
Likelihood survey-nber-0713101Likelihood survey-nber-0713101
Likelihood survey-nber-0713101NBER
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2zukun
 
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Matthew Leingang
 
Bayesian Methods for Machine Learning
Bayesian Methods for Machine LearningBayesian Methods for Machine Learning
Bayesian Methods for Machine Learningbutest
 
Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)Matthew Leingang
 
Senior Seminar: Systems of Differential Equations
Senior Seminar:  Systems of Differential EquationsSenior Seminar:  Systems of Differential Equations
Senior Seminar: Systems of Differential EquationsJDagenais
 
CPSC 125 Ch 1 sec 4
CPSC 125 Ch 1 sec 4CPSC 125 Ch 1 sec 4
CPSC 125 Ch 1 sec 4David Wood
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Polynomial functions modelllings
Polynomial functions modelllingsPolynomial functions modelllings
Polynomial functions modelllingsTarun Gehlot
 
Systems Of Differential Equations
Systems Of Differential EquationsSystems Of Differential Equations
Systems Of Differential EquationsJDagenais
 
Physics barriers and tunneling
Physics barriers and tunnelingPhysics barriers and tunneling
Physics barriers and tunnelingMohamed Anwar
 
Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)
Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)
Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)Matthew Leingang
 

La actualidad más candente (17)

Monte Carlo Statistical Methods
Monte Carlo Statistical MethodsMonte Carlo Statistical Methods
Monte Carlo Statistical Methods
 
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...
Nature-Inspired Metaheuristic Algorithms for Optimization and Computational I...
 
Likelihood survey-nber-0713101
Likelihood survey-nber-0713101Likelihood survey-nber-0713101
Likelihood survey-nber-0713101
 
CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2CVPR2010: higher order models in computer vision: Part 1, 2
CVPR2010: higher order models in computer vision: Part 1, 2
 
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
 
Bayesian Methods for Machine Learning
Bayesian Methods for Machine LearningBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning
 
Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)Lesson 18: Maximum and Minimum Values (slides)
Lesson 18: Maximum and Minimum Values (slides)
 
Normal
NormalNormal
Normal
 
Senior Seminar: Systems of Differential Equations
Senior Seminar:  Systems of Differential EquationsSenior Seminar:  Systems of Differential Equations
Senior Seminar: Systems of Differential Equations
 
CPSC 125 Ch 1 sec 4
CPSC 125 Ch 1 sec 4CPSC 125 Ch 1 sec 4
CPSC 125 Ch 1 sec 4
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
11X1 T14 05 volumes
11X1 T14 05 volumes11X1 T14 05 volumes
11X1 T14 05 volumes
 
Polynomial functions modelllings
Polynomial functions modelllingsPolynomial functions modelllings
Polynomial functions modelllings
 
Information theory
Information theoryInformation theory
Information theory
 
Systems Of Differential Equations
Systems Of Differential EquationsSystems Of Differential Equations
Systems Of Differential Equations
 
Physics barriers and tunneling
Physics barriers and tunnelingPhysics barriers and tunneling
Physics barriers and tunneling
 
Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)
Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)
Lesson 14: Derivatives of Logarithmic and Exponential Functions (slides)
 

Destacado

Romo 1 page res handout 1_2015
Romo 1 page res handout 1_2015Romo 1 page res handout 1_2015
Romo 1 page res handout 1_2015Daniel Romo
 
Estrategia y planificación - No presentado - Taller Internet PyME EARTH
Estrategia y planificación - No presentado - Taller Internet PyME EARTHEstrategia y planificación - No presentado - Taller Internet PyME EARTH
Estrategia y planificación - No presentado - Taller Internet PyME EARTHPaul Fervoy
 
E-Learning y correo electrónico- herramienta gratuita.
 E-Learning y correo electrónico- herramienta gratuita. E-Learning y correo electrónico- herramienta gratuita.
E-Learning y correo electrónico- herramienta gratuita.Johanna Saenz
 
Výzva české lékařské_domobrany
Výzva české lékařské_domobranyVýzva české lékařské_domobrany
Výzva české lékařské_domobranykristinanemeckova
 
27 Curso Intensivo - Social CRM
27 Curso Intensivo - Social CRM27 Curso Intensivo - Social CRM
27 Curso Intensivo - Social CRMInterlat
 
Silver Sudoe - Presentacion taller creatividad 1 coolhunting
Silver Sudoe - Presentacion taller creatividad 1 coolhuntingSilver Sudoe - Presentacion taller creatividad 1 coolhunting
Silver Sudoe - Presentacion taller creatividad 1 coolhuntingRegion Limousin
 
Samsung Wireless Enterprise LAN - June 2014
Samsung Wireless Enterprise LAN  - June 2014Samsung Wireless Enterprise LAN  - June 2014
Samsung Wireless Enterprise LAN - June 2014Marcello Marchesini
 
Rockfon Steel Finish Pics
Rockfon  Steel  Finish  PicsRockfon  Steel  Finish  Pics
Rockfon Steel Finish Picsrockfonceiling
 
Comercio Justo. Oportunidades de Negocios.
Comercio Justo. Oportunidades de Negocios.Comercio Justo. Oportunidades de Negocios.
Comercio Justo. Oportunidades de Negocios.enendeavor
 
Lafont proteins 2007
Lafont proteins 2007Lafont proteins 2007
Lafont proteins 2007scbegay
 
Conexión de Samsung F480 a un PC mediante Bluetooth
Conexión de Samsung F480 a un PC mediante BluetoothConexión de Samsung F480 a un PC mediante Bluetooth
Conexión de Samsung F480 a un PC mediante Bluetoothladynoa
 
Alternativen für österreichische Verlader und Transportdienstleister
Alternativen für österreichische Verlader und TransportdienstleisterAlternativen für österreichische Verlader und Transportdienstleister
Alternativen für österreichische Verlader und TransportdienstleisterParadigma Consulting
 
Card wars
Card warsCard wars
Card warsKarpple
 
Retail Consumer Dynamics Study
Retail Consumer Dynamics StudyRetail Consumer Dynamics Study
Retail Consumer Dynamics StudyLouis Fernandes
 
Taller sistemas distribuidos sobre Windows usando VMWare
Taller sistemas distribuidos sobre Windows usando VMWareTaller sistemas distribuidos sobre Windows usando VMWare
Taller sistemas distribuidos sobre Windows usando VMWareDamian Barrios Castillo
 

Destacado (20)

Romo 1 page res handout 1_2015
Romo 1 page res handout 1_2015Romo 1 page res handout 1_2015
Romo 1 page res handout 1_2015
 
Estrategia y planificación - No presentado - Taller Internet PyME EARTH
Estrategia y planificación - No presentado - Taller Internet PyME EARTHEstrategia y planificación - No presentado - Taller Internet PyME EARTH
Estrategia y planificación - No presentado - Taller Internet PyME EARTH
 
E-Learning y correo electrónico- herramienta gratuita.
 E-Learning y correo electrónico- herramienta gratuita. E-Learning y correo electrónico- herramienta gratuita.
E-Learning y correo electrónico- herramienta gratuita.
 
DURMA Catalogue 1
DURMA Catalogue 1 DURMA Catalogue 1
DURMA Catalogue 1
 
Výzva české lékařské_domobrany
Výzva české lékařské_domobranyVýzva české lékařské_domobrany
Výzva české lékařské_domobrany
 
27 Curso Intensivo - Social CRM
27 Curso Intensivo - Social CRM27 Curso Intensivo - Social CRM
27 Curso Intensivo - Social CRM
 
Silver Sudoe - Presentacion taller creatividad 1 coolhunting
Silver Sudoe - Presentacion taller creatividad 1 coolhuntingSilver Sudoe - Presentacion taller creatividad 1 coolhunting
Silver Sudoe - Presentacion taller creatividad 1 coolhunting
 
Samsung Wireless Enterprise LAN - June 2014
Samsung Wireless Enterprise LAN  - June 2014Samsung Wireless Enterprise LAN  - June 2014
Samsung Wireless Enterprise LAN - June 2014
 
Rockfon Steel Finish Pics
Rockfon  Steel  Finish  PicsRockfon  Steel  Finish  Pics
Rockfon Steel Finish Pics
 
Comercio Justo. Oportunidades de Negocios.
Comercio Justo. Oportunidades de Negocios.Comercio Justo. Oportunidades de Negocios.
Comercio Justo. Oportunidades de Negocios.
 
Lafont proteins 2007
Lafont proteins 2007Lafont proteins 2007
Lafont proteins 2007
 
Conexión de Samsung F480 a un PC mediante Bluetooth
Conexión de Samsung F480 a un PC mediante BluetoothConexión de Samsung F480 a un PC mediante Bluetooth
Conexión de Samsung F480 a un PC mediante Bluetooth
 
Alternativen für österreichische Verlader und Transportdienstleister
Alternativen für österreichische Verlader und TransportdienstleisterAlternativen für österreichische Verlader und Transportdienstleister
Alternativen für österreichische Verlader und Transportdienstleister
 
Card wars
Card warsCard wars
Card wars
 
Consejos ahorro energia
Consejos ahorro energiaConsejos ahorro energia
Consejos ahorro energia
 
GSM + LTE
GSM + LTEGSM + LTE
GSM + LTE
 
Manual swat4
Manual swat4Manual swat4
Manual swat4
 
Retail Consumer Dynamics Study
Retail Consumer Dynamics StudyRetail Consumer Dynamics Study
Retail Consumer Dynamics Study
 
1505 JGIC 2015 - El Portal de la Recerca de Catalunya
1505 JGIC 2015 - El Portal de la Recerca de Catalunya1505 JGIC 2015 - El Portal de la Recerca de Catalunya
1505 JGIC 2015 - El Portal de la Recerca de Catalunya
 
Taller sistemas distribuidos sobre Windows usando VMWare
Taller sistemas distribuidos sobre Windows usando VMWareTaller sistemas distribuidos sobre Windows usando VMWare
Taller sistemas distribuidos sobre Windows usando VMWare
 

Similar a Perceptron Arboles De Decision

The width of an ideal chain
The width of an ideal chainThe width of an ideal chain
The width of an ideal chaincypztm
 
Mathematical physics group 16
Mathematical physics group 16Mathematical physics group 16
Mathematical physics group 16derry92
 
11X1 T16 05 volumes (2011)
11X1 T16 05 volumes (2011)11X1 T16 05 volumes (2011)
11X1 T16 05 volumes (2011)Nigel Simmons
 
アーベル論理でセッション型
アーベル論理でセッション型アーベル論理でセッション型
アーベル論理でセッション型Yoichi Hirai
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論岳華 杜
 
Chapter 5(5.1~5.2) Orthogonality.pptx
Chapter 5(5.1~5.2) Orthogonality.pptxChapter 5(5.1~5.2) Orthogonality.pptx
Chapter 5(5.1~5.2) Orthogonality.pptxNaeemUlIslam12
 
Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture" Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture" ieee_cis_cyprus
 
Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)jillmitchell8778
 
11 x1 t16 05 volumes (2012)
11 x1 t16 05 volumes (2012)11 x1 t16 05 volumes (2012)
11 x1 t16 05 volumes (2012)Nigel Simmons
 
CS571: Distributional semantics
CS571: Distributional semanticsCS571: Distributional semantics
CS571: Distributional semanticsJinho Choi
 
Maths ch15 key
Maths ch15 keyMaths ch15 key
Maths ch15 keyGary Tsang
 
P1 Graph Function Test
P1 Graph Function TestP1 Graph Function Test
P1 Graph Function Testguest3952880
 
Probability statistics assignment help
Probability statistics assignment helpProbability statistics assignment help
Probability statistics assignment helpHomeworkAssignmentHe
 

Similar a Perceptron Arboles De Decision (20)

Modeling and interpretation
Modeling and interpretationModeling and interpretation
Modeling and interpretation
 
The width of an ideal chain
The width of an ideal chainThe width of an ideal chain
The width of an ideal chain
 
Icraf seminar(de clerck)
Icraf seminar(de clerck)Icraf seminar(de clerck)
Icraf seminar(de clerck)
 
Session 2
Session 2Session 2
Session 2
 
Mathematical physics group 16
Mathematical physics group 16Mathematical physics group 16
Mathematical physics group 16
 
Hidden Markov Models
Hidden Markov ModelsHidden Markov Models
Hidden Markov Models
 
sol pg 89
sol pg 89 sol pg 89
sol pg 89
 
11X1 T17 05 volumes
11X1 T17 05 volumes11X1 T17 05 volumes
11X1 T17 05 volumes
 
11X1 T16 05 volumes (2011)
11X1 T16 05 volumes (2011)11X1 T16 05 volumes (2011)
11X1 T16 05 volumes (2011)
 
アーベル論理でセッション型
アーベル論理でセッション型アーベル論理でセッション型
アーベル論理でセッション型
 
從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論從 VAE 走向深度學習新理論
從 VAE 走向深度學習新理論
 
Chapter 5(5.1~5.2) Orthogonality.pptx
Chapter 5(5.1~5.2) Orthogonality.pptxChapter 5(5.1~5.2) Orthogonality.pptx
Chapter 5(5.1~5.2) Orthogonality.pptx
 
Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture" Johan Suykens: "Models from Data: a Unifying Picture"
Johan Suykens: "Models from Data: a Unifying Picture"
 
Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)Statistics lecture 6 (ch5)
Statistics lecture 6 (ch5)
 
11 x1 t16 05 volumes (2012)
11 x1 t16 05 volumes (2012)11 x1 t16 05 volumes (2012)
11 x1 t16 05 volumes (2012)
 
CS571: Distributional semantics
CS571: Distributional semanticsCS571: Distributional semantics
CS571: Distributional semantics
 
Maths ch15 key
Maths ch15 keyMaths ch15 key
Maths ch15 key
 
P1 Graph Function Test
P1 Graph Function TestP1 Graph Function Test
P1 Graph Function Test
 
Probability statistics assignment help
Probability statistics assignment helpProbability statistics assignment help
Probability statistics assignment help
 
Clementfinal copy
Clementfinal copyClementfinal copy
Clementfinal copy
 

Más de ESCOM

redes neuronales tipo Som
redes neuronales tipo Somredes neuronales tipo Som
redes neuronales tipo SomESCOM
 
redes neuronales Som
redes neuronales Somredes neuronales Som
redes neuronales SomESCOM
 
redes neuronales Som Slides
redes neuronales Som Slidesredes neuronales Som Slides
redes neuronales Som SlidesESCOM
 
red neuronal Som Net
red neuronal Som Netred neuronal Som Net
red neuronal Som NetESCOM
 
Self Organinising neural networks
Self Organinising  neural networksSelf Organinising  neural networks
Self Organinising neural networksESCOM
 
redes neuronales Kohonen
redes neuronales Kohonenredes neuronales Kohonen
redes neuronales KohonenESCOM
 
Teoria Resonancia Adaptativa
Teoria Resonancia AdaptativaTeoria Resonancia Adaptativa
Teoria Resonancia AdaptativaESCOM
 
ejemplo red neuronal Art1
ejemplo red neuronal Art1ejemplo red neuronal Art1
ejemplo red neuronal Art1ESCOM
 
redes neuronales tipo Art3
redes neuronales tipo Art3redes neuronales tipo Art3
redes neuronales tipo Art3ESCOM
 
Art2
Art2Art2
Art2ESCOM
 
Redes neuronales tipo Art
Redes neuronales tipo ArtRedes neuronales tipo Art
Redes neuronales tipo ArtESCOM
 
Neocognitron
NeocognitronNeocognitron
NeocognitronESCOM
 
Neocognitron
NeocognitronNeocognitron
NeocognitronESCOM
 
Neocognitron
NeocognitronNeocognitron
NeocognitronESCOM
 
Fukushima Cognitron
Fukushima CognitronFukushima Cognitron
Fukushima CognitronESCOM
 
Counterpropagation NETWORK
Counterpropagation NETWORKCounterpropagation NETWORK
Counterpropagation NETWORKESCOM
 
Counterpropagation NETWORK
Counterpropagation NETWORKCounterpropagation NETWORK
Counterpropagation NETWORKESCOM
 
Counterpropagation
CounterpropagationCounterpropagation
CounterpropagationESCOM
 
Teoría de Resonancia Adaptativa Art2 ARTMAP
Teoría de Resonancia Adaptativa Art2 ARTMAPTeoría de Resonancia Adaptativa Art2 ARTMAP
Teoría de Resonancia Adaptativa Art2 ARTMAPESCOM
 
Teoría de Resonancia Adaptativa ART1
Teoría de Resonancia Adaptativa ART1Teoría de Resonancia Adaptativa ART1
Teoría de Resonancia Adaptativa ART1ESCOM
 

Más de ESCOM (20)

redes neuronales tipo Som
redes neuronales tipo Somredes neuronales tipo Som
redes neuronales tipo Som
 
redes neuronales Som
redes neuronales Somredes neuronales Som
redes neuronales Som
 
redes neuronales Som Slides
redes neuronales Som Slidesredes neuronales Som Slides
redes neuronales Som Slides
 
red neuronal Som Net
red neuronal Som Netred neuronal Som Net
red neuronal Som Net
 
Self Organinising neural networks
Self Organinising  neural networksSelf Organinising  neural networks
Self Organinising neural networks
 
redes neuronales Kohonen
redes neuronales Kohonenredes neuronales Kohonen
redes neuronales Kohonen
 
Teoria Resonancia Adaptativa
Teoria Resonancia AdaptativaTeoria Resonancia Adaptativa
Teoria Resonancia Adaptativa
 
ejemplo red neuronal Art1
ejemplo red neuronal Art1ejemplo red neuronal Art1
ejemplo red neuronal Art1
 
redes neuronales tipo Art3
redes neuronales tipo Art3redes neuronales tipo Art3
redes neuronales tipo Art3
 
Art2
Art2Art2
Art2
 
Redes neuronales tipo Art
Redes neuronales tipo ArtRedes neuronales tipo Art
Redes neuronales tipo Art
 
Neocognitron
NeocognitronNeocognitron
Neocognitron
 
Neocognitron
NeocognitronNeocognitron
Neocognitron
 
Neocognitron
NeocognitronNeocognitron
Neocognitron
 
Fukushima Cognitron
Fukushima CognitronFukushima Cognitron
Fukushima Cognitron
 
Counterpropagation NETWORK
Counterpropagation NETWORKCounterpropagation NETWORK
Counterpropagation NETWORK
 
Counterpropagation NETWORK
Counterpropagation NETWORKCounterpropagation NETWORK
Counterpropagation NETWORK
 
Counterpropagation
CounterpropagationCounterpropagation
Counterpropagation
 
Teoría de Resonancia Adaptativa Art2 ARTMAP
Teoría de Resonancia Adaptativa Art2 ARTMAPTeoría de Resonancia Adaptativa Art2 ARTMAP
Teoría de Resonancia Adaptativa Art2 ARTMAP
 
Teoría de Resonancia Adaptativa ART1
Teoría de Resonancia Adaptativa ART1Teoría de Resonancia Adaptativa ART1
Teoría de Resonancia Adaptativa ART1
 

Último

Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 

Último (20)

Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 

Perceptron Arboles De Decision

  • 1. Enlarging the Margins in Perceptron Decision Trees Kristin P. Bennett, Donghui Wu Department of Mathematical Sciences Rensselaer Polytechnic Institute bennek@rpi.edu, wud2@rpi.edu Nello Cristianini John Shawe-Taylor Dept of Engineering Mathematics Dept of Computer Science University of Bristol Royal Holloway, University of London nello.cristianini@bristol.ac.uk jst@dcs.rhbnc.ac.uk 0-0
  • 2. Perceptron Decision Trees Definition 0.1 Perceptron Decision Trees (PDT), are decision trees in which each internal node is associated with a hyperplane in general position in the input space, i.e. the decision is constructed using a linear combination of attributes, instead of one attribute. w1 x x x x x x x x x w3 w1 x o x o o o o o o o x o w2 w3 x o o x o x o o o o x x x o o x x w2 1
  • 3. Which Decision Is Better? – Capacity Control of Linear Classifier Construct a linear classifier: f (x) = wT x − b, i.e. find a vector w ∈ Rn and a scalar b such that f (xi ) = wT xi − b ≥1 if yi = 1, f (xi ) = wT xi − b ≤ −1 if yi = −1, i = 1, . . . , 2
  • 4. Large Margin PDTs Generalize Better Theorem 0.1 Suppose we are able to classify an m sample of labeled examples using a perception decision tree and suppose that the tree obtained contained K decision nodes with margins γi at node i, then we can bound the generalization error with probability greater than 1 − δ to be less than 130R2 (4m)K+1 2K K D log(4em) log(4m) + log m (K + 1)δ K 1 where D = i=1 γi . 2 3
  • 5. The Baseline Algorithms for Comparison Algorithm 0.1 (Basic OC1 Algorithm ) Start with the root node, while a node remains to split • Optimize the decision based on some splitting criterion. – Randomized search for local minimum. – Re-starts to jump out of local minimum. • Partition the node into two or more child nodes based on decision. Prune the tree if necessary. Splitting Criterion : Twoing Rule (goodness measure) k 2 |TL | |TR | |Li| |Ri | T woingV alue = ∗ ∗ − n n i=1 |TL | |TR | 4
  • 6. Twoing Rule Splitting criteria - Twoing Rule: (Breiman, et al. (1984)) k 2 |TL | |TR | |Li| |Ri | T woingV alue = ∗ ∗ − n n i=1 |TL | |TR | where n = |TL | + |TR | - total number of instances at current node k - number of classes, for two class problems |TL | - number of instances on the left of the split, i.e. wT x − b >= 0 |TR | - number of instances on the right of the split i.e. wT x − b < 0 |Li | - number of instances in category i on the the left of the split |Ri | - number of instances in category i on the the right of the split 5
  • 7. Enlarging the Margins of PDT Algorithms of producing large margin PDTs: • Post-process existing trees (FAT). Find optimal separating hyperplane at each node of the existing trees. • Incorporate large Margin into splitting criteria (MOC1) max T woingV alue + C ∗ CurrentM argin • Incorporate large margin into goodness measure (MOC2) Modified Towing Rule. 6
  • 8. The OC1-PDT and FAT-PDT IDEA: Post-process the existing OC1-PDT, replace the the OC1 decision with optimal separating hyperlane at each decision node. x o o o o x o o o o o x x x x o x x x x x x x x x x x x x x o x o ; x o x o o OC1 o o FAT o o x o 7
  • 9. The FAT Algorithm 1. Construct a decision tree using OC1, call it OC1-PDT. 2. Starting from root of OC1-PDT, traverses through all the non-leaf nodes. At each node, • Relabel the points at node with ω T x − b ≥ 0 as superclass right, the other points at this node as superclass left. • Find the perceptron (optimal separating hyperplane) f (x) = ω ∗ T x − b∗ , which separates superclasses right and left perfectly with maximal margin. • Replace the original perceptron with the new one. 8
  • 10. FAT Generalizes Better Than OC1 10−fold cross validation results: FAT vs OC1 100 95 significant x=y 90 FAT 10−CV average accuracy 85 80 75 70 65 65 70 75 80 85 90 95 100 OC1 10−CV average accuracy 9
  • 11. MOC1: Margin OC1 IDEA: Use Multi-objective splitting criterion to maximize both TwoingValue and margin. max T woingV alue + C ∗ CurrentM argin x o x o x o o x o o x o x o o 10
  • 12. MOC1 Generalizes Better Than OC1 10−fold cross validation results: MOC1 vs OC1 100 significant not significant 95 x=y 90 MOC1 10−CV average accuracy 85 80 75 70 65 65 70 75 80 85 90 95 100 OC1 10−CV average accuracy 11
  • 13. The MOC2 Algorithm IDEA: Modify Twoing Criterion to allow “soft-margin”. Want both high accuracy and strong separation/margin. k k |M TL | |M TR | |Li | |Ri | |M Li | |M Ri | T woingV alue = ∗ ∗ − ∗ − n n |TL | |TR | |M TL | |M TR | i=1 i=1 x x o x x o x o x o o x x x x x o x o o o o x x o o o x o o x x o o x x o o x o x o x wx=b+1 wx = b wx=b-1 12
  • 14. The Modified Twoing Rule k k |M TL | |M TR | |Li | |Ri | |M Li | |M Ri | T woingV alue = ∗ ∗ − ∗ − n n |TL | |TR | |M TL | |M TR | i=1 i=1 where n = |TL | + |TR | - total number of instances at current node k - number of classes, for two class problems |TL | - number of instances on the left of the split, i.e. wT x − b >= 0 |TR | - number of instances on the right of the split i.e. wT x − b < 0 |Li | - number of instances in category i on the the left of the split |Ri | - number of instances in category i on the the right of the split |M TL | - number of instances on the left of the split, wT x − b >= 1 |M TR | - number of instances on the right of the split wT x − b <= −1 |M Li | - number of instances in category i with wT x − b >= 1 |M Ri | - number of instances in category i with wT x − b <= −1 13
  • 15. MOC2 Generalizes Better Than OC1 10−fold cross validation results: MOC2 vs OC1 100 significant 95 not significant x=y 90 MOC2 10−CV average accuracy 85 80 75 70 65 65 70 75 80 85 90 95 100 OC1 10−CV average accuracy 14
  • 16. Performances of OC1, FAT, MOC1 and MOC2 OC1 FAT MOC1 MOC2 Best Dataset x ¯ x (p value) ¯ x (p value) ¯ x (p value) ¯ classifier Bright 98.46 98.62 (.05) 98.94 (.10) 98.82 (.10) MOC1 Liver 65.22 66.09 (.10) 68.41 (.20) 70.14 (.04) MOC2 Cancer 95.89 96.48 (.05) 95.60 (*) 95.89 (*) FAT Dim 94.82 94.92 (.20) 95.23 (.09) 94.90 (*) MOC1 Heart 73.40 76.43 (.12) 75.76 (.21) 77.78 (.10) MOC2 Housing 81.03 83.20 (.05) 82.02 (*) 80.23 (*) FAT Iris 95.33 96.00 (.17) 95.33 (*) 96.00 (*) FAT Diabetes 71.09 71.48 (.04) 73.18 (.08) 72.53 (.23) MOC1 Prognosis 78.91 74.15 (**) 78.91 (*) 79.59 (*) MOC2 Sonar 67.79 74.04 (.01) 72.12 (.19) 73.21 (.16) FAT 15
  • 17. Conclusions • Generalization error of PDT is bounded by function of margins, tree size, and training set size. • Three algorithms to control capacity of PDT investigated: – Post-processing existing trees (FAT) – Incorporating margins into splitting criteria: ∗ Multicriteria splitting rule (MOC1) ∗ Soft-margin modified twoing-rule (MOC2). • Theoretically and empirically enlarged margin PDT performed better. 16