BALANCING BOARD MACHINES
Frederic Maire,
School of Software Engineering and Data Communication,
Queensland University of Technology, Box 2434, Brisbane, Qld 4001,
Australia
f.maire@qut.edu.au
Abstract
The support vector machine solution corresponds to the
centre of the largest sphere inscribed in version space.
Alternative approaches such as the Bayes Point Machine
(BPM) and the Analytic Center Machine suggest that
generalization performance can be further enhanced by
considering other possible centres of version space, such as
the centroid (centre of mass). We present an algorithm to
compute exactly the centroid of higher dimensional
polyhedra, then derive approximation algorithms to build
a new learning machine whose performance is
comparable to that of the BPM.
Key Words
Kernel machines, Bayesian Point, Centroid.
1. Introduction
Kernel classifiers are non-linear decision functions for
binary classification. In the Kernel Machine framework
(Muller & Mika & Ratsch & Tsuda & Scholkopf, [1];
Scholkopf & Smola, [2]), a feature mapping $x \mapsto \phi(x)$
from an input space to a feature space is given (generally
implicitly, via a kernel function), together with a training set
of input vectors $\{x_1, \ldots, x_m\}$ and the corresponding
class labels $\{y_1, \ldots, y_m\}$, where $y_i \in \{-1, +1\}$. The
learning problem is formulated as a search for a
linear classifier hypothesis (a weight vector $w$) belonging
to a subset of the feature space called version space,
$\{w \mid \forall i \in [1, \ldots, m],\ \langle w, y_i \phi(x_i) \rangle > 0\}$. In other
words, version space is the set of weight vectors $w$ that
are consistent with the training set. Because only the
direction of $w$ matters for classification purposes,
we can, without loss of generality, restrict the search for
$w$ to the unit sphere in feature space. The training
algorithm of a Support Vector Machine (SVM) returns the
weight vector $w$ that has the smallest maximum angle
between $w$ and the $y_i \phi(x_i)$'s. The kernel trick is
that for certain feature spaces and mappings $\phi$, there
exist easily computable kernel functions $k$ defined on
the input space such that $k(x, y) = \langle \phi(x), \phi(y) \rangle$.
With such a kernel function $k$, the computation of the inner
products $\langle \phi(x), \phi(y) \rangle$ does not require explicit
knowledge of $\phi$. In fact, for a given kernel function $k$,
there may exist many different mappings $\phi$.
Geometrically, each training example $(x_i, y_i)$ defines a
half-space in feature space through the constraint
$\langle w, y_i \phi(x_i) \rangle > 0$ on $w$. It is easy to see that
version space is a polyhedral cone of feature space.
Figure 1 shows a bird's-eye view (a slice of the polyhedral
cone) of version space.
Figure 1: An elongated version space. The SVM point is
the centre of the sphere.
The SVM solution point $w_{SVM}$ is the centre of the
largest sphere whose centre is a unit vector and which is
contained in the polyhedral cone.
Bayes Point Machines (BPM) are a well-founded
improvement that approximates the Bayes-optimal
decision by the centroid (also known as the centre of mass
or barycentre) of version space. It happens that the Bayes
point is very close to the centroid of version space in high
dimensional spaces. The Bayes point achieves better
generalization performance than SVMs
(Opper & Haussler, [3]; Shawe-Taylor & Williamson, [4];
Graepel & Herbrich & Campbell, [5]; Watkin, [6]).
An intuitive way to see why the centroid is a good choice
is to view version space as a committee of experts who all
agree on the training set. A new input vector $x$
corresponds to a hyperplane in feature space that may cut
version space in two parts. In the example of Figure 1,
the experts on the right of the hyperplane normal to
$\phi(x)$ classify $x$ positively, whereas the experts on the
left classify $x$ negatively. It is reasonable to use the
opinion of the majority of the experts that successfully
classified the training set to predict the class label of $x$.
The expert that agrees the most with the majority vote on
new inputs is precisely the Bayes point. In a standard
committee machine, for each new input we seek the
opinions of a finite number of experts, then take a
majority vote, whereas in a BPM, the expert that most
often agrees with the majority vote of the infinite
committee (version space) is delegated the task of
classifying the new input.
Following Rujan [7], Herbrich and Graepel [8] introduced
two algorithms to stochastically approximate the centroid
of version space: a billiard sampling algorithm and a
sampling algorithm based on the well known perceptron
algorithm.
In this paper, we present an algorithm to compute exactly
the centroid of a polyhedron in a high dimensional space.
From this exact algorithm, we derive an algorithm to
approximate the centroid of a polyhedral cone. We
show empirically that the corresponding machine presents
better generalization capability than SVMs on a number of
benchmark data sets.
In section 2, we introduce an algorithm to compute
exactly the centroid of higher dimensional polyhedra. In
section 3, we show how to use this algorithm to
approximate the centroid of version space. In section 4,
some implementation issues are considered and some
experimental results are presented.
2. Exact Computation of the Centroid of a
Higher Dimensional Polyhedron
A polyhedron $P$ is the intersection of a finite number of
half-spaces. It is best represented by a system of non-redundant
linear inequalities $P = \{x \mid Ax \le b\}$. Recall
that the 1-volume is the length, the 2-volume is the
surface area, and the 3-volume is the everyday notion of volume.
The algorithm that we introduce for computing the centroid
of an $n$-dimensional polyhedron is an extension of the work by
Lasserre [10], who showed that the $n$-dimensional volume
$V(n, A, b)$ of a polyhedron $P = \{x \mid Ax \le b\}$ is related to the
$(n-1)$-dimensional volumes of its facets and the row
vectors of its matrix $A$ by the following formula:
$$V(n, A, b) = \frac{1}{n} \sum_i \frac{b_i}{\lVert a_i \rVert}\, V_i(n-1, A, b),$$
where $a_i$ denotes the $i$th row of $A$ and
$V_i(n-1, A, b)$ denotes the $(n-1)$-dimensional
volume of the $i$th facet. The computation of the
centroid and the $(n-1)$-volume of a facet is done
by variable elimination. Geometrically, this amounts to
projecting the facet onto an axis-parallel hyperplane, then
computing the volume and the centroid of this
projection recursively in a lower dimensional space.
From the volume and centroid of the projected facet,
we can derive the centroid and volume of the
original facet.
The formulae below are obtained by considering the $n$-fold
integral defining the $n$-dimensional volume and
decomposing the polyhedron into cones. The centroid of
a polyhedron can be computed recursively in the
following manner:
• Compute recursively the centroids $G_{F_i}$ and the
$(n-1)$-volumes $V_{F_i}$ of each facet (face of
dimension $n-1$) $F_i$ of $P$. Each facet $F_i$
corresponds to the intersection of $P$ with the
hyperplane defined by the $i$th row of the system
$Ax \le b$.
• Compute $G_E = \sum_i \frac{V_{F_i}}{\sum_j V_{F_j}}\, G_{F_i}$, the centroid of
the envelope of $P$ (the union of the facets $F_i$).
• Compute the centroids $G_{C_i}$ and the $n$-volumes
$V_{C_i}$ of the cones $C_i = \mathrm{cone}(G_E, F_i)$ rooted at
$G_E$. If $h_i$ is the distance from $G_E$ to the
hyperplane containing $F_i$, then $V_{C_i} = \frac{h_i}{n}\, V_{F_i}$
and $\overrightarrow{G_E G_{C_i}} = \frac{n}{n+1}\, \overrightarrow{G_E G_{F_i}}$.
• Compute $G$, the centroid of $P$, as the weighted sum
$G = \sum_i \frac{V_{C_i}}{\sum_j V_{C_j}}\, G_{C_i}$.
It is useful to observe that the computation of the volume
and the centroid of an $(n-1)$-dimensional polyhedron in
an $n$-dimensional space is identical to the computation of
the volume and the centroid of a facet of an $n$-dimensional
polyhedron. For further details, see the
Matlab source code at
http://www.fit.qut.edu.au/~maire/G.
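As a concrete illustration of these formulae (our own Python sketch, separate from the Matlab implementation referenced above; the polygon vertices are arbitrary), the script below applies the cone decomposition to a 2-dimensional polygon, where the facets $F_i$ are the edges, the facet centroids are the edge midpoints, and the 1-volumes are the edge lengths. The result is checked against the standard shoelace centroid.

```python
import numpy as np

# A convex polygon (arbitrary vertices, counter-clockwise); in 2D the
# facets F_i are the edges, so n = 2.
verts = np.array([[0.0, 0.0], [4.0, 0.0], [5.0, 3.0], [1.0, 4.0]])
n = 2

edges = [(verts[i], verts[(i + 1) % len(verts)]) for i in range(len(verts))]
G_F = np.array([(p + q) / 2 for p, q in edges])            # facet centroids
V_F = np.array([np.linalg.norm(q - p) for p, q in edges])  # facet 1-volumes

# Centroid of the envelope of P.
G_E = (V_F[:, None] * G_F).sum(axis=0) / V_F.sum()

# Cones rooted at G_E: V_Ci = (h_i / n) V_Fi and G_Ci = G_E + n/(n+1) (G_Fi - G_E).
V_C, G_C = [], []
for (p, q), g_f, v_f in zip(edges, G_F, V_F):
    t = (q - p) / np.linalg.norm(q - p)
    h = abs((G_E - p) @ np.array([t[1], -t[0]]))  # distance from G_E to edge line
    V_C.append(h / n * v_f)
    G_C.append(G_E + n / (n + 1) * (g_f - G_E))
V_C, G_C = np.array(V_C), np.array(G_C)

G = (V_C[:, None] * G_C).sum(axis=0) / V_C.sum()  # centroid of P

# Reference value from the shoelace formula.
x, y = verts[:, 0], verts[:, 1]
xs, ys = np.roll(x, -1), np.roll(y, -1)
cr = x * ys - xs * y
G_ref = np.array([((x + xs) * cr).sum(), ((y + ys) * cr).sum()]) / (3 * cr.sum())
print(G, G_ref)   # the two centroids agree
```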
3. Balancing Board Machines
3.1 A Mechanical Point of View
The point of contact of a board posed in equilibrium
on a sphere (assumed to be the only source of gravity)
is the centroid of the board. This observation is the
basis of our "balancing board algorithm". In the rest of
this paper, the term "board" will refer to the
intersection of the polyhedral cone of version space
with a hyperplane normal to a unit vector $w$ of
version space. This definition implies that if the
polyhedral cone is $n$-dimensional, then a board is
an $(n-1)$-dimensional polyhedron tangent to the
unit sphere.
In the algorithm we propose, the approximation $w$ of
the centroid direction of the cone is refined by
computing the centroid of the board normal to $w$,
and then rotating $w$ towards the centroid of the board
(stopping at a local minimum of the volume of the
board in this line search).
Figure 2 illustrates the balancing process of a board.
Notice that Figure 2 is only an illustration: in
dimension 2, the line search would succeed in a single
iteration!
Figure 2: Top left, initial board. Top right, after one
iteration. Bottom left, after two iterations. Bottom right,
after three iterations.
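The following toy Python sketch (our own illustration; the cone angles, starting point and grid line search are arbitrary choices) runs this iteration on a 2-dimensional cone bounded by two rays. There, the board normal to $w$ is a segment, its centroid is the midpoint, and its volume is the segment length; as noted above, a single line search essentially reaches the bisector direction, where the board length is minimal.

```python
import numpy as np

def board_stats(w, rays):
    """Centroid and 1-volume of the board {x : w.x = 1} clipped to the cone."""
    pts = np.array([u / np.dot(w, u) for u in rays])  # ray/board intersections
    return pts.mean(axis=0), np.linalg.norm(pts[0] - pts[1])

# A 2-D cone bounded by two unit rays (arbitrary angles).
th1, th2 = np.deg2rad(20), np.deg2rad(110)
rays = [np.array([np.cos(t), np.sin(t)]) for t in (th1, th2)]

w = 0.95 * rays[0] + 0.05 * rays[1]          # start near a facet
w /= np.linalg.norm(w)
for _ in range(20):
    g, _ = board_stats(w, rays)
    g_dir = g / np.linalg.norm(g)
    # Rotate w towards the board centroid, stopping at a (local) minimum
    # of the board volume along the path (simple grid line search).
    cands = [(1 - t) * w + t * g_dir for t in np.linspace(0.0, 1.0, 201)]
    cands = [c / np.linalg.norm(c) for c in cands]
    w = cands[int(np.argmin([board_stats(c, rays)[1] for c in cands]))]

bisector = np.array([np.cos((th1 + th2) / 2), np.sin((th1 + th2) / 2)])
print(w, bisector)  # w converges to the bisector, the balanced direction
```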
3.2 Exploring the Polyhedral Cone
Statistical learning theory (Scholkopf & Smola, [2])
tells us that the Bayes point $w$ belongs to the vector
subspace $V$ generated by the family of vectors
$\{\phi(x_1), \ldots, \phi(x_m)\}$, that is, $w$ is of the form
$w = \sum_j \alpha_j \phi(x_j)$.
Once we know an orthonormal basis of $V$ (the
orthonormality is with respect to the inner product in
feature space corresponding to the kernel function in the
input space), we can express the polyhedral cone
inequalities with respect to this orthonormal basis. Then
we can apply the formulae of section 2 to compute the
centroid of any polyhedron expressed in this orthonormal
basis. The kernel PCA basis is an orthonormal basis $B$
of $V$. Its basis vectors are the eigenvectors of the
symmetric matrix $K = \{k(x_i, x_j)\}_{i,j}$.
By expressing the polyhedral cone defined by the training
examples in $B$, we will be able to approximate a
centroid direction with the board balancing algorithm
sketched in section 3.1 and detailed below.
The complexity of the algorithm of section 2 for computing
the centroid exactly is unfortunately exponential, so the
computational cost of the exact calculation of the centroid
is too high even for medium size data sets. However, the
recursive formulae allow us to derive an approximation of
the volume and the centroid of a polyhedron once we
have approximations for the volumes and the centroids of
its facets.
Because the balancing board algorithm requires several
board centroid estimations, it is desirable to recycle
intermediate results as much as possible to achieve a
significant reduction in computation time. Because the
intersection of a hyperplane and a spheric cone is an
ellipsoid, we estimate the volume and the centroid of the
intersection of the board and a facet of the polyhedral
cone (this intersection is (n-2)-dimensional) with the
volume and the centroid of the intersection of the board
and the largest spheric cone contained in the facet (this
spheric cone is (n-1)-dimensional). The computation of
these largest spheric cones is done only once. The centre
of the ellipsoid and its quadratic matrix are easily derived
from the centre and radius of the spheric cone. These
derivations are explained in the next sub-sections.
To simplify the computations, we have restricted our
study to non-singular kernel matrices (like those obtained
from Gaussian kernels).
3.2.1 Change of Basis
Let $w_B$ be the coordinates of $w$ with respect to
$B = \{\phi(x_1), \ldots, \phi(x_m)\}$. Recall that the kernel PCA
basis is made of the eigenvectors of $K$. Let $w_U$ be the
coordinates of $w$ with respect to the kernel PCA
orthonormal basis $\{u_1, \ldots, u_m\}$. We have $w_B = U w_U$.
Let $M \stackrel{\mathrm{def}}{=} K + \lambda I$, where $\lambda$ is a non-negative
regularization parameter as in (Herbrich et al., [9]). We are
looking for $w_B$ such that $\mathrm{diag}(y)(-M w_B) \le 0$ with
$\langle w, w \rangle = 1$ and $w$ near the centroid direction of the
polyhedral cone. As we have $\langle w, w \rangle = w_U^T w_U$, in
practice we look for the centroid direction of the cone
$$\begin{cases} \mathrm{diag}(y)(-M U w_U) \le 0 \\ w_U^T w_U = 1 \end{cases} \qquad (2)$$
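As a sketch of this setup (hypothetical toy data; the Gaussian kernel width and $\lambda$ are arbitrary, and, following the text, the columns of $U$ are taken to be the eigenvectors of $K$):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                 # hypothetical inputs
y = np.where(X[:, 0] > 0, 1.0, -1.0)        # hypothetical labels in {-1, +1}

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

K = gaussian_kernel(X, X)                   # symmetric kernel matrix K
lam = 1e-3
M = K + lam * np.eye(len(X))                # M = K + lambda * I

# Kernel PCA basis: following the text, the columns of U are the
# eigenvectors of K.
_, U = np.linalg.eigh(K)

# Inequality matrix of system (2): diag(y) (-M U) w_U <= 0,
# with rows normalized as assumed in section 3.2.2.
A = np.diag(y) @ (-M @ U)
A /= np.linalg.norm(A, axis=1, keepdims=True)
print(A.shape)                              # one inequality per training example
```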
3.2.2 Computing the Spheric Centre of a Polyhedral Cone Derived from a Non-Singular Mercer Kernel Matrix
Let $Ax \le 0$ be the non-empty polyhedral cone derived
from the kernel matrix. The matrix $A$ is square
($m = n$). Without loss of generality, we assume that its
rows are normalized, that is, each row is a vector of norm
1. Recall that the spheric centre $w_s$ of the cone
(the direction of the centre of the largest sphere contained in
the polyhedral cone and whose centre is at distance one
from the origin) corresponds to the SVM solution. Because
$A$ is square and non-singular, each facet of the
polyhedral cone touches the largest sphere. If each facet
is moved by a distance of one in the direction of its
normal vector, the new cone obtained is a translation of
the original cone in the direction of $w_s$. That is, $w_s$
can be obtained by solving $Ax = -\mathbf{1}$.
Once the direction $u = w_s$ of the spheric centre of the
polyhedral cone is determined, the radius $r$ of the
largest sphere centred at $u$ can be computed. Here the
radius of the spheric cone is defined as the radius of the
largest $(n-1)$-sphere contained in the intersection of the
cone and the hyperplane $u^T x = 1$. We use this
definition to avoid geodesics.
3.2.3 Computation of r
We write $A(k,:)$ to denote the $k$th row of matrix $A$.
For each $i$, let $\alpha_i = \pi/2 - \arccos(-A(i,:)\, u)$. The
scalar $r$ is the minimum over all the $\tan(\alpha_i)$. If we are
interested in the attributes of the cone contained in the
facet $A(k,:)\, x = 0$, we simply solve the system
$$\begin{cases} A([1:k-1,\, k+1:\mathrm{end}],:)\; x = -\mathbf{1} \\ A(k,:)\; x = 0 \end{cases}$$
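The following Python check of sections 3.2.2 and 3.2.3 uses a hand-constructed 2-dimensional cone (rather than one derived from a kernel matrix): it solves $Ax = -\mathbf{1}$ for the spheric centre and applies the radius formula above; in 2D the radius can be verified directly against the geometry.

```python
import numpy as np

# A 2-D cone {x : Ax <= 0} spanned by unit rays at 60 and 120 degrees.
t1, t2 = np.deg2rad(60), np.deg2rad(120)
u1 = np.array([np.cos(t1), np.sin(t1)])
u2 = np.array([np.cos(t2), np.sin(t2)])
A = np.array([[u1[1], -u1[0]],        # outward unit normals: A @ x <= 0
              [-u2[1], u2[0]]])       # holds exactly on the cone

# Spheric centre (section 3.2.2): shift each facet by one along its
# normal, i.e. solve A x = -1.
x = np.linalg.solve(A, -np.ones(2))
u = x / np.linalg.norm(x)             # direction of the spheric centre

# Radius (section 3.2.3): alpha_i = pi/2 - acos(-A(i,:) u), r = min tan(alpha_i).
alphas = np.pi / 2 - np.arccos(-A @ u)
r = np.tan(alphas).min()

# Direct geometric check: the board u.x = 1 meets the rays at u_i / (u . u_i);
# the largest 1-sphere (a segment) centred at u has radius half that length.
p1, p2 = u1 / (u @ u1), u2 / (u @ u2)
print(u, r, np.linalg.norm(p1 - p2) / 2)   # u is the bisector; the radii agree
```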
3.2.4 Spheric Cone Equation
Given the characteristic attributes $u$ and $r$ of a spheric
cone, we can derive a simple equation for the cone.
Let $z = (u^T x)\, u$ and $y = x - z$. The cone equation is
$y^T y = r^2 z^T z$. An alternative equation (derived from
Pythagoras' theorem, since $y \perp z$ gives
$x^T x = y^T y + z^T z$) is $x^T x = (1 + r^2)(u^T x)^2$.
Our estimation of the volume and centroid of the board
requires the estimation of the volume and centroid of the
intersection of a cone and two hyperplanes (namely a
facet and the hyperplane containing the board).
3.2.5 Intersection of a Spheric Cone and Two Hyperplanes
Consider the cone $x^T x = (1 + r^2)(u^T x)^2$ contained in
the $k$th facet (that is, $A(k,:)\, u = 0$). Let us compute the
ellipsoid defined by the intersection of this cone and the
hyperplane $w^T x = 1$ ($w$ is normal to the board).
Let $Q = [q_1, \ldots, q_{n-2}] = \mathrm{null}\!\left(\begin{bmatrix} w^T \\ A(k,:) \end{bmatrix}\right)$. Let $h$ be
the intersection of the ray defined by $u$ and the
hyperplane $w^T x = 1$. Let us make the change of
variables $x = h + Qz$. One can easily check that
$$\forall z \in \mathbb{R}^{n-2}, \quad w^T (h + Qz) = 1 \quad \text{and} \quad A(k,:)\,(h + Qz) = 0.$$
We now derive the equation of the ellipsoid with respect
to $z$. From $x^T x = (1 + r^2)(u^T x)^2$, we obtain
$$(h + Qz)^T (h + Qz) = (1 + r^2)\left(u^T (h + Qz)\right)^2.$$
After developing the expression, we get
$$h^T h + 2\, h^T Q z + z^T Q^T Q z = (1 + r^2)\left((u^T h)^2 + 2\,(u^T h)(u^T Q z) + z^T Q^T u\, u^T Q z\right).$$
After regrouping, we have
$$z^T Q^T \left(I_n - (1 + r^2)\, u u^T\right) Q z + 2\, h^T \left(I_n - (1 + r^2)\, u u^T\right) Q z + h^T h - (1 + r^2)(u^T h)^2 = 0.$$
From this expression we can derive an expression of the
form $(z - c)^T D (z - c) = b$ that will give us the $(n-2)$-volume
of the ellipsoid and its centre. It is easy to check
that $h + Qc$ is the centre of the ellipsoid in $\mathbb{R}^n$.
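The derivation can be checked numerically. The sketch below (our own; the dimension, radius and vectors are arbitrary, with $w$ chosen close enough to $u$ for the intersection to be bounded) builds $Q$ by SVD, completes the square to obtain $D$, $c$ and $b$, and verifies that a point of the resulting ellipsoid satisfies the cone equation and both hyperplane equations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 4, 0.4                                  # arbitrary dimension and radius

a = rng.normal(size=n); a /= np.linalg.norm(a)                    # A(k,:)
u = rng.normal(size=n); u -= (a @ u) * a; u /= np.linalg.norm(u)  # A(k,:) u = 0
q = rng.normal(size=n); q -= (u @ q) * u; q /= np.linalg.norm(q)
w = u + 0.25 * q; w /= np.linalg.norm(w)       # board normal, tilted from u

h = u / (w @ u)                                # ray through u meets w.x = 1

# Q = null([w^T ; A(k,:)]): the last n-2 right singular vectors.
Q = np.linalg.svd(np.vstack([w, a]))[2][2:].T

# Quadratic in z with S = I - (1 + r^2) u u^T (the regrouped equation).
S = np.eye(n) - (1 + r ** 2) * np.outer(u, u)
D = Q.T @ S @ Q
l = Q.T @ S @ h
c = np.linalg.solve(D, -l)                     # complete the square
b = c @ D @ c - h @ S @ h                      # (z - c)^T D (z - c) = b
centre = h + Q @ c                             # ellipsoid centre in R^n

# Check: a point of the ellipsoid lies on the cone and on both hyperplanes.
evals, P = np.linalg.eigh(D)
s = rng.normal(size=n - 2); s /= np.linalg.norm(s)
x = h + Q @ (c + P @ (np.sqrt(b) * s / np.sqrt(evals)))
print(np.isclose(x @ x, (1 + r ** 2) * (u @ x) ** 2),
      np.isclose(w @ x, 1.0), np.isclose(a @ x, 0.0))   # True True True
```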
In the next subsection, we show how to compute the
volume of the ellipsoid.
3.2.6 Volume of an Ellipsoid $(x - c)^T D (x - c) = b$
Without loss of generality, we assume that
$c = 0$ and $b = 1$. The matrix $D$ is symmetric non-negative,
therefore there exists a decomposition
$D = P \Delta P^T$ where $P$ is orthogonal and $\Delta$ is non-negative
and diagonal.
Let $y = P^T x$; the equation of the ellipsoid becomes
$\sum_i \delta_i y_i^2 = 1$. Let $z_i = \sqrt{\delta_i}\, y_i$. We can check that
if $y$ is on the ellipsoid then $z$ is on the unit sphere, and
reciprocally. That is, the ellipsoid is obtained from the
unit sphere by the linear transformation of matrix
$$M = \begin{bmatrix} \frac{1}{\sqrt{\delta_1}} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sqrt{\delta_n}} \end{bmatrix}.$$
Recall that if $f(x) = Mx$ is a linear transformation and
$S$ is a subset of the vector space, then we have
$\mathrm{vol}(f(S)) = \mathrm{abs}(\det(M)) \times \mathrm{vol}(S)$.
The volume of the ellipsoid is therefore $\prod_{i=1}^n \frac{1}{\sqrt{\delta_i}}$ times
the $n$-volume of the unit $n$-sphere.
For completeness, let us mention that the volume of an
$n$-sphere of radius $r$ is $\frac{\pi^{n/2}}{\Gamma\!\left(\frac{n}{2} + 1\right)}\, r^n$, and the volume
of the $n$-rectangle containing the ellipsoid is
$2^n \prod_{i=1}^n \frac{1}{\sqrt{\delta_i}}$.
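In code, this computation is a few lines (an illustrative sketch; as section 4 explains, in high dimensions one should work with the logarithm of the volume instead):

```python
import numpy as np
from math import gamma, pi

def unit_sphere_volume(n):
    # Volume of the unit n-sphere: pi^(n/2) / Gamma(n/2 + 1).
    return pi ** (n / 2) / gamma(n / 2 + 1)

def ellipsoid_volume(D, b=1.0):
    """Volume of {x : (x - c)^T D (x - c) = b} (independent of c)."""
    deltas = np.linalg.eigvalsh(D / b)    # delta_i of the normalized quadric
    # Image of the unit sphere under a map of determinant prod(1/sqrt(delta_i)).
    return unit_sphere_volume(len(deltas)) / np.sqrt(deltas).prod()

# Sanity check: a sphere of radius 2 in R^3 is D = I/4, b = 1.
print(ellipsoid_volume(np.eye(3) / 4))    # ~ 33.51 = (4/3) * pi * 2^3
```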
3.2.7 Distance from w to a Facet
To compute the $(n-1)$-volume of the intersection of
the board $w^T x = 1$ and the polyhedral cone $P$, we need
to find, for each facet $A(k,:)\, x \le 0$, the point $x$ in the
plane generated by $w$ and $A(k,:)^T$ that belongs to
this intersection, that is, the orthogonal projection of $w$
on the hinge defined as the intersection of the board and
the $k$th cone facet. The point $x = \alpha A(k,:)^T + \beta w$
must satisfy
$$\begin{cases} w^T x = 1 \\ A(k,:)\, x = 0 \end{cases}.$$
Therefore
$$\begin{cases} \alpha \gamma + \beta = 1 \\ \alpha + \beta \gamma = 0 \end{cases} \quad \text{where } \gamma = A(k,:)\, w.$$
That is,
$$x = [A(k,:)^T, w] \begin{bmatrix} \gamma & 1 \\ 1 & \gamma \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
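A minimal sketch of this projection (hypothetical toy data in $\mathbb{R}^3$; $A$ has unit-norm rows and $w$ is a unit vector, as above):

```python
import numpy as np

def hinge_point(A, w, k):
    """Projection of w on the hinge: board {w.x = 1} meets facet A(k,:) x = 0."""
    gamma = A[k] @ w
    # Solve { alpha*gamma + beta = 1 ; alpha + beta*gamma = 0 }.
    alpha, beta = np.linalg.solve([[gamma, 1.0], [1.0, gamma]], [1.0, 0.0])
    return alpha * A[k] + beta * w

rng = np.random.default_rng(3)
w = rng.normal(size=3); w /= np.linalg.norm(w)   # unit board normal
a = rng.normal(size=3); a /= np.linalg.norm(a)   # unit facet normal A(0,:)
x = hinge_point(a[None, :], w, 0)
print(np.isclose(w @ x, 1.0), np.isclose(a @ x, 0.0))   # True True
```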
4. Implementation Issues and Experimental
Results
We have implemented the exact computation of the
centroid and the volume in Matlab. A direct recursive
implementation of Lasserre's formula would be very
inefficient, as faces of dimension $k$ share faces of
dimension $k-1$. Our implementation caches the
volumes and centroids of the lower dimensional faces in a
hash-table.
Our algorithm has been validated by comparing the values
returned with a Monte-Carlo method.
Lasserre's formula is valid only if the polyhedron is
represented as a system of non-redundant linear
inequalities, so redundancy must be detected and
eliminated using linear optimization.
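The paper does not detail this step; one standard LP test (sketched below with scipy.optimize.linprog, under the assumption of a bounding box to keep each LP bounded) declares constraint $i$ redundant when maximizing $A(i,:)\,x$ subject to the remaining constraints cannot exceed $b_i$.

```python
import numpy as np
from scipy.optimize import linprog

def nonredundant(A, b, bound=1e3):
    """Indices of the non-redundant rows of {x : Ax <= b} (one LP per row)."""
    keep = []
    for i in range(len(b)):
        mask = np.arange(len(b)) != i
        # Maximize A[i] x subject to the other constraints (plus a box
        # to keep the LP bounded); row i is redundant iff the optimum
        # stays <= b[i].
        res = linprog(-A[i], A_ub=A[mask], b_ub=b[mask],
                      bounds=[(-bound, bound)] * A.shape[1], method="highs")
        if res.status == 0 and -res.fun > b[i] + 1e-9:
            keep.append(i)
    return keep

# Toy example: the third constraint (x1 <= 2) is implied by the first.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 2.0, 0.0])
print(nonredundant(A, b))   # [0, 1, 3]
```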
The kernel matrix of a Gaussian kernel can only be
singular when identical input vectors occur more than
once in the training set. We remove repeated occurrences of
the same input vector and assign the most common label
for this input vector to the occurrence that we leave in the
training set.
The table which follows summarises generalization
performance (percentage of correct predictions on test
sets) of the Balancing Board Machine (BBM) on 6
standard benchmarking data sets from the UCI
Repository, comparing results for illustrative purposes
with equivalent hard margin support vector machines. In
each case the data was randomly partitioned into 20
training and test sets in the ratio 60%:40%.
Data set SVM BBM
heart disease 58.36 58.40
thyroid 94.34 95.23
diabetes 66.89 67.68
waveform 83.50 83.50
sonar 85.06 85.78
ionosphere 86.79 86.86
The results obtained with a BBM are comparable to those
obtained with a BPM, but the improvement is not always
as dramatic as those reported in (Herbrich et al., [9]). We
observed that the improvement was generally better for
smaller data sets. We suspect that this is due to the fact
that the volumes considered become very small in high
dimensional spaces. In fact, on a PC, unit spheres
"vanish" when their dimension exceeds 340: the volume
of a unit sphere of dimension 340 is $4.5 \times 10^{-223}$. This is
why we consider the logarithm of the volume in our
programs.
5. Conclusion
The exact computation algorithm can be useful as a
benchmark for people developing new centroid
approximation algorithms. We do not claim that our
BBM approach is superior to any other, given that its
computational cost is of the order of $m$ times the cost of
an SVM computation (where $m$ is the number of training
examples).
Replacing the ellipsoids with a more accurate estimation
would probably give better results, but deriving the
volume and the centroid of the intersection of a facet and
a board from the volume and the centroid of the
intersection of the same facet with another board seems to
be a hard problem.
The computation of the SVM point presented in section
3.2.2 provides an efficient learning algorithm for
Gaussian kernels.
6. Acknowledgement
I would like to thank Professor Tom Downs and Professor
Peter Bartlett for their valuable comments on a previous
version of the BBM algorithm. This work was partially
supported by an ATN grant.
References
[1] Muller, K., Mika, S., Ratsch, G., Tsuda, K., and
Scholkopf, B., An introduction to kernel-based
learning algorithms, IEEE Transactions on Neural Networks,
vol. 12, no. 2, 2001, pp. 181-201.
[2] Scholkopf, B., and Smola, A., Learning with Kernels,
http://www.kernel-machines.org/
[3] M. Opper and D. Haussler, Generalization
performance of Bayes optimal classification algorithm
for learning a perceptron, Phys. Rev. Lett., vol. 66, p.
2677, 1991.
[4] J. Shawe-Taylor and R. C. Williamson, A PAC
analysis of a Bayesian estimator, Royal Holloway,
Univ. London, Tech. Rep. NC2-TR-1997-013, 1997.
[5] T. Graepel, R. Herbrich, and C. Campbell, Bayes
point machines: Estimating the Bayes point in kernel
space, in Proc. IJCAI Workshop on Support Vector
Machines, 1999, pp. 23-27.
[6] T. Watkin, Optimal learning with a neural network,
Europhys. Lett., vol. 21, pp. 871-877, 1993.
[7] P. Ruján, Playing billiard in version space, Neural
Comput., vol. 9, pp. 197-238, 1996.
[8] R. Herbrich and T. Graepel, Large scale Bayes point
machines, Advances in Neural Information Processing
Systems 13, 2001.
[9] R. Herbrich, T. Graepel, and C. Campbell, Bayes point
machines, Journal of Machine Learning Research, vol. 1,
2001, pp. 245-279.
[10] Lasserre, J., An analytical expression and an
algorithm for the volume of a convex polyhedron in
R^n, Journal of Optimization Theory and Applications,
vol. 39, no. 3, 1983.
Schrijver, A., Theory of Linear and Integer Programming,
Wiley-Interscience, 1990.
Trafalis, T. B., and Malyscheff, A. M., An analytic
center machine, Machine Learning, vol. 46, 2002,
pp. 203-223.
Más contenido relacionado

La actualidad más candente

論文紹介 Fast imagetagging
論文紹介 Fast imagetagging論文紹介 Fast imagetagging
論文紹介 Fast imagetagging
Takashi Abe
 
Pattern Recognition and Machine Learning : Graphical Models
Pattern Recognition and Machine Learning : Graphical ModelsPattern Recognition and Machine Learning : Graphical Models
Pattern Recognition and Machine Learning : Graphical Models
butest
 

La actualidad más candente (19)

SPAA11
SPAA11SPAA11
SPAA11
 
Vc dimension in Machine Learning
Vc dimension in Machine LearningVc dimension in Machine Learning
Vc dimension in Machine Learning
 
Lecture 11 (Digital Image Processing)
Lecture 11 (Digital Image Processing)Lecture 11 (Digital Image Processing)
Lecture 11 (Digital Image Processing)
 
2009 asilomar
2009 asilomar2009 asilomar
2009 asilomar
 
A Quest for Subexponential Time Parameterized Algorithms for Planar-k-Path: F...
A Quest for Subexponential Time Parameterized Algorithms for Planar-k-Path: F...A Quest for Subexponential Time Parameterized Algorithms for Planar-k-Path: F...
A Quest for Subexponential Time Parameterized Algorithms for Planar-k-Path: F...
 
AN OPTIMAL FUZZY LOGIC SYSTEM FOR A NONLINEAR DYNAMIC SYSTEM USING A FUZZY BA...
AN OPTIMAL FUZZY LOGIC SYSTEM FOR A NONLINEAR DYNAMIC SYSTEM USING A FUZZY BA...AN OPTIMAL FUZZY LOGIC SYSTEM FOR A NONLINEAR DYNAMIC SYSTEM USING A FUZZY BA...
AN OPTIMAL FUZZY LOGIC SYSTEM FOR A NONLINEAR DYNAMIC SYSTEM USING A FUZZY BA...
 
Investigation on the Pattern Synthesis of Subarray Weights for Low EMI Applic...
Investigation on the Pattern Synthesis of Subarray Weights for Low EMI Applic...Investigation on the Pattern Synthesis of Subarray Weights for Low EMI Applic...
Investigation on the Pattern Synthesis of Subarray Weights for Low EMI Applic...
 
C g.2010 supply
C g.2010 supplyC g.2010 supply
C g.2010 supply
 
Image sampling and quantization
Image sampling and quantizationImage sampling and quantization
Image sampling and quantization
 
International journal of applied sciences and innovation vol 2015 - no 1 - ...
International journal of applied sciences and innovation   vol 2015 - no 1 - ...International journal of applied sciences and innovation   vol 2015 - no 1 - ...
International journal of applied sciences and innovation vol 2015 - no 1 - ...
 
Volume computation and applications
Volume computation and applications Volume computation and applications
Volume computation and applications
 
Svm
SvmSvm
Svm
 
Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...Optimal interval clustering: Application to Bregman clustering and statistica...
Optimal interval clustering: Application to Bregman clustering and statistica...
 
Space-efficient Feature Maps for String Alignment Kernels
Space-efficient Feature Maps for String Alignment KernelsSpace-efficient Feature Maps for String Alignment Kernels
Space-efficient Feature Maps for String Alignment Kernels
 
Clustering lect
Clustering lectClustering lect
Clustering lect
 
Chapter 2 Image Processing: Pixel Relation
Chapter 2 Image Processing: Pixel RelationChapter 2 Image Processing: Pixel Relation
Chapter 2 Image Processing: Pixel Relation
 
QMC: Transition Workshop - Reduced Component-by-Component Constructions of (P...
QMC: Transition Workshop - Reduced Component-by-Component Constructions of (P...QMC: Transition Workshop - Reduced Component-by-Component Constructions of (P...
QMC: Transition Workshop - Reduced Component-by-Component Constructions of (P...
 
論文紹介 Fast imagetagging
論文紹介 Fast imagetagging論文紹介 Fast imagetagging
論文紹介 Fast imagetagging
 
Pattern Recognition and Machine Learning : Graphical Models
Pattern Recognition and Machine Learning : Graphical ModelsPattern Recognition and Machine Learning : Graphical Models
Pattern Recognition and Machine Learning : Graphical Models
 

Destacado

CS 898O : Machine Learning
CS 898O : Machine LearningCS 898O : Machine Learning
CS 898O : Machine Learning
butest
 
Data Mining with JDM API by Regina Wang (4/11)
Data Mining with JDM API by Regina Wang (4/11)Data Mining with JDM API by Regina Wang (4/11)
Data Mining with JDM API by Regina Wang (4/11)
butest
 
компьютерным наукам
компьютерным наукамкомпьютерным наукам
компьютерным наукам
butest
 
MACHINE LEARNING METHODS FOR THE
MACHINE LEARNING METHODS FOR THEMACHINE LEARNING METHODS FOR THE
MACHINE LEARNING METHODS FOR THE
butest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
butest
 
msword
mswordmsword
msword
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
butest
 
Sample Capstone Projects from 2005
Sample Capstone Projects from 2005Sample Capstone Projects from 2005
Sample Capstone Projects from 2005
butest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
butest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
butest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
butest
 
EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
butest
 

Destacado (16)

CS 898O : Machine Learning
CS 898O : Machine LearningCS 898O : Machine Learning
CS 898O : Machine Learning
 
Data Mining with JDM API by Regina Wang (4/11)
Data Mining with JDM API by Regina Wang (4/11)Data Mining with JDM API by Regina Wang (4/11)
Data Mining with JDM API by Regina Wang (4/11)
 
компьютерным наукам
компьютерным наукамкомпьютерным наукам
компьютерным наукам
 
MACHINE LEARNING METHODS FOR THE
MACHINE LEARNING METHODS FOR THEMACHINE LEARNING METHODS FOR THE
MACHINE LEARNING METHODS FOR THE
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
Sans contenu vous êtes nus
Sans contenu vous êtes nusSans contenu vous êtes nus
Sans contenu vous êtes nus
 
PPT
PPTPPT
PPT
 
msword
mswordmsword
msword
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
Sample Capstone Projects from 2005
Sample Capstone Projects from 2005Sample Capstone Projects from 2005
Sample Capstone Projects from 2005
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 

Similar a BALANCING BOARD MACHINES

BALANCING BOARD MACHINES
BALANCING BOARD MACHINESBALANCING BOARD MACHINES
BALANCING BOARD MACHINES
butest
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Quantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical searchQuantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical search
IJRES Journal
 
Quantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical searchQuantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical search
irjes
 

Similar a BALANCING BOARD MACHINES (20)

BALANCING BOARD MACHINES
BALANCING BOARD MACHINESBALANCING BOARD MACHINES
BALANCING BOARD MACHINES
 
Newton cotes integration method
Newton cotes integration  methodNewton cotes integration  method
Newton cotes integration method
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)Neural Networks: Radial Bases Functions (RBF)
Neural Networks: Radial Bases Functions (RBF)
 
AI Lesson 29
AI Lesson 29AI Lesson 29
AI Lesson 29
 
Lesson 29
Lesson 29Lesson 29
Lesson 29
 
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMTHE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Principal component analysis
Principal component analysisPrincipal component analysis
Principal component analysis
 
Designing a pencil beam pattern with low sidelobes
Designing a pencil beam pattern with low sidelobesDesigning a pencil beam pattern with low sidelobes
Designing a pencil beam pattern with low sidelobes
 
Distortion Correction Scheme for Multiresolution Camera Images
Distortion Correction Scheme for Multiresolution Camera ImagesDistortion Correction Scheme for Multiresolution Camera Images
Distortion Correction Scheme for Multiresolution Camera Images
 
m3 (2).pdf
m3 (2).pdfm3 (2).pdf
m3 (2).pdf
 
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
A Fuzzy Interactive BI-objective Model for SVM to Identify the Best Compromis...
 
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
A FUZZY INTERACTIVE BI-OBJECTIVE MODEL FOR SVM TO IDENTIFY THE BEST COMPROMIS...
 
An Artificial Neuron Implemented on an Actual Quantum Processor
An Artificial Neuron Implemented on an Actual Quantum ProcessorAn Artificial Neuron Implemented on an Actual Quantum Processor
An Artificial Neuron Implemented on an Actual Quantum Processor
 
Quantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical searchQuantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical search
 
Quantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical searchQuantum ant colony optimization algorithm based onBloch spherical search
Quantum ant colony optimization algorithm based onBloch spherical search
 

Más de butest

MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
butest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
butest
 
Facebook
Facebook Facebook
Facebook
butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
butest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
butest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
butest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
butest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
butest
 
Download
DownloadDownload
Download
butest
 
resume.doc
resume.docresume.doc
resume.doc
butest
 
Download.doc.doc
Download.doc.docDownload.doc.doc
Download.doc.doc
butest
 
Resume
ResumeResume
Resume
butest
 
Web Design Contract
Web Design ContractWeb Design Contract
Web Design Contract
butest
 
Download a Web Design Proposal Form
Download a Web Design Proposal FormDownload a Web Design Proposal Form
Download a Web Design Proposal Form
butest
 
Word version.doc.doc
Word version.doc.docWord version.doc.doc
Word version.doc.doc
butest
 
Web Design
Web DesignWeb Design
Web Design
butest
 
Teacher Guide
Teacher GuideTeacher Guide
Teacher Guide
butest
 

Más de butest (20)

MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 
Download
DownloadDownload
Download
 
resume.doc
resume.docresume.doc
resume.doc
 
Download.doc.doc
Download.doc.docDownload.doc.doc
Download.doc.doc
 
Resume
ResumeResume
Resume
 
Web Design Contract
Web Design ContractWeb Design Contract
Web Design Contract
 
Download a Web Design Proposal Form
Download a Web Design Proposal FormDownload a Web Design Proposal Form
Download a Web Design Proposal Form
 
Word version.doc.doc
Word version.doc.docWord version.doc.doc
Word version.doc.doc
 
Web Design
Web DesignWeb Design
Web Design
 
Teacher Guide
Teacher GuideTeacher Guide
Teacher Guide
 

BALANCING BOARD MACHINES

  • 1. BALANCING BOARD MACHINES Frederic Maire, School of Software Engineering and Data Communication, Queensland University of Technology, Box 2434, Brisbane, Qld 4001, Australia f.maire@qut.edu.au Abstract The support vector machine solution corresponds to the center of the largest sphere inscribed in version space. Alternative approaches like Bayesian Point Machine (BPM) and Analytic Center Machine have suggested that the generalization performance can be further enhanced by considering other possible centers of version space like the centroid (center of mass). We present an algorithm to compute exactly the centroid of higher dimensional polyhedra, then derive approximation algorithms to build a new learning machine whose performance is comparable to BPM. Key Words Kernel machines, Bayesian Point, Centroid. 1. Introduction Kernel classifiers are non-linear decision functions for binary classification. In the Kernel Machine framework (Muller & Mika & Ratch & Tsuda & Scholkopf, [1]; Scholkopft & Smola, [2]), a feature mapping )(xx φ from an input space to a feature space is given (generally, implicitly via a kernel function), as well as a training set of input vectors { }m xx ,,1  with the corresponding class labels { }myy ,,1  where { }1,1 +−∈iy . The learning problem is formulated as a search problem for a linear classifier hypothesis (a weight vector w ) belonging to a subset of the feature space called version space; { }0)(xw,],,1[| i >∈∀ φiymiw . In other words, version space is the set of weight vectors w that are consistent with the training set. Because only the direction of w matters for classification purpose, without loss of generality, we can restrict the search for w to the unit sphere in feature space. The training algorithm of a Support Vector Machine (SVM) returns the weight vector w that has the smallest maximum angle between w and the )( ii xy φ ’s. The Kernel trick is that for certain feature spaces and mappings φ, there exist easily computable kernel functions k defined on the input space such that )(),(),( yxyxk φφ= . With such a kernel function k , the computation of inner products )(),( yx φφ does not require the explicit knowledge of φ. In fact for a given kernel function k , there may exist many different mappings φ. Geometrically, each training example ( )ii yx , defines a half-space in feature space through the constraint 0)(xw, i >φiy on w . It is easy to see that version space is a polyhedral cone of feature space. Figure 1 shows a bird-eye view (slice of the polyhedral cone) of version space. Figure 1: An elongated version space. The SVM point is the centre of the sphere. The SVM solution point SVMw is the centre of the largest sphere whose centre is a unit vector and is contained in the polyhedral cone. Bayes Point Machines (BPM) are a well-founded improvement which approximates the Bayes-optimal decision by the centroid (also known as the centre of mass or barycentre) of version space. It happens that the Bayes point is very close to the centroid of version space in high dimensional spaces. The Bayes point achieves better generalization performance in comparison to SVMs (Opper & Haussler, [3]; Shawe-Taylor & Williamson, [4]; Graepel & Herbrich & Campbell, [5]; Watkin, [6]).
  • 2. An intuitive way to see why the centroid is a good choice is to view version space as a committee of experts who all agree on the training set. A new input vector x corresponds to a hyperplane in feature space that may cut version space in two parts. In the example of Figure 1, the experts on the right of the hyperplane normal to ( )xφ classify x positively, whereas the experts on the left classify x negatively. It is reasonable to use the opinion of the majority of the experts that successfully classified the training set to predict the class label of x . The expert that agrees the most with the majority vote on new inputs is precisely the Bayesian point. In a standard committee machine, for each new input we seek the opinions of a finite number of experts’ then take a majority vote, whereas in a BPM, the expert that most often agrees with the majority vote of the infinite committee (version space) is delegated the task of classifying the new input. Following Rujan [7], Herbrich and Graepel [8] introduced two algorithms to stochastically approximate the centroid of version space: a billiard sampling algorithm and a sampling algorithm based on the well known perceptron algorithm. In this paper, we present an algorithm to compute exactly the centroid of a polyhedron in a high dimensional space. From this exact algorithm, we derive an algorithm to approximate a centroid position in a polyhedral cone. We show empirically that the corresponding machine presents better generalization capability than SVMs on a number a benchmark data sets. In section 2, we introduce an algorithm to compute exactly the centroid of higher dimensional polyhedra. In section 3, we show how to use this algorithm to approximate the centroid of version space. In section 4, some implementation issues are considered and some experimental results are presented. 2. Exact Computation of the Centroid of a Higher Dimensional Polyhedron A polyhedron P is the intersection of a finite number of half-spaces. It is best represented by a system of non redundant linear inequalities { }bAxxP ≤= | . Recall that the 1-volume is the length, the 2-volume is the surface and the 3-volume is the every-day-life volume. The algorithm that we introduce for computing the centroid of an n -dimensional polyhedron is an extension of the work by Lasserre [10] who showed that the n - dimensional volume ),,( bAnV of a polyhedron { }bAxxP ≤= | is related to the )1( −n -dimensional volumes of its facets and the row vectors of its matrix A by the following formula; ),,1()/()/1(),,( bAnVabnbAnV i i ii −×= ∑ where ia denotes the it h row of A and ),,1( bAnVi − denotes )1( −n -dimensional volume of the ith facet. The computation of the centroid and the )1( −n -volume of a facet is done by variable elimination. Geometrically, this amounts to projecting the facet onto an axis parallel hyperplane, then computing the volume and the centroid of this projection recursively in a lower dimensional space. From the volume and centroid of the projected facet, we can derive the centroid and volume of the original facet. The formulae below are obtained by considering the n - fold integral defining the n -dimensional volume and decomposing the polyhedron into cones. The centroid of a polyhedron can be computed recursively in the following manner; • Compute recursively the centroids iFG and the )1( −n -volumes iFV of each facet (face of dimension 1−n ) iF of P . 
Each facet iF corresponds to the intersection of P with the hyperplane defined by the th i row of the system bAx ≤ . • Compute ∑ ∑ ×= i F j F F E i j i G V V G , the centroid of the envelope of P (the union of the facets iF ). • Compute the centroids iCG and the n -volumes iCV of the cones ),cone( iEi FGC = rooted at EG . If ih is the distance from EG to the hyperplane containing iF , then ii F i C V n h V ×= and ii FECE GG n n GG × + = 1 . • Compute G the centroid of P as the weighted sum ∑ ∑ ×= i C j C C i j i G V V G .
  • 3. It is useful to observe that the computation of the volume and the centroid of a ( )1−n -dimensional polyhedron in a n -dimensional space is identical to the computation of the volume and the centroid of a facet of a n - dimensional polyhedron. For further details, see the Matlab source code at http://www.fit.qut.edu.au/~maire/G. 3 Balancing Board Machines 3.1 A Mechanical Point of View The point of contact of a board posed in equilibrium on a sphere (assumed to be the only source of gravity) is the centroid of the board. This observation is the basis of our “balancing board algorithm”. In the rest of this paper, the term “board” will refer to the intersection of the polyhedral cone of version space with a hyperplane normal to a unit vector w of version space. This definition implies that if the polyhedral cone is n -dimensional then a board will be a ( )1−n -dimensional polyhedron tangent to the unit sphere. In the algorithm we propose, the approximation w of the centroid direction of the cone is refined by computing the centroid of the board normal to w , and then rotating w towards the centroid of the board (stopping at a local minimum of the volume of the board in this line search). Figure 2 illustrates the balancing process of a board. Notice that Figure 2 is simply an illustration as in dimension 2 the line search would succeed in just one line-search iteration! Figure 2: Top left, initial board. Top right, after one iteration. Bottom left, after two iterations. Bottom right after three iterations. 3.2 Exploring the Polyhedral Cone Statistical learning theory (Scholkopft & Smola, [2]) tells us that the Bayes point w belongs to the vector subspace V generated by the family of vectors ( ) ( ){ }m xx φφ ,,1  , that is w is of the form ( )∑= j j j xw φα . Once we know an orthonormal basis of V (the orthonormality is with respect to the inner product in feature space corresponding to the kernel function in the input space), we can express the polyhedral cone inequalities with respect to this orthonormal basis. Then we can apply the formulae of section 2 to compute the centroid of any polyhedron expressed in this orthonormal basis. The kernel PCA basis is an orthonormal basis B of V . Its basis vectors are the eigenvectors of the symmetric matrix ( ){ } jiji xxkK , ,= . By expressing the polyhedral cone defined by the training examples in B , we will be able to approximate a centroid direction with the board balancing algorithm sketched in section 3.1 and detailed below. The complexity of the algorithm of section 2 to compute exactly the centroid is unfortunately exponential. The computational cost of the exact calculation of the centroid is too high even for medium size data sets. However, the recursive formulae allow us to derive an approximation of the volume and the centroid of a polyhedron once we have approximations for the volumes and the centroids of its facets. Because the balancing board algorithm requires several board centroid estimations, it is desirable to recycle intermediate results as much as possible to achieve a significant reduction in computation time. Because the intersection of a hyperplane and a spheric cone is an ellipsoid, we estimate the volume and the centroid of the intersection of the board and a facet of the polyhedral cone (this intersection is (n-2)-dimensional) with the volume and the centroid of the intersection of the board and the largest spheric cone contained in the facet (this spheric cone is (n-1)-dimensional). 
The computation of these largest spheric cones is done only once. The centre of the ellipsoid and its quadratic matrix is easily derived from the centre and radius of the spheric cone. These derivations are explained in the next sub-sections.
  • 4. To simplify the computations, we have restricted our study to non-singular kernel matrices (like those obtained from Gaussian kernels). 3.2.1 Change of Basis Let Bw be the coordinates of w with respect to ( ) ( ){ }m xxB φφ ,,1 = . Recall that the Kernel PCA basis is made of the eigenvectors of K . Let Uw be the coordinates of w with respect to the Kernel PCA orthonormal basis { }m uu ,,1  . We have UB Uww = . Let IKM def λ+= , where λ is a non-negative regularization parameter as in (Herbrich et al, [9]). We are looking for Bw such that ( ) 0diag ≤− BMwy with 1, =ww and w near the centroid direction of the polyhedral cone. As we have ( ) U T U wwww =, , in practice, we look for the centroid direction of the cone ( ) ( )   = ≤− 1 0diag U T U U ww MUwy (2) 3.2.2 Computing the Spheric Centre of a Polyhedral Cone Derived from a Non Singular Mercer Kernel Matrix Let 0≤Ax be the non-empty polyhedral cone derived from the kernel matrix. The matrix A is square ( nm = ). Without loss of generality, we assume that its rows are normalized. That is each row is a vector of norm 1. Recall that the spheric centre sw of the cone (direction of the centre of the largest sphere contained in the polyhedral cone and whose centre is at distance one from zero) corresponds to the SVM solution. Because A is square and non-singular, each facet of the polyhedral cone touches the largest sphere. If each facet is moved by a distance of one in the direction of its normal vector, the new cone obtained is a translation of the original cone in the direction of sw . That is sw can be obtained by solving 1  −=Ax . Once the direction swu = of the spheric centre of the polyhedral cone is determined, the radius r of the largest sphere centered at u can be computed. Here the radius of the spheric cone is defined as the radius of the largest (n-1)-sphere contained in the intersection of the cone and the hyperplane 1=xuT . We use this definition to avoid geodesics. 3.2.3 Computation of r We write :),(kA to denote the kth row of matrix A . For each i , let ( )( )uiAi :,acos2/ −−=πα . The scalar r is the minimum over all ( )itan α . If we are interested in the attributes of the cone contained in the facet ( ) 0:, =xkA , we simply solve the system ( ) ( )   = −=+− 0:, 1::],1,1:1[ xkA xkkA 3.2.4 Spheric Cone Equation Given the characteristic attributes u and r of a spheric cone, we can derive a simple equation for the cone. Let ( )uxuz T = and zxy −= . The cone equation is zzryy TT 2 = . An alternative equation (derived from Pythagoras theorem) is ( )( )22 1 xurxx TT += . Our estimation of the volume and centroid of the board requires the estimation of the volume and centroid of the intersection of a cone and two hyperplanes (namely a facet and the hyperplane containing the board). 3.2.5 Intersection of a Spheric Cone and Two Hyperplanes Consider the cone ( )( )22 1 xurxx TT += contained in the kth facet (that is ( ) 0:, =ukA ). Let’s compute the ellipsoid defined by the intersection of this cone and the hyperplane 1=xwT ( w is normal to the board). Let [ ] ( )               == − :, null,, 21 kA w qqQ T n . Let h be the intersection of the ray defined by u and the hyperplane 1=xwT . Let us make the change of variables zQhx += . One can easily check that ( ) ( )( ) 0:,and1,2 =+=+∈∀ − zQhkAzQhwRz Tn We derive now the equation of the ellipsoid with respect to z . From ( )( )22 1 xurxx TT += , we obtain ( ) ( ) ( )( )22 1 zQuhurzQhzQh TTT ++=++
  • 5. After developing the expression, we get ( )( ) ( )( ) ( )( )zQhuurzQuuQzr hurzQQzzQhhh TTTTT TTTTT 211 12 22 22 +++ ++=++ After regrouping, we have ( )( ) ( )( )( ) ( )( ) 01 12 1 22 2 2 =+− ++− ++− hurhh zQhuurh zQuurIQz TT TTT T n TT From this expression we can derive an expression of the form ( ) ( ) bczDcz T =−− that will tell us the (n-2)- volume of the ellipsoid and its centre. It is easy to check that cQh + is the centre of the ellipsoid in n R . In the next subsection, we show how to compute the volume of the ellipsoid. 3.2.6 Volume of an Ellipsoid ( ) ( ) bcxDcx T =−− Without loss of generality, we assume that 1and0 == bc . The matrix D is symmetric non- negative, therefore there exists a decomposition T PPD ∆= where P is orthogonal and ∆ non- negative and diagonal. Let xPy T = , the equation of the ellipsoid becomes ∑ = i ii y 12 δ . Let iii yz δ= . We can check that if y is on the ellipsoid then z is on the unit sphere and reciprocally. That is the ellipsoid is obtained from the unit sphere by the linear transformation of matrix M , where                   = n M δ δ 1 00 0 0 00 1 1     Recall that if ( ) Mxxf = is a linear transformation and S is a subset of the vector space, then we have, ( ) ( )( ) ( )SSf volMdetabs)(vol ×= . The volume of the ellipsoid is therefore ∏= n i i1 1 δ times the n-volume of the n-sphere. For completeness, let us mention that the volume of a n- sphere of radius r is ( )nn r nn 2 1 2 1 2 Γ × π , and the volume of the n-rectangle containing the ellipsoid is ∏= × n i i n 1 1 2 δ . 3.2.7 Distance from w to a Facet To compute the ( )1−n -volume of the intersection of the board 1=xwT and the polyhedral cone P, we need to find for each facet ( ) 0:, ≤xkA the point x in the plane generated by w and ( )T kA :, that belongs to this intersection. That is the orthogonal projection of w on the hinge defined as the intersection of the board and the kth cone facet. The point ( ) wkAx T βα += :, must satisfy ( )   = = 0:, 1 xkA xwT . Therefore ( )wkA :,where 0 1 =    =+ =+ γ γβα βγα . That is, ( )             ×= − 0 1 1 1 ]:,[ 1 γ γ wkAx T . 4. Implementation Issues and Experimental Results We have implemented the exact computation of the centroid and the volume in Matlab. A direct recursive implementation of Lasserre formula would be very inefficient as faces of dimension k share faces of dimension 1−k . Our implementation caches the volumes and centroids of the lower dimensional faces in a hash-table. Our algorithm has been validated by comparing the values returned with a Monte-Carlo method. As Lasserre’s formula is valid only if the polyhedron is represented as a system of non-redundant linear inequalities. Redundancy must be detected and eliminated by using a linear optimization. The kernel matrix of a Gaussian kernel can only be singular when identical input vectors occur more than once the training set. We remove repeated occurrences of
4. Implementation Issues and Experimental Results

We have implemented the exact computation of the centroid and the volume in Matlab. A direct recursive implementation of Lasserre's formula [10] would be very inefficient, as faces of dimension $k$ share faces of dimension $k-1$. Our implementation caches the volumes and centroids of the lower dimensional faces in a hash-table. Our algorithm has been validated by comparing the values returned with a Monte-Carlo method. As Lasserre's formula is valid only if the polyhedron is represented by a system of non-redundant linear inequalities, redundancy must be detected and eliminated using linear optimization. The kernel matrix of a Gaussian kernel can only be singular when identical input vectors occur more than once in the training set. We remove repeated occurrences of the same input vector and assign the most common label for this input vector to the occurrence that we leave in the training set.

The table below summarises the generalization performance (percentage of correct predictions on test sets) of the Balancing Board Machine (BBM) on six standard benchmarking data sets from the UCI Repository, comparing the results, for illustrative purposes, with equivalent hard margin support vector machines. In each case the data was randomly partitioned into 20 training and test sets in the ratio 60%:40%.

    Data set         SVM      BBM
    heart disease    58.36    58.40
    thyroid          94.34    95.23
    diabetes         66.89    67.68
    waveform         83.50    83.50
    sonar            85.06    85.78
    ionosphere       86.79    86.86

The results obtained with a BBM are comparable to those obtained with a BPM, but the improvement is not always as dramatic as that reported in (Herbrich et al., [9]). We observed that the improvement was generally larger for smaller data sets. We suspect that this is due to the fact that the volumes considered become very small in high dimensional spaces. In fact, on a PC, unit spheres "vanish" when their dimension exceeds 340: the volume of a unit sphere of dimension 340 is about $4.5 \times 10^{-223}$. This is why we consider the logarithm of the volume in our programs.
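The log-volume bookkeeping mentioned above can be sketched as follows (a minimal NumPy illustration under our own naming, combining the ellipsoid volume formula of Section 3.2.6 with log-space arithmetic; it assumes $D$ is symmetric positive definite):

```python
import numpy as np
from math import pi, lgamma

def log_unit_ball_volume(n):
    """log of the volume of the unit n-sphere: pi^(n/2) / Gamma(n/2 + 1)."""
    return (n / 2.0) * np.log(pi) - lgamma(n / 2.0 + 1.0)

def log_ellipsoid_volume(D, b=1.0):
    """log-volume of the ellipsoid (x-c)^T D (x-c) = b, which is the
    image of the unit ball under a linear map whose determinant is
    prod(1/sqrt(delta_i)) (Section 3.2.6); independent of the centre c."""
    delta = np.linalg.eigvalsh(D / b)  # eigenvalues of the scaled form
    return log_unit_ball_volume(D.shape[0]) - 0.5 * np.log(delta).sum()

# Sanity check of the figure quoted above:
print(log_unit_ball_volume(340) / np.log(10))  # about -222.35, i.e. 4.5e-223
```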
5. Conclusion

The exact computation algorithm can be useful as a benchmark for people developing new centroid approximation algorithms. We do not claim that our BBM approach is superior to any other, given that its computational cost is of the order of $m$ times the cost of an SVM computation (where $m$ is the number of training examples). Replacing the ellipsoids with a more accurate estimation would probably give better results, but deriving the volume and the centroid of the intersection of a facet and a board from the volume and the centroid of the intersection of the same facet with another board seems to be a hard problem. The computation of the SVM point presented in Section 3.2.2 provides an efficient learning algorithm for Gaussian kernels.

6. Acknowledgement

I would like to thank Professor Tom Downs and Professor Peter Bartlett for their valuable comments on a previous version of the BBM algorithm. This work was partially supported by an ATN grant.

References

[1] K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, An introduction to kernel-based learning algorithms, IEEE Transactions on Neural Networks, vol. 12, no. 2, 2001, pp. 181-201.
[2] B. Schölkopf and A. Smola, Learning with Kernels, http://www.kernel-machines.org/
[3] M. Opper and D. Haussler, Generalization performance of Bayes optimal classification algorithm for learning a perceptron, Physical Review Letters, vol. 66, p. 2677, 1991.
[4] J. Shawe-Taylor and R. C. Williamson, A PAC analysis of a Bayesian estimator, Royal Holloway, Univ. London, Tech. Rep. NC2-TR-1997-013, 1997.
[5] T. Graepel, R. Herbrich, and C. Campbell, Bayes point machines: Estimating the Bayes point in kernel space, in Proc. IJCAI Workshop on Support Vector Machines, 1999, pp. 23-27.
[6] T. Watkin, Optimal learning with a neural network, Europhysics Letters, vol. 21, pp. 871-877, 1993.
[7] P. Ruján, Playing billiard in version space, Neural Computation, vol. 9, pp. 197-238, 1996.
[8] R. Herbrich and T. Graepel, Large scale Bayes point machines, Advances in Neural Information Processing Systems 13, 2001.
[9] R. Herbrich, T. Graepel, and C. Campbell, Bayes point machines, Journal of Machine Learning Research, vol. 1, 2001, pp. 245-279.
[10] J. Lasserre, An analytical expression and an algorithm for the volume of a convex polyhedron in R^n, Journal of Optimization Theory and Applications, vol. 39, no. 3, 1983.
[11] A. Schrijver, Theory of Linear and Integer Programming, Wiley-Interscience, 1990.
[12] T. B. Trafalis and A. M. Malyscheff, An analytic center machine, Machine Learning, vol. 46, 2002, pp. 203-223.