Classification Theory

1. 4th International Summer School
"Achievements and Applications of Contemporary Informatics, Mathematics and Physics"
National Technical University of Ukraine
Kiev, Ukraine, August 5-16, 2009

Modelling of Kernel Machines by Infinite and Semi-Infinite Programming

Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber*
Institute of Applied Mathematics, METU, Ankara, Turkey
* Faculty of Economics, Management Science and Law, University of Siegen, Germany;
Center for Research on Optimization and Control, University of Aveiro, Portugal

August 7, 2009
2. Motivation: Prediction of Cleavage Sites

[Figure: a protein sequence split at the cleavage site into its signal part and mature part.]
3. Logistic Regression

$$\log \frac{P(Y=1 \mid X = x_l)}{P(Y=0 \mid X = x_l)} = \beta_0 + \beta_1 x_{l1} + \beta_2 x_{l2} + \cdots + \beta_p x_{lp} \qquad (l = 1, 2, \ldots, N)$$
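A minimal sketch (not from the slides, toy data assumed): fitting the log-odds model above with scikit-learn; `model.intercept_` and `model.coef_` play the roles of $\beta_0$ and $(\beta_1, \ldots, \beta_p)$.

```python
# Hedged sketch: logistic regression on synthetic data (N = 100, p = 4).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X @ [1.0, -2.0, 0.5, 0.0] + 0.3 > 0).astype(int)  # synthetic labels

model = LogisticRegression().fit(X, y)
print(model.intercept_, model.coef_)    # beta_0 and (beta_1, ..., beta_p)
print(model.predict_proba(X)[:, 1])     # predicted P(Y = 1 | X = x_l)
```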
4. Linear Classifiers

Maximum margin classifier (functional margin):

$$\gamma_i := y_i \cdot (\langle w, x_i \rangle + b)$$

Note: $\gamma_i > 0$ implies correct classification.

[Figure: separating hyperplane with margin $\gamma$; the support vectors $x_k$ and $x_j$ on the two margin boundaries satisfy $y_k \cdot (\langle w, x_k \rangle + b) = 1$ and $y_j \cdot (\langle w, x_j \rangle + b) = 1$.]
5. Linear Classifiers

• The geometric margin: $\gamma = \dfrac{2}{\|w\|_2}$

Maximizing the margin, $\max_{w,b} \dfrac{2}{\|w\|_2}$, is equivalent to the convex problem

$$\min_{w,b} \ \frac{\|w\|_2^2}{2} \quad \text{subject to} \quad y_i \cdot (\langle w, x_i \rangle + b) \ge 1 \quad (i = 1, 2, \ldots, l).$$
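A minimal sketch of this convex primal problem, assuming cvxpy is installed and using linearly separable toy data (everything below is illustrative, not the authors' code):

```python
# Hedged sketch: hard-margin SVM primal, min ||w||^2/2 s.t. y_i(<w,x_i>+b) >= 1.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+2.0, 1.0, (20, 2)),   # positive class
               rng.normal(-2.0, 1.0, (20, 2))])  # negative class
y = np.array([1.0] * 20 + [-1.0] * 20)

w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]   # margin constraints
cp.Problem(objective, constraints).solve()
print("w =", w.value, "b =", b.value,
      "geometric margin =", 2 / np.linalg.norm(w.value))
```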
6. Linear Classifiers

Dual problem:

$$\max_{\alpha} \ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle$$

$$\text{subject to} \quad \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, l).$$
7. Linear Classifiers

Dual problem with a kernel function $\kappa$:

$$\max_{\alpha} \ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j \, \kappa(x_i, x_j)$$

$$\text{subject to} \quad \sum_{i=1}^{l} \alpha_i y_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, l).$$
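A minimal sketch (toy data, Gaussian kernel assumed) of solving this kernelized dual numerically with scipy's SLSQP solver:

```python
# Hedged sketch: kernelized SVM dual solved as a constrained QP.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(+1.5, 1.0, (15, 2)), rng.normal(-1.5, 1.0, (15, 2))])
y = np.array([1.0] * 15 + [-1.0] * 15)
l = len(y)

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)             # Gaussian kernel matrix
Q = (y[:, None] * y[None, :]) * K       # Q_ij = y_i y_j kappa(x_i, x_j)

def neg_dual(a):                        # negate: scipy minimizes
    return 0.5 * a @ Q @ a - a.sum()

res = minimize(neg_dual, np.zeros(l), method="SLSQP",
               bounds=[(0, None)] * l,                    # alpha_i >= 0
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x
print("support vectors:", np.where(alpha > 1e-6)[0])
```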
8. Linear Classifiers

Soft margin classifier:

• Introduce slack variables $\xi_i$ to allow the margin constraints to be violated:

$$\min_{\xi, w, b} \ \frac{\|w\|^2}{2} + C \sum_{i=1}^{l} \xi_i^2$$

$$\text{subject to} \quad y_i \cdot (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1, 2, \ldots, l).$$
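A minimal sketch of the soft-margin trade-off, using scikit-learn's LinearSVC; note its squared-hinge loss corresponds to the squared-slack penalty $C \sum_i \xi_i^2$ on the slide (toy overlapping data assumed):

```python
# Hedged sketch: effect of C on the soft-margin classifier.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(+1.0, 1.2, (30, 2)), rng.normal(-1.0, 1.2, (30, 2))])
y = np.array([1] * 30 + [-1] * 30)      # overlapping classes: slack is needed

for C in (0.01, 1.0, 100.0):             # small C: wide margin, more violations
    clf = LinearSVC(C=C, loss="squared_hinge").fit(X, y)
    print(C, clf.score(X, y))
```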
9. Linear Classifiers

• Projection of the data into a higher dimensional feature space.
• Mapping the input space X into a new space F:

$$x = (x_1, \ldots, x_n) \mapsto \phi(x) = (\phi_1(x), \ldots, \phi_N(x))$$

[Figure: data that are not linearly separable in X become linearly separable after the mapping $\phi$ into F.]
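A minimal illustrative sketch (not from the slides): for 2-D inputs, the explicit feature map $\phi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, x_2^2, \sqrt{2}x_1 x_2)$ reproduces the polynomial kernel $(1 + x^T z)^2$ as an ordinary inner product in F, which is the point of the kernel trick:

```python
# Hedged sketch: explicit feature map vs. kernel evaluation.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
z = np.array([2.0, 0.25])
print(phi(x) @ phi(z))          # inner product in F
print((1 + x @ z) ** 2)         # kernel evaluated in X: identical value
```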
10. Nonlinear Classifiers

Set of hypotheses: $f(x) = \sum_{i=1}^{N} w_i \phi_i(x) + b$

Dual representation: $f(x) = \sum_{i=1}^{l} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b$

Kernel functions:
• polynomial kernel: $\kappa(x, z) = (1 + x^T z)^k$
• sigmoid kernel: $\kappa(x, z) = \tanh(a x^T z + b)$
• Gaussian (RBF) kernel: $\kappa(x, z) = \exp(-\|x - z\|_2^2 / \sigma^2)$
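A minimal sketch of the three kernels above as numpy functions; the hyperparameter names (k, a, b, sigma) follow the slide's notation:

```python
# Hedged sketch: the slide's kernel functions.
import numpy as np

def polynomial_kernel(x, z, k=3):
    return (1.0 + x @ z) ** k

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    return np.tanh(a * (x @ z) + b)

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma**2)

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, z), sigmoid_kernel(x, z), gaussian_kernel(x, z))
```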
11. (In-)Finite Kernel Learning

• Based on the motivation of multiple kernel learning (MKL): combine kernel functions $\kappa_k(\cdot, \cdot)$ by

$$\kappa(x_i, x_j) = \sum_{k=1}^{K} \beta_k \, \kappa_k(x_i, x_j), \qquad \beta_k \ge 0 \ (k = 1, \ldots, K), \quad \sum_{k=1}^{K} \beta_k = 1.$$

• Semi-infinite LP formulation:

$$(\text{SILP}_{\text{MKL}}) \quad \max_{\theta, \beta} \ \theta \qquad (\theta \in \mathbb{R}, \ \beta \in \mathbb{R}^K)$$

$$\text{such that} \quad 0 \le \beta, \quad \sum_{k=1}^{K} \beta_k = 1,$$

$$\sum_{k=1}^{K} \beta_k S_k(\alpha) \ge \theta \quad \forall \alpha \in \mathbb{R}^l \ \text{with} \ 0 \le \alpha \le C\mathbf{1} \ \text{and} \ \sum_{i=1}^{l} \alpha_i y_i = 0,$$

$$\text{where} \quad S_k(\alpha) := \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, \kappa_k(x_i, x_j) - \sum_{i=1}^{l} \alpha_i.$$
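A minimal sketch of the MKL combination step: a convex combination of precomputed kernel matrices with simplex weights, as in the formula above (toy data; the helper name `combined_kernel` is hypothetical):

```python
# Hedged sketch: kappa = sum_k beta_k * kappa_k as a matrix combination.
import numpy as np

def combined_kernel(kernel_mats, beta):
    """Return sum_k beta_k * K_k for beta on the simplex."""
    beta = np.asarray(beta)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, kernel_mats))

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 3))
K_lin = X @ X.T                                          # linear kernel
K_rbf = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # RBF kernel
K = combined_kernel([K_lin, K_rbf], beta=[0.3, 0.7])
print(K.shape)
```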
12. Infinite Kernel Learning: Infinite Programming

Example (homotopy between two kernels):

$$\kappa(x_i, x_j, \omega) := \omega \exp\!\left(-\omega^* \|x_i - x_j\|_2^2\right) + (1 - \omega)(1 + x_i^T x_j)^d$$

$H(\omega) := \kappa(x_i, x_j, \omega)$ is a homotopy:

$$H(0) = (1 + x_i^T x_j)^d, \qquad H(1) = \exp\!\left(-\omega^* \|x_i - x_j\|_2^2\right).$$

Integrating over the kernel parameter leads to infinite programming:

$$\kappa_\beta(x_i, x_j) := \int_\Omega \kappa(x_i, x_j, \omega) \, d\beta(\omega).$$
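A minimal sketch of this homotopy kernel family; omega_star and d are fixed hyperparameters (values below are assumptions), and $\omega \in [0,1]$ interpolates between the polynomial and Gaussian kernels:

```python
# Hedged sketch: the homotopy kernel kappa(x_i, x_j, omega).
import numpy as np

def kappa(x_i, x_j, omega, omega_star=1.0, d=2):
    gauss = np.exp(-omega_star * np.sum((x_i - x_j) ** 2))
    poly = (1.0 + x_i @ x_j) ** d
    return omega * gauss + (1.0 - omega) * poly

x_i, x_j = np.array([1.0, 0.5]), np.array([0.0, 1.0])
for omega in (0.0, 0.5, 1.0):         # H(0) = polynomial, H(1) = Gaussian
    print(omega, kappa(x_i, x_j, omega))
```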
13. Infinite Kernel Learning: Infinite Programming

• Introducing Riemann-Stieltjes integrals into the problem (SILP_MKL), we get the following general problem formulation:

$$\kappa_\beta(x_i, x_j) = \int_\Omega \kappa(x_i, x_j, \omega) \, d\beta(\omega), \qquad \Omega = [0, 1].$$
14. Infinite Kernel Learning: Infinite Programming

• Introducing Riemann-Stieltjes integrals into the problem (SILP_MKL), we get the following general problem formulation:

$$(\text{IP}) \quad \max_{\theta, \beta} \ \theta \qquad (\theta \in \mathbb{R}, \ \beta : [0,1] \to \mathbb{R} \ \text{monotonically increasing})$$

$$\text{subject to} \quad \int_0^1 d\beta(\omega) = 1,$$

$$\int_\Omega \Big( S(\omega, \alpha) - \sum_{i=1}^{l} \alpha_i \Big) d\beta(\omega) \ge \theta \quad \forall \alpha \in \mathbb{R}^l \ \text{with} \ 0 \le \alpha \le C\mathbf{1}, \ \sum_{i=1}^{l} \alpha_i y_i = 0,$$

where

$$S(\omega, \alpha) := \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, \kappa(x_i, x_j, \omega), \qquad T(\omega, \alpha) := S(\omega, \alpha) - \sum_{i=1}^{l} \alpha_i,$$

$$A := \Big\{ \alpha \in \mathbb{R}^l \ \Big| \ 0 \le \alpha \le C\mathbf{1} \ \text{and} \ \sum_{i=1}^{l} \alpha_i y_i = 0 \Big\}.$$
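A minimal sketch (toy data, homotopy kernel from slide 12 assumed) evaluating $S(\omega, \alpha)$ and $T(\omega, \alpha)$ for a given kernel parameter and dual vector:

```python
# Hedged sketch: evaluating S(omega, alpha) and T(omega, alpha).
import numpy as np

def kappa(x_i, x_j, omega, omega_star=1.0, d=2):
    gauss = np.exp(-omega_star * np.sum((x_i - x_j) ** 2))
    return omega * gauss + (1 - omega) * (1 + x_i @ x_j) ** d

def T(omega, alpha, X, y):
    l = len(y)
    K = np.array([[kappa(X[i], X[j], omega) for j in range(l)] for i in range(l)])
    S = 0.5 * (alpha * y) @ K @ (alpha * y)   # 1/2 sum a_i a_j y_i y_j kappa
    return S - alpha.sum()

rng = np.random.default_rng(5)
X = rng.normal(size=(6, 2))
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
alpha = np.full(6, 0.5)                        # feasible: sum alpha_i y_i = 0
print(T(0.5, alpha, X, y))
```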
15. Infinite Kernel Learning: Infinite Programming

$$(\text{IP}) \quad \max_{\theta, \beta} \ \theta \qquad (\theta \in \mathbb{R}, \ \beta : \text{a positive measure on } \Omega)$$

$$\text{such that} \quad \theta - \int_\Omega T(\omega, \alpha) \, d\beta(\omega) \le 0 \quad \forall \alpha \in A, \qquad \int_\Omega d\beta(\omega) = 1.$$

This is an infinite programming problem. The dual of (IP):

$$(\text{DIP}) \quad \min_{\sigma, \rho} \ \sigma \qquad (\sigma \in \mathbb{R}, \ \rho : \text{a positive measure on } A)$$

$$\text{such that} \quad \sigma - \int_A T(\omega, \alpha) \, d\rho(\alpha) \ge 0 \quad \forall \omega \in \Omega, \qquad \int_A d\rho(\alpha) = 1.$$

• Duality conditions: Let $(\theta, \beta)$ and $(\sigma, \rho)$ be feasible for their respective problems and complementary slack, i.e.,
$\beta$ has measure only where $\sigma = \int_A T(\omega, \alpha) \, d\rho(\alpha)$, and
$\rho$ has measure only where $\theta = \int_\Omega T(\omega, \alpha) \, d\beta(\omega)$.
Then both solutions are optimal for their respective problems.
16. Infinite Kernel Learning: Infinite Programming

• The interesting theoretical problem here is to find conditions which ensure that solutions are point masses (i.e., the original monotonic $\beta$ is a step function).
• Because of this, and in view of the compactness of the feasible (index) sets at the lower levels, $A$ and $\Omega$, we are interested in the nondegeneracy of the local minima of the lower level problem, to get finitely many local minimizers of

$$g((\sigma, \rho), \omega) := \sigma - \int_A T(\omega, \alpha) \, d\rho(\alpha).$$

• Lower level problem: For a given parameter $(\sigma, \rho)$, we consider

$$(\text{LLP}) \quad \min_\omega \ g((\sigma, \rho), \omega) \quad \text{subject to} \quad \omega \in \Omega.$$
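A minimal discretized sketch of (LLP), assuming $\rho$ is a finite sum of point masses at vectors $\alpha_m$ (so the integral becomes a weighted sum) and $\Omega = [0,1]$ is scanned on a grid; all data below are toy placeholders:

```python
# Hedged sketch: grid search for the lower level problem min_omega g((sigma, rho), omega).
import numpy as np

def T(omega, alpha, X, y, omega_star=1.0, d=2):
    diff = X[:, None, :] - X[None, :, :]
    K = omega * np.exp(-omega_star * (diff ** 2).sum(-1)) \
        + (1 - omega) * (1 + X @ X.T) ** d
    return 0.5 * (alpha * y) @ K @ (alpha * y) - alpha.sum()

def g(sigma, rho_weights, alphas, omega, X, y):
    return sigma - sum(w * T(omega, a, X, y) for w, a in zip(rho_weights, alphas))

rng = np.random.default_rng(6)
X = rng.normal(size=(6, 2))
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
alphas = [np.full(6, 0.5), np.full(6, 0.2)]     # point masses of rho
grid = np.linspace(0.0, 1.0, 101)               # discretize Omega = [0, 1]
vals = [g(1.0, [0.6, 0.4], alphas, w, X, y) for w in grid]
print("minimizer omega ~", grid[int(np.argmin(vals))])
```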
17. Infinite Kernel Learning: Infinite Programming

• "reduction ansatz" and
• Implicit Function Theorem
• parametrical measures
• "finite optimization"
18. Infinite Kernel Learning: Infinite Programming

• "reduction ansatz" and
• Implicit Function Theorem
• parametrical measures, e.g., with densities (see the sketch after this list):

  Gaussian: $f(\omega; (\mu, \sigma)) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left(\dfrac{-(\omega - \mu)^2}{2\sigma^2}\right)$

  exponential: $f(\omega; \lambda) = \begin{cases} \lambda \exp(-\lambda\omega), & \omega \ge 0 \\ 0, & \omega < 0 \end{cases}$

  uniform: $f(\omega; (a, b)) = \dfrac{H(\omega - a) - H(\omega - b)}{b - a}$

  beta: $f(\omega; (\alpha, \beta)) = \dfrac{\omega^{\alpha-1}(1 - \omega)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1 - u)^{\beta-1} \, du}$

• "finite optimization"
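A minimal sketch of the four parametric densities above as numpy functions ($H$ is the Heaviside step function in the uniform density; parameter values below are arbitrary):

```python
# Hedged sketch: parametric densities for the measure beta.
import numpy as np
from scipy.special import beta as beta_fn   # the normalizing Beta integral

def gaussian_pdf(w, mu, sigma):
    return np.exp(-(w - mu) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def exponential_pdf(w, lam):
    return np.where(w >= 0, lam * np.exp(-lam * w), 0.0)

def uniform_pdf(w, a, b):
    return (np.heaviside(w - a, 1.0) - np.heaviside(w - b, 1.0)) / (b - a)

def beta_pdf(w, alpha, beta):
    return w ** (alpha - 1) * (1 - w) ** (beta - 1) / beta_fn(alpha, beta)

w = np.linspace(0.01, 0.99, 5)
print(gaussian_pdf(w, 0.5, 0.2), exponential_pdf(w, 2.0),
      uniform_pdf(w, 0.2, 0.8), beta_pdf(w, 2.0, 3.0), sep="\n")
```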
19. Infinite Kernel Learning: Reduction Ansatz

• "reduction ansatz" and
• Implicit Function Theorem
• parametrical measures

$$g(x, y) \ge 0 \quad \forall y \in I \quad \Longleftrightarrow \quad \min_{y \in I} g(x, y) \ge 0$$

[Figure: the function $g(x, \cdot)$ over $\Omega$ with finitely many local minimizers $y_1, \ldots, y_p$; each $x \mapsto y_j(x)$ is an implicit function.]
20. Infinite Kernel Learning: Reduction Ansatz

Based on the reduction ansatz:

$$\min f(x) \quad \text{subject to} \quad g_j(x) := g(x, y_j(x)) \ge 0 \quad (j \in J := \{1, 2, \ldots, p\})$$

[Figure: the lower level function $g((\sigma, \rho), \cdot)$ and, as the parameter $(\sigma, \rho)$ varies, the tracked local minimizer $\tilde\omega = \omega(\sigma, \rho)$; this motivates the topology on the space of measures introduced next.]
21. Infinite Kernel Learning: Regularization

Regularization:

$$\min_{\theta, \beta} \ -\theta + \mu \sup_{t \in [0,1]} \left| \frac{d^2}{dt^2} \int_0^t d\beta(\omega) \right| \quad \text{subject to the constraints,}$$

where the derivatives of the cumulative measure are approximated on a grid $0 = t_0 < t_1 < \cdots < t_\iota = 1$ by finite differences:

$$\frac{d}{dt} \int_0^{t_\nu} d\beta(\omega) \approx \frac{\int_0^{t_{\nu+1}} d\beta(\omega) - \int_0^{t_\nu} d\beta(\omega)}{t_{\nu+1} - t_\nu} = \frac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega),$$

$$\frac{d^2}{dt^2} \int_0^{t_\nu} d\beta(\omega) \approx \frac{\dfrac{1}{t_{\nu+2} - t_{\nu+1}} \int_{t_{\nu+1}}^{t_{\nu+2}} d\beta(\omega) - \dfrac{1}{t_{\nu+1} - t_\nu} \int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+1} - t_\nu}.$$
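A minimal sketch of these finite-difference approximations, with a toy cumulative measure $B(t) = \int_0^t d\beta(\omega)$ (a smoothed step; the choice is an assumption for illustration):

```python
# Hedged sketch: discretized first/second derivatives of the cumulative measure.
import numpy as np

t = np.linspace(0.0, 1.0, 21)                   # grid 0 = t_0 < ... < t_iota = 1
B = 1.0 / (1.0 + np.exp(-30 * (t - 0.5)))       # toy cumulative measure B(t)

dt = np.diff(t)
dB = np.diff(B) / dt                            # first-derivative estimates
d2B = np.diff(dB) / dt[:-1]                     # second-derivative estimates
print("sup |B''| ~", np.max(np.abs(d2B)))       # the regularization term
```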
22. Infinite Kernel Learning: Topology

Radon measure: a measure on the $\sigma$-algebra of Borel sets of $E$ that is locally finite and inner regular (inner regularity: the measure of a Borel set can be approximated from within by $\mu(K_\nu)$ over compact sets $K_\nu \subset E$).

$(E, d)$: metric space; $\mathcal{H}(E)$: set of Radon measures on $E$.

Neighbourhood of a measure $\rho$:

$$B_\rho(\varepsilon) := \Big\{ \mu \in \mathcal{H}(E) \ \Big| \ \Big| \int_A f \, d\mu - \int_A f \, d\rho \Big| < \varepsilon \Big\},$$

with $f$ from the dual space $(\mathcal{H}(E))'$ of continuous bounded functions.
23. Infinite Kernel Learning: Topology

Def.: Basis of neighbourhoods of a measure $\rho$ ($f_1, \ldots, f_n \in (\mathcal{H}(E))'$; $\varepsilon > 0$):

$$\Big\{ \mu \in \mathcal{H}(E) \ \Big| \ \Big| \int_E f_i \, d\rho - \int_E f_i \, d\mu \Big| < \varepsilon \ (i = 1, 2, \ldots, n) \Big\}.$$

Def.: Prokhorov metric:

$$d_0(\mu, \rho) := \inf \{ \varepsilon \ge 0 \ | \ \mu(A) \le \rho(A^\varepsilon) + \varepsilon \ \text{and} \ \rho(A) \le \mu(A^\varepsilon) + \varepsilon \ (A : \text{closed}) \},$$

where $A^\varepsilon := \{ x \in E \ | \ d(x, A) < \varepsilon \}$.

Open $\delta$-neighbourhood of a measure $\rho$:

$$B_\delta(\rho) := \{ \mu \in \mathcal{H}(E) \ | \ d_0(\rho, \mu) < \delta \}.$$
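A minimal sketch (toy finite space assumed): the Prokhorov distance between two discrete probability measures, computed by brute force over all subsets $A$ (in a finite space every subset is closed) and a grid of candidate $\varepsilon$ values:

```python
# Hedged sketch: Prokhorov distance on a small finite metric space.
import itertools
import numpy as np

points = np.array([0.0, 0.3, 0.6, 1.0])        # E: four points on the line
mu = np.array([0.4, 0.1, 0.3, 0.2])            # weights of measure mu
rho = np.array([0.25, 0.25, 0.25, 0.25])       # weights of measure rho

def measure(weights, A):                        # mass of an index set A
    return sum(weights[i] for i in A)

def eps_ball(A, eps):                           # A^eps = {x | d(x, A) < eps}
    return {j for j in range(len(points))
            if any(abs(points[j] - points[i]) < eps for i in A)}

def prokhorov(mu, rho):
    subsets = [s for r in range(len(points) + 1)
               for s in itertools.combinations(range(len(points)), r)]
    for eps in np.linspace(0.0, 1.0, 1001):     # scan candidate epsilons
        if all(measure(mu, A) <= measure(rho, eps_ball(A, eps)) + eps and
               measure(rho, A) <= measure(mu, eps_ball(A, eps)) + eps
               for A in subsets):
            return eps
    return 1.0

print("d0(mu, rho) ~", prokhorov(mu, rho))
```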
25. References

Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics; ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in the special issue of OMS (Optimization Methods and Software) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU; submitted to JOGO (Journal of Global Optimization).