Ml4nlp 4 (2)
Menu
- SVM (hard margin / soft margin)
- Kernel Method
- Log-linear Model
- Feature Selection (PMI / Information Gain)
SVM

SVM (Support Vector Machine) is a linear binary classifier built around margin maximization.
Linear Binary Classifier

A linear binary classifier scores an input vector x with a weight vector w and bias b:

f(x) = w · x − b

and predicts the label from the sign of the score:

y = +1 if f(x) ≥ 0,
    −1 if f(x) < 0.

The decision boundary is the hyperplane w · x = b.
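As a concrete illustration, here is a minimal NumPy sketch of this classifier; the weight vector w and bias b below are placeholder values, not learned ones.

```python
import numpy as np

def f(x, w, b):
    """Score of the linear classifier: f(x) = w . x - b."""
    return np.dot(w, x) - b

def predict(x, w, b):
    """Label from the sign of the score: +1 if f(x) >= 0, else -1."""
    return 1 if f(x, w, b) >= 0 else -1

# Toy parameters (placeholders, not learned):
w = np.array([2.0, -1.0])
b = 0.5
print(predict(np.array([1.0, 0.0]), w, b))  # f = 1.5  -> +1
print(predict(np.array([0.0, 1.0]), w, b))  # f = -1.5 -> -1
```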
Linear Binary Classifier

Training data is a set of labeled examples

D = {(x^(1), y^(1)), …, (x^(|D|), y^(|D|))},

and learning chooses w and b so that the hyperplane w · x = b separates the positive examples from the negative ones.
Margin Maximization

Many hyperplanes can separate the same training data; SVM picks the one that maximizes the margin. Let x⁺ be the positive example closest to the boundary and x* the point on the boundary closest to x⁺. The margin is the distance

|x⁺ − x*|.
Margin Maximization

Rescale w and b so that the closest positive example satisfies w · x⁺ − b = 1. Since x* lies on the boundary (w · x* = b) and x⁺ − x* is parallel to w,

|x⁺ − x*| = (1/|w|) w · (x⁺ − x*) = 1/|w|.

Maximizing the margin 1/|w| is therefore equivalent to minimizing |w|².
Hard Margin SVM

With this scaling, every training example must satisfy

y^(i) (w · x^(i) − b) ≥ 1.

Margin maximization becomes the constrained optimization problem

min.  ½ |w|²
s.t.  y^(i) (w · x^(i) − b) ≥ 1 ; ∀i.
Dual Problem

Introduce a Lagrange multiplier α_i (≥ 0) for each constraint:

L(w, b, α) = ½ |w|² − Σ_i α_i [ y^(i) (w · x^(i) − b) − 1 ].

Setting the derivatives with respect to w and b to zero gives

∇_w L = w − Σ_i α_i y^(i) x^(i) = 0,  ∴ w* = Σ_i α_i y^(i) x^(i),
∂L/∂b = Σ_i α_i y^(i) = 0.
Dual Problem

Substituting w* back into the Lagrangian:

L(w*, b, α) = ½ |Σ_i α_i y^(i) x^(i)|² − |Σ_i α_i y^(i) x^(i)|² + b Σ_i α_i y^(i) + Σ_i α_i
            = −½ |Σ_i α_i y^(i) x^(i)|² + Σ_i α_i          (using Σ_i α_i y^(i) = 0)
            = −½ Σ_{i,j} α_i α_j y^(i) y^(j) x^(i) · x^(j) + Σ_i α_i.
Dual Problem

The dual problem maximizes this expression over α:

max.  −½ Σ_{i,j} α_i α_j y^(i) y^(j) x^(i) · x^(j) + Σ_i α_i
s.t.  Σ_i α_i y^(i) = 0,
      α_i ≥ 0 ; ∀i.
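The dual is a quadratic program in α, so an off-the-shelf QP solver can train the hard-margin SVM. A minimal sketch, assuming the cvxopt package is available, with the maximization flipped into cvxopt's minimization form:

```python
import numpy as np
from cvxopt import matrix, solvers

def fit_hard_margin(X, y):
    """Solve the dual QP: min 1/2 a'Pa - 1'a  s.t.  a >= 0, y'a = 0."""
    n = len(y)
    X = X.astype(float)
    y = y.astype(float)
    P = matrix(np.outer(y, y) * (X @ X.T))  # P_ij = y_i y_j x^(i) . x^(j)
    q = matrix(-np.ones(n))                 # minus sign flips max into min
    G = matrix(-np.eye(n))                  # -a_i <= 0, i.e. a_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1))            # equality: sum_i a_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    w = (alpha * y) @ X                     # w* = sum_i a_i y_i x^(i)
    sv = alpha > 1e-6                       # support vectors: a_i > 0
    b_val = np.mean(X[sv] @ w - y[sv])      # from y_i (w . x^(i) - b) = 1
    return w, b_val, alpha
```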
Dual Problem

Given the optimal α, the classifier is expressed entirely through inner products:

f(x) = Σ_i α_i y^(i) x^(i) · x − b.

Training touches the examples x^(i) only via inner products x^(i) · x^(j), and prediction only via x^(i) · x. Examples with α_i = 0 contribute nothing to f(x); the examples with α_i > 0 are the support vectors.
Soft Margin SVM

Real data is usually not linearly separable. Introduce a slack variable ξ_i (≥ 0) per example and relax each constraint to

y^(i) (w · x^(i) − b) ≥ 1 − ξ_i.

The primal problem becomes

min.  ½ |w|² + C Σ_i ξ_i
s.t.  y^(i) (w · x^(i) − b) ≥ 1 − ξ_i ; ∀i,
      ξ_i ≥ 0 ; ∀i.

The hyperparameter C controls the trade-off between a wide margin and small constraint violations.
Soft Margin SVM

Introduce multipliers α_i and β_i for the two families of constraints:

L(w, b, ξ, α, β) = ½ |w|² + C Σ_i ξ_i − Σ_i α_i [ y^(i) (w · x^(i) − b) − 1 + ξ_i ] − Σ_i β_i ξ_i.

The optimality conditions for w and b are the same as before; for ξ_i,

∂L/∂ξ_i = C − α_i − β_i = 0,  i.e.  C = α_i + β_i.
Soft Margin SVM

Substituting back, L reduces to the same form as in the hard-margin case:

L = −½ Σ_{i,j} α_i α_j y^(i) y^(j) x^(i) · x^(j) + Σ_i α_i (1 − ξ_i) + C Σ_i ξ_i − Σ_i β_i ξ_i
  = −½ Σ_{i,j} α_i α_j y^(i) y^(j) x^(i) · x^(j) + Σ_i α_i.

The slack terms cancel because C = α_i + β_i, and eliminating β_i via β_i = C − α_i ≥ 0 leaves the upper bound α_i ≤ C.
Soft Margin SVM

The dual therefore differs from the hard-margin dual only in the box constraint on α_i:

max.  −½ Σ_{i,j} α_i α_j y^(i) y^(j) x^(i) · x^(j) + Σ_i α_i
s.t.  Σ_i α_i y^(i) = 0,
      0 ≤ α_i ≤ C ; ∀i.
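In practice one rarely writes out the QP by hand. A sketch using scikit-learn's SVC (assuming scikit-learn is installed), whose constructor parameter C is exactly the C above:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data with one overlapping point; C trades margin width vs. slack.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.9, 0.9]])
y = np.array([-1, -1, 1, 1, -1])  # last point violates linear separability

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_)       # indices of the support vectors (alpha_i > 0)
print(clf.dual_coef_)     # alpha_i * y_i for the support vectors
print(clf.predict([[0.2, 0.8]]))
```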
Functional Distance

The magnitude of f(x) can be read as a confidence score: an input with f(x) = 0.0001 lies almost on the decision boundary, while one with f(x) = 1000 is far from it, so the prediction for the latter x is far more reliable.
Kernel Method

Since the dual and the classifier use the examples only through inner products, every inner product can be replaced by a kernel function K. For a training set of arbitrary objects d,

D = {(d^(1), y^(1)), …, (d^(|D|), y^(|D|))},

the dual becomes

max.  −½ Σ_{i,j} α_i α_j y^(i) y^(j) K(d^(i), d^(j)) + Σ_i α_i
s.t.  Σ_i α_i y^(i) = 0,
      α_i ≥ 0 ; ∀i,

and the classifier becomes

f(d) = Σ_i α_i y^(i) K(d^(i), d) − b.
Kernel Method

The inputs d need not be vectors at all: any objects for which a suitable kernel K can be defined can be classified this way, e.g. the trees handled by the tree kernel below.
(Example) Tree Kernel

A kernel defined directly between trees (e.g. parse trees), measuring their similarity by their shared substructures.
(Example) Polynomial Kernel

K_poly(x^(i), x^(j)) = (x^(i) · x^(j) + r)^d.

The degree d controls the order of feature interactions: the implicit feature space contains products of up to d of the original features of x.
(Example) RBF Kernel

K_RBF(x^(i), x^(j)) = exp(−s |x^(i) − x^(j)|²).

The parameter s controls how quickly the similarity decays with the distance between x^(i) and x^(j).
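Both example kernels are one-liners. A sketch with the hyperparameters named r, d, and s as in the formulas above:

```python
import numpy as np

def k_poly(xi, xj, r=1.0, d=2):
    """Polynomial kernel: (xi . xj + r)^d."""
    return (np.dot(xi, xj) + r) ** d

def k_rbf(xi, xj, s=1.0):
    """RBF kernel: exp(-s * |xi - xj|^2)."""
    return np.exp(-s * np.sum((xi - xj) ** 2))

xi, xj = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(k_poly(xi, xj))  # (2 + 1)^2 = 9
print(k_rbf(xi, xj))   # exp(-5), since |xi - xj|^2 = 1 + 4
```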
Log-linear Model

A log-linear model directly models the conditional probability P(y|d) of a label y given an input d.
Log-linear Model

Represent each input–label pair by a feature vector φ(d, y) and score it with a weight vector w:

P(y|d) = (1/Z_{d,w}) exp(w · φ(d, y)),
where Z_{d,w} = Σ_y exp(w · φ(d, y)).

Prediction picks the highest-scoring label:

y* = argmax_y w · φ(d, y).
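A minimal NumPy sketch of these formulas; phi is a hypothetical feature function returning a vector for each (d, y) pair, and labels is the list of candidate labels:

```python
import numpy as np

def p_cond(d, w, phi, labels):
    """P(y|d) for every y in labels: softmax over the scores w . phi(d, y)."""
    scores = np.array([w @ phi(d, y) for y in labels])
    scores -= scores.max()              # shift for numerical stability
    expw = np.exp(scores)
    return expw / expw.sum()            # division by the sum is Z_{d,w}

def predict(d, w, phi, labels):
    """y* = argmax_y w . phi(d, y)."""
    return max(labels, key=lambda y: w @ phi(d, y))
```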
Training maximizes the conditional log-likelihood of the data:

log P_cond(D) = Σ_{(d^(i), y^(i)) ∈ D} log P(y^(i)|d^(i))
              = Σ_{(d^(i), y^(i)) ∈ D} ( w · φ(d^(i), y^(i)) − log Z_{d^(i),w} ).
Adding an L2 regularization term with coefficient C gives the training objective

L(w) = Σ_{(d^(i), y^(i)) ∈ D} log P(y^(i)|d^(i)) − (C/2) |w|².
L(w) is maximized iteratively, e.g. by gradient ascent.
Each gradient-ascent step updates

w_new = w_old + ∇_w L(w_old),

where

∇_w L(w) = Σ_{(d^(i), y^(i)) ∈ D} [ φ(d^(i), y^(i)) − Σ_y φ(d^(i), y) exp(w · φ(d^(i), y)) / Z_{d^(i),w} ] − Cw
         = Σ_{(d^(i), y^(i)) ∈ D} [ φ(d^(i), y^(i)) − Σ_y P(y|d^(i)) φ(d^(i), y) ] − Cw,

i.e. the observed features minus their expectation under the model, minus the regularizer.
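A sketch of one gradient evaluation, reusing the hypothetical phi and the p_cond function from the earlier sketch; the step size eta in the usage comment is an added hyperparameter, since the slide's update omits it:

```python
import numpy as np

def grad_L(data, w, phi, labels, C):
    """Gradient of the regularized conditional log-likelihood."""
    g = -C * w                                   # derivative of -(C/2)|w|^2
    for d, y_true in data:
        g += phi(d, y_true)                      # observed features
        p = p_cond(d, w, phi, labels)
        for y, p_y in zip(labels, p):
            g -= p_y * phi(d, y)                 # expected features
    return g

# One gradient-ascent step, with a hypothetical step size eta:
# w = w + eta * grad_L(data, w, phi, labels, C)
```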
Quasi-Newton Method

Convergence improves if the curvature of L is used. With the Hessian H,

w_new = w_old + H_{w_old}^{−1} ∇_w L(w_old).

Quasi-Newton methods approximate H^{−1} cheaply instead of computing it exactly (L-BFGS is the common choice).
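A sketch that delegates the quasi-Newton iteration to SciPy's L-BFGS implementation (assuming SciPy is available; minimize minimizes, so the objective and gradient are negated):

```python
import numpy as np
from scipy.optimize import minimize

def train(data, phi, labels, C, dim):
    """Maximize L(w) with L-BFGS by minimizing -L(w); labels must be a list."""
    def neg_L(w):
        val = -0.5 * C * w @ w
        for d, y_true in data:
            scores = np.array([w @ phi(d, y) for y in labels])
            # w . phi(d, y_true) - log Z_{d,w}, with a stable log-sum-exp:
            val += scores[labels.index(y_true)] - np.logaddexp.reduce(scores)
        return -val

    def neg_grad(w):
        return -grad_L(data, w, phi, labels, C)  # grad_L from the previous sketch

    res = minimize(neg_L, np.zeros(dim), jac=neg_grad, method='L-BFGS-B')
    return res.x
```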
Feature Selection

For each word w, let X_w be the binary random variable with X_w = 1 if w occurs in a document and X_w = 0 otherwise, and let C be the random variable over document classes. Feature selection scores each w by how informative X_w is about C.
Pointwise Mutual Information

For events x and y,

PMI(x, y) = log P(x, y) / (P(x) P(y)).

If x and y are independent, P(x, y) = P(x) P(y) and the PMI is zero; the more often x and y co-occur relative to chance, the larger it becomes.
Pointwise Mutual Information

For a word w and class c,

PMI(w, c) = log P(X_w = 1, C = c) / (P(X_w = 1) P(C = c)).

A per-word score aggregates over the classes:

I_average(w) = Σ_c P(c) PMI(w, c),
I_max(w) = max_c P(c) PMI(w, c).
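A sketch computing these scores from raw counts; docs is a hypothetical list of (set_of_words, class_label) pairs, and classes in which w never occurs are skipped, since their PMI would be −∞ without smoothing:

```python
import math
from collections import Counter

def pmi_scores(docs, w):
    """PMI(w, c) per class, plus I_average(w) and I_max(w)."""
    n = len(docs)
    n_w = sum(1 for words, _ in docs if w in words)         # count(X_w = 1)
    n_c = Counter(c for _, c in docs)                       # count(C = c)
    n_wc = Counter(c for words, c in docs if w in words)    # count(X_w = 1, C = c)

    pmi = {c: math.log((n_wc[c] / n) / ((n_w / n) * (n_c[c] / n)))
           for c in n_c if n_wc[c] > 0}                     # skip zero counts
    i_avg = sum((n_c[c] / n) * pmi[c] for c in pmi)
    i_max = max((n_c[c] / n) * pmi[c] for c in pmi)
    return pmi, i_avg, i_max
```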
Pointwise Mutual Information

Note which definition is used. With a random variable W ranging over words one would instead write

PMI(w, c) = log P(W = w, C = c) / (P(W = w) P(C = c)),

and with a binary indicator X_c per class c (X_c = 1 if the class is c, X_c = 0 otherwise)

PMI(w, c) = log P(X_w = 1, X_c = 1) / (P(X_w = 1) P(X_c = 1)).
Pointwise Mutual Information

PMI overvalues rare words. If a word w occurs only in documents of class c, then

PMI(w, c) = log P(C = c|X_w = 1) − log P(C = c)
          = log 1 − log P(C = c),

the maximum attainable value, even if w occurs just once. A word w′ that appears in class c 99% of the time scores lower:

PMI(w′, c) = log P(C = c|X_{w′} = 1) − log P(C = c)
           = log 0.99 − log P(C = c).
Information Gain

Information gain asks how much observing X_w reduces the uncertainty about the class C. The entropy of the class distribution is

H(C) = −Σ_c P(c) log P(c),

and, conditioned on whether w occurs (t ∈ {0, 1}),

H(C|X_w = t) = −Σ_c P(c|X_w = t) log P(c|X_w = t).
Information Gain

The information gain of a word w is the expected reduction in entropy:

IG(w) = H(C) − ( P(X_w = 1) H(C|X_w = 1) + P(X_w = 0) H(C|X_w = 0) ).

Unlike PMI, a rare w scores low even when H(C|X_w = 1) is small, because that term is weighted by P(X_w = 1).
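A matching sketch for IG(w) over the same hypothetical docs structure:

```python
import math
from collections import Counter

def entropy(counts):
    """H = -sum_c P(c) log P(c) from a Counter of class counts."""
    n = sum(counts.values())
    return -sum((k / n) * math.log(k / n) for k in counts.values() if k > 0)

def info_gain(docs, w):
    """IG(w) = H(C) - sum_t P(X_w = t) H(C | X_w = t)."""
    with_w = Counter(c for words, c in docs if w in words)
    without = Counter(c for words, c in docs if w not in words)
    n, n1 = len(docs), sum(with_w.values())
    ig = entropy(with_w + without)            # H(C) over all documents
    if n1 > 0:
        ig -= (n1 / n) * entropy(with_w)      # P(X_w = 1) H(C|X_w = 1)
    if n - n1 > 0:
        ig -= ((n - n1) / n) * entropy(without)
    return ig
```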
