70. Take a set of radii E = {ε1, . . . , εm}, m < n. For each ε ∈ E compute the pair (Xε, Yε), and from the regression data {(Xε, Yε)}ε∈E minimize

    R = (1/m) Σ_{ε∈E} (Yε − f(z) − C Xε)²    (11)

with respect to f(z) and C. The fitted intercept f(z) is the density estimate at z, written ˆfs(z).
71. Estimating the density ˆfs(z) at each sample point and applying the leave-one-out principle gives

    ˆHs(D) = −(1/n) Σ_{i=1}^n ln ˆfs,i(xi),    (12)

where ˆfs,i(xi) is the density estimate at xi computed from D with xi removed. ˆHs(D) is the Simple Regression Entropy Estimator (SRE) [Hino+, 2015].
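The SRE recipe above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the radius grid, the choice Xε = ε², and all function names are my own, and the least-squares fit of Eq. (11) is done with ordinary `lstsq`.

```python
import numpy as np
from math import gamma, pi

def unit_ball_volume(p):
    """Volume c_p of the unit ball in R^p."""
    return pi ** (p / 2) / gamma(p / 2 + 1)

def sre_density(z, data, eps_grid):
    """Fitted intercept of the regression Y_eps = f(z) + C * X_eps (Eq. (11))."""
    n, p = data.shape
    c_p = unit_ball_volume(p)
    dists = np.linalg.norm(data - z, axis=1)
    X = eps_grid ** 2                                    # X_eps = eps^2
    Y = np.array([(dists <= e).sum() for e in eps_grid]) \
        / (n * c_p * eps_grid ** p)                      # Y_eps = k_eps / (n c_p eps^p)
    A = np.column_stack([np.ones_like(X), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef[0]                                       # intercept = f_hat_s(z)

def sre_entropy(data, eps_grid):
    """Leave-one-out average of -ln f_hat_s over the sample (Eq. (12))."""
    logs = []
    for i in range(len(data)):
        loo = np.delete(data, i, axis=0)                 # remove x_i from D
        logs.append(np.log(sre_density(data[i], loo, eps_grid)))
    return -np.mean(logs)
```

For data uniform on [0, 1] (true entropy 0), `sre_entropy` should return a value near zero, up to boundary bias from balls extending outside the support.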
72. SRE: how it works
[Figure: left, standard Normal density (x vs. density); right, Yε plotted against ε² with the fitted regression line; the fitted intercept gives ˆfs(z = 0.5).]
73. SRE: how it works
[Figure: left, Bimodal density (x vs. density); right, Yε plotted against ε² with the fitted regression line; the fitted intercept gives ˆfs(z = 0.5).]
74. For each ε and each xi ∈ D,

    Yε ≃ f(xi) + C Xε,  with Yε = kε / (n cp ε^p) and C = p tr∇²f(xi) / (4(p/2 + 1)).

Since both quantities depend on xi, write them as Yε^i and C^i:

    Yε^i ≃ f(xi) + C^i Xε.
75. Starting from Yε^i = f(xi) + C^i Xε and averaging over xi ∈ D:

    −(1/n) Σ_{i=1}^n ln Yε^i = −(1/n) Σ_{i=1}^n ln( f(xi) + C^i Xε )
        = −(1/n) Σ_{i=1}^n ln[ f(xi) (1 + C^i Xε / f(xi)) ]
        = −(1/n) Σ_{i=1}^n ln f(xi) − (1/n) Σ_{i=1}^n ln( 1 + C^i Xε / f(xi) )
        ≃ −(1/n) Σ_{i=1}^n ln f(xi) − (1/n) Σ_{i=1}^n (C^i / f(xi)) Xε,

where the last step uses ln(1 + t) ≃ t for small t.
76. In summary,

    −(1/n) Σ_{i=1}^n ln Yε^i ≃ −(1/n) Σ_{i=1}^n ln f(xi) − (1/n) Σ_{i=1}^n (C^i / f(xi)) Xε.

Define

    ¯Yε = −(1/n) Σ_{i=1}^n ln Yε^i,  H(D) = −(1/n) Σ_{i=1}^n ln f(xi),  ¯C = −(1/n) Σ_{i=1}^n C^i / f(xi).

Then for small ε > 0,

    ¯Yε = H(D) + ¯C Xε.    (13)
77. For the radii ε ∈ E, fit the linear relation (13) by minimizing

    Rd = (1/m) Σ_{ε∈E} ( ¯Yε − H(D) − ¯C Xε )²

with respect to H(D) and ¯C. The fitted intercept is the Direct Regression Entropy Estimator (DRE) [Hino+, 2015].
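DRE regresses the averaged response ¯Yε directly on Xε = ε², so the entropy is read off as a single intercept. A minimal numpy sketch, under the same assumptions as before (user-chosen radius grid, ordinary least squares, my own function names):

```python
import numpy as np
from math import gamma, pi

def dre_entropy(data, eps_grid):
    """Intercept of the regression Ybar_eps = H(D) + Cbar * eps^2 (Eq. (13))."""
    n, p = data.shape
    c_p = pi ** (p / 2) / gamma(p / 2 + 1)   # volume of the unit p-ball
    # Pairwise distances; the diagonal is set to inf so that each x_i is
    # excluded from its own neighbor count (leave-one-out).
    D = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)
    Ybar = []
    for e in eps_grid:
        k = (D <= e).sum(axis=1)                      # k_eps^i for every x_i
        Yi = k / ((n - 1) * c_p * e ** p)             # Y_eps^i
        Ybar.append(-np.mean(np.log(Yi)))             # Ybar_eps
    X = eps_grid ** 2
    A = np.column_stack([np.ones_like(X), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(Ybar), rcond=None)
    return coef[0]                                    # intercept = H_hat(D)
```

Unlike SRE, no per-point density fit is needed: one regression over the m radii yields the estimate.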
82. The number of samples within radius ε of z admits the expansion

    kε ≃ cp n f(z) ε^p + cp n (p / (4(p/2 + 1))) tr∇²f(z) ε^{p+2}.

Set X = (ε^p, ε^{p+2}) and Y = kε, and fit the linear model Y = β⊤X. Since kε is a count, it is naturally modeled as Poisson-distributed.
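The two-term expansion above can be checked numerically. A small sketch under assumed conditions (standard normal in one dimension, z = 0, so f(0) = 1/√(2π) and tr∇²f(0) = f″(0) = −f(0)):

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(1)
p, n = 1, 20000
data = rng.standard_normal((n, p))
c_p = pi ** (p / 2) / gamma(p / 2 + 1)   # volume of the unit p-ball (= 2 for p = 1)
f0 = 1 / sqrt(2 * pi)                    # f(0) for the standard normal
tr_hess = -f0                            # tr grad^2 f(0) = f''(0) = -f(0)

rel_errors = []
for eps in (0.1, 0.3, 0.5):
    # empirical count of samples within radius eps of the origin
    k_emp = np.sum(np.linalg.norm(data, axis=1) <= eps)
    # two-term expansion: c_p n f eps^p + c_p n (p/(4(p/2+1))) tr_hess eps^(p+2)
    k_th = c_p * n * f0 * eps ** p \
         + c_p * n * (p / (4 * (p / 2 + 1))) * tr_hess * eps ** (p + 2)
    rel_errors.append(abs(k_emp - k_th) / k_th)
```

With n = 20000 samples, the empirical counts track the expansion to within a few percent on this radius range.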
83. Maximize the Poisson likelihood

    L(β) = Π_{i=1}^m [ e^{−Xi⊤β} (Xi⊤β)^{Yi} / Yi! ].

The coefficient of ε^p is β1, and with its estimate ˆβ1 the density at z is ˆβ1/(cp n). Combining this with the leave-one-out scheme used for SRE gives the Entropy Estimator with Poisson-noise structure and Identity-link regression (EPI) [Hino+, under review].
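The identity-link Poisson fit can be sketched with iteratively reweighted least squares: the score equation X⊤ diag(1/μ)(Y − μ) = 0 is solved by repeatedly refitting a weighted least-squares problem with weights 1/μ. This is my own minimal implementation (function name, radius grid, and iteration count are assumptions, not from the slides):

```python
import numpy as np
from math import gamma, pi

def epi_density(z, data, eps_grid, n_iter=20):
    """Identity-link Poisson regression of k_eps on (eps^p, eps^{p+2});
    the eps^p coefficient divided by c_p * n estimates f(z)."""
    n, p = data.shape
    c_p = pi ** (p / 2) / gamma(p / 2 + 1)
    dists = np.linalg.norm(data - z, axis=1)
    Y = np.array([(dists <= e).sum() for e in eps_grid], dtype=float)  # counts k_eps
    X = np.column_stack([eps_grid ** p, eps_grid ** (p + 2)])
    # initialize from ordinary least squares, then IRLS for the Poisson model
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    for _ in range(n_iter):
        mu = np.clip(X @ beta, 1e-9, None)   # fitted means, kept positive
        XtW = X.T * (1.0 / mu)               # Poisson weights for identity link
        beta = np.linalg.solve(XtW @ X, XtW @ Y)
    return beta[0] / (c_p * n)               # beta_1 / (c_p n) = f_hat(z)
```

Feeding the resulting densities into the same leave-one-out average as SRE yields the EPI entropy estimate.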
86. Univariate Case
15 distributions
[Figure: density curves for Normal, Skewed, Strongly Skewed, Kurtotic, Bimodal, and Skewed Bimodal (x vs. density).]
87. Univariate Case
15 distributions
[Figure: density curves for Trimodal, 10 Claw, Standard Power Exponential, Standard Logistic, Standard Classical Laplace, and t(df=5) (x vs. density).]
88. Univariate Case
15 distributions
[Figure: density curves for Mixed t, Standard Exponential, and Cauchy (x vs. density).]
89. [Figure: results plotted as points over the density curves of Normal, Skewed, Strongly Skewed, Kurtotic, and Bimodal.]
90. [Figure: results plotted as points over the density curves of Skewed Bimodal, Trimodal, 10 Claw, Standard Power Exponential, and Standard Logistic.]
91. [Figure: results plotted as points over the density curves of Standard Classical Laplace, t(df=5), Mixed t, Standard Exponential, and Cauchy.]
95. References I
[Faivishevsky&Goldberger, 2010] Faivishevsky, L. and Goldberger, J. (2010). A Nonparametric Information Theoretic Clustering Algorithm. In ICML 2010.
[Hino+, 2015] Hino, H., Koshijima, K., and Murata, N. (2015). Non-parametric entropy estimators based on simple linear regression. Computational Statistics & Data Analysis, 89:72–84.
[Hino&Murata, 2010] Hino, H. and Murata, N. (2010). A conditional entropy minimization criterion for dimensionality reduction and multiple kernel learning. Neural Computation, 22(11):2887–2923.
[Hyvärinen&Oja, 2000] Hyvärinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411–430.
[Koshijima+, 2015] Koshijima, K., Hino, H., and Murata, N. (2015). Change-point detection in a sequence of bags-of-data. IEEE Transactions on Knowledge and Data Engineering, 27(10):2632–2644.
96. References II
[Murata+, 2013] Murata, N., Koshijima, K., and Hino, H. (2013). Distance-based change-point detection with entropy estimation. In Proceedings of the Sixth Workshop on Information Theoretic Methods in Science and Engineering, pages 22–25.