17. Research background
• Papers that analyze DNNs with statistical and learning theory
• Suzuki, T. (2018). Fast learning rate of deep learning via a
kernel perspective. JMLR W&CP (AISTATS).
• Schmidt-Hieber, J. (2017). Nonparametric regression using
deep neural networks with ReLU activation function. arXiv.
• Neyshabur, B., Tomioka, R., & Srebro, N. (2015). Norm-
based capacity control in neural networks. JMLR W&CP
(COLT).
• Sun, S., Chen, W., Wang, L., & Liu, T. Y. (2015). Large
margin deep neural networks: theory and algorithms, arXiv.
• Non-smooth structure is not their main concern
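19. Formulating piecewise smooth functions
• Outline of the formulation
• 1. Smooth functions on $[0,1]^D$
• 2. Pieces contained in $[0,1]^D$
• 1. Smooth functions on $[0,1]^D$
• Preliminary: the Hölder norm
• Definition: the Hölder space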
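$$G[\theta](x) = x^{(\ell)},$$
where $x^{(\ell)}$ is defined inductively as
$$x^{(0)} := x, \qquad x^{(\ell')} := \eta(A_{\ell'} x^{(\ell'-1)} + b_{\ell'}), \quad \text{for } \ell' = 1, \ldots, \ell - 1,$$
where $\eta$ is an element-wise ReLU function, i.e., $\eta(x) = (\max\{0, x_1\}, \max\{0, x_2\}, \ldots)$, with $\max\{0, \cdot\}$ applied to every coordinate of $x$.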
Here, $c(\theta)$ denotes the number of non-zero parameters in $\theta$.
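The recursion above and the count $c(\theta)$ can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the names `forward` and `count_nonzero`, the list-of-rows matrix layout, and the toy network are hypothetical, and the last layer is assumed to be affine with no activation, since the excerpt only shows the ReLU recursion up to layer $\ell - 1$.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (A, b) for one layer


def relu(x: Sequence[float]) -> Vector:
    """Element-wise ReLU: eta(x)_i = max{0, x_i}."""
    return [max(0.0, xi) for xi in x]


def affine(A: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Compute A x + b for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) + bi for row, bi in zip(A, b)]


def forward(theta: Sequence[Layer], x: Sequence[float]) -> Vector:
    """Evaluate G[theta](x) by the layer-wise recursion."""
    h = list(x)                      # x^(0) := x
    for A, b in theta[:-1]:          # layers 1, ..., l-1 use ReLU
        h = relu(affine(A, b, h))
    A_last, b_last = theta[-1]       # assumption: layer l is affine only
    return affine(A_last, b_last, h)


def count_nonzero(theta: Sequence[Layer]) -> int:
    """c(theta): number of non-zero entries over all A and b."""
    total = 0
    for A, b in theta:
        total += sum(1 for row in A for a in row if a != 0.0)
        total += sum(1 for bi in b if bi != 0.0)
    return total


# Toy two-layer network computing |x| = relu(x) + relu(-x).
theta = [
    ([[1.0], [-1.0]], [0.0, 0.0]),   # A_1 is 2x1, b_1 = 0
    ([[1.0, 1.0]], [0.0]),           # A_2 is 1x2, b_2 = 0
]
print(forward(theta, [-3.0]))   # [3.0]
print(count_nonzero(theta))     # 4
```

The toy network represents the non-smooth function $|x|$ exactly with only four non-zero parameters, which is the kind of sparsity that $c(\theta)$ measures.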
1.2. Characterization of true functions. We consider piecewise smooth
functions for characterizing $f^*$. To this end, we introduce
some sets of functions.
Smooth functions. Second, we introduce a set of smooth functions.
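With $\beta > 0$, let us define the Hölder norm
$$\|f\|_{H^\beta} := \max_{|a| \le \lfloor \beta \rfloor} \sup_{x \in [-1,1]^D} |\partial^a f(x)| + \max_{|a| = \lfloor \beta \rfloor} \sup_{x, x' \in [-1,1]^D} \frac{|\partial^a f(x) - \partial^a f(x')|}{|x - x'|^{\beta - \lfloor \beta \rfloor}},$$
and let $H^\beta([-1,1]^D)$ be the Hölder space
$$H^\beta = H^\beta([-1,1]^D) := \{ f : [-1,1]^D \to \mathbb{R} \mid \|f\|_{H^\beta} \le C_H \},$$
where $C_H$ is some finite constant.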
Date: January 13, 2018.
$$H^\beta = H^\beta([0,1]^D) = \{ f : [0,1]^D \to \mathbb{R} \mid \|f\|_{H^\beta} < \infty \}$$
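To get a feel for the Hölder norm, the following sketch estimates it numerically in the simplest case: $D = 1$, domain $[-1,1]$ as in the paper excerpt, and smoothness between 0 and 1, so that no derivatives appear. The function name `holder_norm_lower_bound`, the grid size, and the test function $\sqrt{|x|}$ are hypothetical choices for illustration. Restricting the suprema to a finite grid only gives a lower bound on the true norm.

```python
import math
from typing import Callable, List


def holder_norm_lower_bound(
    f: Callable[[float], float], beta: float, n: int = 201
) -> float:
    """Grid estimate of the Hölder norm on [-1, 1] for D = 1, 0 < beta < 1.

    With floor(beta) = 0 the norm reduces to
        sup |f(x)| + sup |f(x) - f(x')| / |x - x'|**beta.
    Both suprema are taken over grid points only, so the result is a
    lower bound on the true norm.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("this sketch only covers 0 < beta < 1")
    grid: List[float] = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    values = [f(x) for x in grid]
    sup_term = max(abs(v) for v in values)
    # Largest Hölder quotient over all pairs of distinct grid points.
    ratio_term = max(
        abs(values[i] - values[j]) / abs(grid[i] - grid[j]) ** beta
        for i in range(n)
        for j in range(i)
    )
    return sup_term + ratio_term


# sqrt(|x|) is not differentiable at 0 but is Hölder with exponent 1/2.
est = holder_norm_lower_bound(lambda x: math.sqrt(abs(x)), beta=0.5)
print(round(est, 6))   # 2.0
```

For $f(x) = \sqrt{|x|}$ the exact value is 2: the sup term is 1, and the quotient $|\sqrt{|x|} - \sqrt{|x'|}| / |x - x'|^{1/2}$ is at most 1, with equality at $x' = 0$.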
43. References
• Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression.
The Annals of Statistics, 1040-1053.
• Suzuki, T. (2018). Fast learning rate of deep learning via a kernel perspective. JMLR
W&CP (AISTATS).
• Schmidt-Hieber, J. (2017). Nonparametric regression using deep neural networks with
ReLU activation function. arXiv.
• Neyshabur, B., Tomioka, R., & Srebro, N. (2015). Norm-based capacity control in neural
networks. JMLR W&CP (COLT).
• Sun, S., Chen, W., Wang, L., & Liu, T. Y. (2015). Large margin deep neural networks:
theory and algorithms, arXiv.
• Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., & LeCun, Y. (2017). The loss
surfaces of multilayer networks. JMLR W&CP (AISTATS).
• Kawaguchi, K. (2016). Deep learning without poor local minima. In Advances in Neural
Information Processing Systems.
• Yarotsky, D. (2017). Error bounds for approximations with deep ReLU networks. Neural
Networks, 94, 103-114.
• Safran, I., & Shamir, O. (2017). Depth-width tradeoffs in approximating natural functions
with neural networks. JMLR W&CP (ICML).
• Zhang, C., Bengio, S., Hardt, M., Recht, B., & Vinyals, O. (2016). Understanding deep
learning requires rethinking generalization. ICLR.
• Xu, A., & Raginsky, M. (2017). Information-theoretic analysis of generalization capability
of learning algorithms. In Advances in Neural Information Processing Systems.