1
DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
SchNet: A continuous-filter convolutional neural network
for modeling quantum interactions
Kazuki Fujikawa, DeNA
Summary
• Bibliographic information
– NIPS2017
– Schütt, K., Kindermans, P. J., Felix, H. E. S., Chmiela, S., Tkatchenko, A., and
Müller, K. R.
• Overview
– A paper on graph convolutions
– Convolutions are computed from inter-node distances rather than from the graph's connectivity
• This makes it possible to model interactions with nodes at arbitrary positions in 3D space
• Particularly effective in cases such as:
– The same graph structure can take different spatial configurations, and the property of interest changes with the configuration
– The graph distance and the real-space distance between nodes diverge
2
Outline
• Background
• Related work
– Message Passing Neural Networks and their variants
• Proposed method
– Continuous-filter convolutional layer
– Interaction block
– SchNet
• Experiments & results
3
Outline
• Background
• Related work
– Message Passing Neural Networks and their variants
• Proposed method
– Continuous-filter convolutional layer
– Interaction block
– SchNet
• Experiments & results
4
Background
• Molecular properties are key information when searching for optimal molecules in drug discovery and materials chemistry
– Approximations such as DFT (Density Functional Theory) are commonly used to compute them
– DFT is computationally very expensive, so only a limited part of chemical space can be explored
• A machine learning model that predicts properties quickly and accurately would therefore be valuable
– Predicting properties with machine learning, using data obtained from DFT as supervision, has recently become an active area of research
5
(Figure 1 from Gilmer+, ICML2017: a DFT calculation takes on the order of 10^3 seconds per molecule, whereas a Message Passing Neural Network predicts the targets E, ω_0, ... in about 10^-2 seconds.)
Background
• Deep learning has advanced rapidly in fields such as image recognition and natural language processing
– Efficient feature extraction with CNNs / RNNs has been a key contributor
– Images and text can be regarded as regular, grid-like graphs
• Applying the same methods directly to molecular graphs is difficult
– Node degrees are not constant, edges carry attributes, etc.
6
Figure from: 機は熟した!グラフ構造に対するDeep Learning、Graph Convolutionのご紹介
(http://tech-blog.abeja.asia/entry/2017/04/27/105613)
Figure from: Wikipedia
(https://ja.wikipedia.org/wiki/酢酸)
Outline
• Background
• Related work
– Message Passing Neural Networks and their variants
• Proposed method
– Continuous-filter convolutional layer
– Interaction block
– SchNet
• Experiments & results
7
Related work
• Message Passing Neural Networks (MPNN) [Gilmer+, ICML2017]
– Gilmer et al. generalized feature-extraction methods that work on graphs whose node degrees are irregular
– In each layer, the feature vector assigned to a node is updated using the feature vectors of its neighboring nodes and edges
– Repeating this for L layers makes each node's feature vector reflect information from the nodes and edges within its L-hop neighborhood
8
(Figure 1 from Gilmer+, ICML2017, as on the Background slide: DFT ~10^3 seconds vs. a Message Passing Neural Net ~10^-2 seconds for predicting the targets E, ω_0, ....)
Related work
• Message Passing Neural Networks (MPNN) [Gilmer+, ICML2017]
– Message passing phase
• Message function: $M_t(h_v^t, h_w^t, e_{vw})$
– Builds the information each node propagates to its neighboring nodes
• Update function: $U_t(h_v^t, m_v^{t+1})$
– Each node receives information from its neighbors and updates its own state
9
(Figure: a center node v with neighbors u1 and u2; for each neighbor, the message function $M_t(h_v^t, h_{u_i}^t, e_{vu_i})$ is evaluated, the messages are summed, and the update function $U_t(h_v^t, m_v^{t+1})$ produces the new state of v.)

The MPNN formulation (Gilmer+, ICML2017):
$m_v^{t+1} = \sum_{w \in N(v)} M_t(h_v^t, h_w^t, e_{vw})$  (1)
$h_v^{t+1} = U_t(h_v^t, m_v^{t+1})$  (2)
$\hat{y} = R(\{h_v^T \mid v \in G\})$  (3)
where $N(v)$ denotes the neighbors of v in graph G; $M_t$, $U_t$, and $R$ are learned differentiable functions, and $R$ operates on the set of node states and must be invariant to permutations of them.
Related work
• Message Passing Neural Networks (MPNN) [Gilmer+, ICML2017]
– Readout phase
• Readout function: $R(\{h_v^{(T)} \mid v \in G\})$
– Aggregates the per-node representations obtained through the message passing phase into a single representation for the whole graph (a minimal code sketch of the full scheme follows below)
10
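To make the message passing and readout phases above concrete, here is a minimal sketch in plain NumPy. The specific message, update, and readout functions used here (a neighbor-copy message, a tanh update, a sum readout) are placeholder assumptions for illustration, not any particular model from the paper.

```python
import numpy as np

def mpnn_step(h, edges, message_fn, update_fn):
    """One message passing step over a graph.

    h:          dict node -> state vector h_v^t (np.ndarray)
    edges:      dict (v, w) -> edge feature vector e_vw
    message_fn: M_t(h_v, h_w, e_vw) -> message vector
    update_fn:  U_t(h_v, m_v)       -> updated state h_v^{t+1}
    """
    # m_v^{t+1} = sum_{w in N(v)} M_t(h_v^t, h_w^t, e_vw)
    messages = {v: np.zeros_like(x) for v, x in h.items()}
    for (v, w), e_vw in edges.items():
        messages[v] += message_fn(h[v], h[w], e_vw)
    # h_v^{t+1} = U_t(h_v^t, m_v^{t+1})
    return {v: update_fn(h[v], messages[v]) for v in h}

def readout(h):
    # R({h_v^T}): here a simple permutation-invariant sum over node states
    return np.sum(np.stack(list(h.values())), axis=0)

# Toy example: 3 nodes, undirected edges stored in both directions
h = {v: np.random.randn(4) for v in range(3)}
edges = {(0, 1): np.ones(2), (1, 0): np.ones(2),
         (1, 2): np.ones(2), (2, 1): np.ones(2)}
message_fn = lambda hv, hw, e: hw           # placeholder M_t
update_fn = lambda hv, m: np.tanh(hv + m)   # placeholder U_t
for _ in range(3):                          # L message passing steps
    h = mpnn_step(h, edges, message_fn, update_fn)
print(readout(h))                           # graph-level representation y_hat
```

The later slides instantiate `message_fn`, `update_fn`, and the readout differently for each model in the MPNN family.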
Related work
• CNN for Learning Molecular Fingerprints [Duvenaud+, NIPS2015]
– Message passing phase
• Message function: $M_t(h_v^t, h_u^t, e_{vu}) = \mathrm{concat}(h_u^t, e_{vu})$
• Update function: $U_t(h_v^t, m_v^{t+1}) = \sigma(H_t^{\deg(v)} m_v^{t+1})$
– $H_t^{\deg(v)}$: a weight matrix prepared for each step $t$ and each node degree $\deg(v)$
11
Related work
• CNN for Learning Molecular Fingerprints [Duvenaud+, NIPS2015]
– Readout phase
• Readout function: $R(\{h_v^T \mid v \in G\}) = f\left(\sum_{v,t} \mathrm{softmax}(W_t h_v^t)\right)$ (sketched in code below)
12
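A sketch of the neural-fingerprint message, update, and readout terms in the MPNN notation above. The feature sizes, the range of degrees, and the toy usage are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

d_h, d_e = 4, 2                    # hidden / edge feature sizes (assumed)
T = 3                              # number of message passing steps
# One weight matrix per step t and per node degree deg(v), as on the slide
H = {(t, deg): np.random.randn(d_h, d_h + d_e) * 0.1
     for t in range(T) for deg in range(1, 5)}
W = [np.random.randn(d_h, d_h) * 0.1 for _ in range(T + 1)]  # readout weights

def message(h_u, e_vu):
    # M_t(h_v, h_u, e_vu) = concat(h_u, e_vu)  (h_v itself is not used)
    return np.concatenate([h_u, e_vu])

def update(t, deg_v, m_v):
    # U_t(h_v, m_v) = sigma(H_t^{deg(v)} m_v)
    return sigmoid(H[(t, deg_v)] @ m_v)

def readout_term(t, h_v):
    # The fingerprint is f( sum_{v,t} softmax(W_t h_v^t) )
    return softmax(W[t] @ h_v)

# Toy usage: one message from a degree-1 neighbor, then its readout term
h_u, e_vu = np.random.randn(d_h), np.random.randn(d_e)
h_v_new = update(0, 1, message(h_u, e_vu))
print(readout_term(1, h_v_new))
```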
Related work
• Gated Graph Neural Networks (GG-NN) [Li+, ICLR2016]
– Message passing phase
– Message function: $M_t(h_v^t, h_u^t, e_{vu}) = A_{e_{vu}} h_u^t$
» $A_{e_{vu}}$: a weight matrix defined for each edge type (single bond, double bond, etc.)
– Update function: $U_t(h_v^t, m_v^{t+1}) = \mathrm{GRU}(h_v^t, m_v^{t+1})$
13
Related work
• Gated Graph Neural Networks (GG-NN) [Li+, ICLR2016]
– Readout phase
• Readout function:
– $R(\{h_v^T \mid v \in G\}) = \tanh\left(\sum_v \sigma\!\left(i(h_v^T, h_v^0)\right) \odot \tanh\left(j(h_v^T, h_v^0)\right)\right)$
– $i, j$: neural networks; $\sigma\!\left(i(h_v^T, h_v^0)\right)$ plays the role of soft attention (see the sketch below)
14
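A sketch of the GG-NN pieces under the formulation above: a per-edge-type message matrix, the GRU update, and the gated readout. The GRU here is written out without biases, and `i_net` / `j_net` are arbitrary callables standing in for small networks; both are simplifying assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 4                                          # hidden size (assumed)
edge_types = ["single", "double", "aromatic"]  # example bond types
A = {b: np.random.randn(d, d) * 0.1 for b in edge_types}

def message(h_u, bond):
    # M_t(h_v, h_u, e_vu) = A_{e_vu} h_u : one matrix per edge type
    return A[bond] @ h_u

# Minimal GRU cell without biases (a simplification for illustration);
# the aggregated message m_v plays the role of the GRU input.
Wz, Uz = np.random.randn(d, d) * 0.1, np.random.randn(d, d) * 0.1
Wr, Ur = np.random.randn(d, d) * 0.1, np.random.randn(d, d) * 0.1
Wh, Uh = np.random.randn(d, d) * 0.1, np.random.randn(d, d) * 0.1

def gru_update(h_v, m_v):
    # U_t(h_v, m_v) = GRU(h_v, m_v)
    z = sigmoid(Wz @ m_v + Uz @ h_v)
    r = sigmoid(Wr @ m_v + Ur @ h_v)
    h_tilde = np.tanh(Wh @ m_v + Uh @ (r * h_v))
    return (1 - z) * h_v + z * h_tilde

def readout(h_T, h_0, i_net, j_net):
    # R = tanh( sum_v sigmoid(i(h_v^T, h_v^0)) * tanh(j(h_v^T, h_v^0)) )
    gated = [sigmoid(i_net(hT, h0)) * np.tanh(j_net(hT, h0))
             for hT, h0 in zip(h_T, h_0)]
    return np.tanh(np.sum(gated, axis=0))

# Toy usage with identity-style "networks" i and j
h_0 = [np.random.randn(d) for _ in range(2)]
h_T = [gru_update(h, message(h, "single")) for h in h_0]
print(readout(h_T, h_0, lambda a, b: a + b, lambda a, b: a - b))
```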
Related work
• Deep Tensor Neural Networks (DTNN) [Schütt+, Nature2017]
– Message passing phase
• Message function: $M_t(h_v^t, h_u^t, e_{vu}) = \tanh\left(W^{fc}\left((W^{cf} h_u^t + b_1) \odot (W^{df} e_{vu} + b_2)\right)\right)$
– $W^{fc}, W^{cf}, W^{df}$: shared weight matrices; $b_1, b_2$: bias terms
• Update function: $U_t(h_v^t, m_v^{t+1}) = h_v^t + m_v^{t+1}$
15
Related work
• Deep Tensor Neural Networks (DTNN) [Schütt+, Nature2017]
– Readout phase
• Readout function: $R(\{h_v^T \mid v \in G\}) = \sum_v \mathrm{NN}(h_v^T)$ (a minimal sketch follows below)
16
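A sketch of the DTNN-style message, residual update, and summed per-node readout under the formulas above. The feature sizes, zero biases, and the identity "network" in the usage example are assumptions for illustration.

```python
import numpy as np

d_h, d_e, d_f = 4, 3, 8                  # hidden, edge-feature, factor sizes (assumed)
W_cf = np.random.randn(d_f, d_h) * 0.1   # atom representation -> factor
W_df = np.random.randn(d_f, d_e) * 0.1   # distance/edge feature -> factor
W_fc = np.random.randn(d_h, d_f) * 0.1   # factor -> atom representation
b1, b2 = np.zeros(d_f), np.zeros(d_f)

def message(h_u, e_vu):
    # M_t(h_v, h_u, e_vu) = tanh( W_fc ((W_cf h_u + b1) * (W_df e_vu + b2)) )
    return np.tanh(W_fc @ ((W_cf @ h_u + b1) * (W_df @ e_vu + b2)))

def update(h_v, m_v):
    # U_t(h_v, m_v) = h_v + m_v  (a residual update)
    return h_v + m_v

def readout(h_nodes, out_net):
    # R({h_v^T}) = sum_v NN(h_v^T): a per-node network, summed over nodes
    return sum(out_net(h_v) for h_v in h_nodes)

# Toy usage with an identity "network" as the readout NN
h_v, h_u = np.random.randn(d_h), np.random.randn(d_h)
e_vu = np.random.randn(d_e)
print(readout([update(h_v, message(h_u, e_vu))], out_net=lambda x: x))
```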
Related work
• Edge Network + Set2Set (enn-s2s) [Gilmer+, ICML2017]
– Message passing phase
• Message function: $M_t(h_v^t, h_w^t, e_{vw}) = A(e_{vw}) h_w^t$
– $A(e_{vw})$: a neural network that maps the edge feature vector $e_{vw}$ to a matrix
• Update function: $U_t(h_v^t, m_v^{t+1}) = \mathrm{GRU}(h_v^t, m_v^{t+1})$
– Same as GG-NN [Li+, ICLR2016]
17
Related work
• Edge Network + Set2Set (enn-s2s) [Gilmer+, ICML2017]
– Readout phase
• Readout function: $R(\{h_v^T \mid v \in G\}) = \mathrm{set2set}(\{h_v^T \mid v \in G\})$
– The vector $q_t^*$ produced by set2set [Vinyals+, ICLR2016] is fed into a subsequent NN (a simplified sketch follows below)
– The model also uses other tricks, e.g., in how the input features are constructed
18
The set2set process block is based on a content-based attention mechanism (Vinyals+, ICLR2016):
$q_t = \mathrm{LSTM}(q_{t-1}^*)$  (3)
$e_{i,t} = f(m_i, q_t)$  (4)
$a_{i,t} = \dfrac{\exp(e_{i,t})}{\sum_j \exp(e_{j,t})}$  (5)
$r_t = \sum_i a_{i,t} m_i$  (6)
$q_t^* = [q_t\ r_t]$  (7)
where $m_i$ are the memory vectors (here, the node states), $q_t$ is a query used to read $r_t$ from the memories, $f$ computes a single scalar from $m_i$ and $q_t$ (e.g., a dot product), and the LSTM evolves the state $q^*$ while taking no inputs. The retrieved vector does not change if the memories are shuffled, so the set is treated in an order-invariant way.
(Figure 1: The Read-Process-and-Write model.)
Figure from: Vinyals+, ICLR2016
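A minimal sketch of the content-based attention step of the set2set process block (Eqs. 3–7 above). The LSTM that evolves the query in the original method is replaced here by a simple linear-plus-tanh update, and the dot-product scoring and step count are assumptions; this is an illustration, not the enn-s2s implementation.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def set2set_readout(memories, steps=3, seed=0):
    """Order-invariant pooling over a set of node states (memories: [n, d])."""
    rng = np.random.default_rng(seed)
    n, d = memories.shape
    W = rng.standard_normal((d, 2 * d)) * 0.1   # query update (stands in for the LSTM)
    q = np.zeros(d)
    for _ in range(steps):
        e = memories @ q                        # e_i = f(m_i, q), here a dot product
        a = softmax(e)                          # attention weights a_i
        r = a @ memories                        # r = sum_i a_i m_i
        q_star = np.concatenate([q, r])         # q* = [q, r]
        q = np.tanh(W @ q_star)                 # query evolution (an LSTM in the paper)
    return q_star                               # fed into the downstream network

h_T = np.random.randn(5, 4)                     # final states of 5 nodes
print(set2set_readout(h_T).shape)               # (8,) = [q, r]
```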
Related work
• Limitations of existing methods
– They model interactions between nodes based only on distances on the graph
• They do not account for cases where the same graph structure admits different spatial configurations, each changing the target property (e.g., conformers)
• They may fail to model interactions properly when two nodes are far apart on the graph but close together in real space
19
Figure from: https://en.wikipedia.org/wiki/Conformational_isomerism
Outline
• Background
• Related work
– Message Passing Neural Networks and their variants
• Proposed method
– Continuous-filter convolutional layer
– Interaction block
– SchNet
• Experiments & results
20
Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights each neighbor's contribution based on the inter-node distance
– The distances to emphasize are learned from data
21
(Figure 2 from Schütt+, NIPS2017, right panel: the cfconv layer with its filter-generating network, which takes the interatomic distance $d_{ij} = \lVert \mathbf{r}_i - \mathbf{r}_j \rVert$ as input. Figure 3: 10x10 Å cuts through all 64 radial, three-dimensional filters in each interaction block of SchNet trained on molecular dynamics of ethanol; each filter emphasizes certain ranges of interatomic distances.)
Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights each neighbor's contribution based on the inter-node distance
– The distances to emphasize are learned from data
22
(Figure: as above; Figure 2 right panel, the filter-generating network. The interatomic distance is expanded with radial basis functions $e_k(\mathbf{r}_i - \mathbf{r}_j) = \exp(-\gamma \lVert d_{ij} - \mu_k \rVert^2)$ before being fed into two dense layers with softplus activations to compute the filter weight $W(\mathbf{r}_i - \mathbf{r}_j)$.)
Filter-generating Networks
300 RBF kernels with centers $\mu_1 = 0.1$ Å, $\mu_2 = 0.2$ Å, ..., $\mu_{300} = 30$ Å
and $\gamma = 10$ Å
⇓
The kernel whose center is closest to $d_{ij}$ approaches 1, and kernels farther from it approach 0
(yielding a soft one-hot representation of the distance; see the code sketch below)
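A sketch of the radial basis expansion described above, i.e., the soft one-hot encoding of the distance $d_{ij}$. The grid of 300 centers from 0.1 Å to 30 Å and $\gamma = 10$ follow the slide; everything else (positions, printing) is illustrative.

```python
import numpy as np

def rbf_expand(d_ij, mu_min=0.1, mu_max=30.0, n_centers=300, gamma=10.0):
    """Expand a scalar distance into RBF features:
    e_k(d_ij) = exp(-gamma * (d_ij - mu_k)**2)
    """
    mu = np.linspace(mu_min, mu_max, n_centers)   # centers spaced every 0.1 A
    return np.exp(-gamma * (d_ij - mu) ** 2)

r_i = np.array([0.0, 0.0, 0.0])
r_j = np.array([1.1, 0.0, 0.0])
d_ij = np.linalg.norm(r_i - r_j)                  # d_ij = ||r_i - r_j||
e = rbf_expand(d_ij)
print(e.argmax(), e.max())   # the kernel whose center is closest to d_ij is ~1
```

Compared with feeding the raw distance to the filter network, this extra non-linearity decorrelates the initial filters and speeds up training, as noted in the paper excerpt above.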
Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights each neighbor's contribution based on the inter-node distance
– The distances to emphasize are learned from data
23
(Figure: as above; Figure 2 right panel. The output of the filter-generating network is combined with the atom embeddings via an element-wise product.)
Filter-generating Networks
The element-wise product of the filter network's output vector and the node's embedding vector is taken
⇓
Each unit's activation acts as a filter on the node's embedding vector
(see the code sketch below)
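Combining the two pieces above, here is a sketch of the continuous-filter convolution: the filter-generating network maps the RBF-expanded distance to a filter vector $W(\mathbf{r}_i - \mathbf{r}_j)$, which is applied to each neighbor's embedding by an element-wise product and summed over neighbors. The two dense layers with shifted-softplus activations follow the paper's description, but the layer sizes and the skipping of the self-term are assumptions for this sketch.

```python
import numpy as np

def ssp(x):
    # shifted softplus: ln(0.5 e^x + 0.5)
    return np.log(0.5 * np.exp(x) + 0.5)

def rbf_expand(d, mu=np.linspace(0.1, 30.0, 300), gamma=10.0):
    return np.exp(-gamma * (d - mu) ** 2)

d_feat, n_rbf = 64, 300
W1 = np.random.randn(d_feat, n_rbf) * 0.01   # filter-generating network:
W2 = np.random.randn(d_feat, d_feat) * 0.01  # two dense layers with ssp

def filter_weight(d_ij):
    # W(r_i - r_j): filter vector generated from the interatomic distance
    return ssp(W2 @ ssp(W1 @ rbf_expand(d_ij)))

def cfconv(x, positions):
    """x: [n_atoms, d_feat] embeddings, positions: [n_atoms, 3]."""
    n = len(x)
    out = np.zeros_like(x)
    for i in range(n):
        for j in range(n):
            if i == j:                       # self-interaction skipped here for simplicity
                continue
            d_ij = np.linalg.norm(positions[i] - positions[j])
            out[i] += x[j] * filter_weight(d_ij)   # element-wise product, summed over j
    return out

x = np.random.randn(3, d_feat)               # 3 atoms
pos = np.random.randn(3, 3)
print(cfconv(x, pos).shape)                  # (3, 64)
```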
Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights each neighbor's contribution based on inter-node distance information
– The distances to emphasize are learned from data
24
Proposed method: Interaction block
• A message passing layer built around the cfconv layer
– The cfconv layer updates each node's feature vector while taking inter-node interactions into account
– Interactions can be represented without any restriction based on node distance (a point of difference from DTNN and related models); a sketch follows below
25
(Figure 2 from Schütt+, NIPS2017: architectural overview (left), the interaction block (middle), and the continuous-filter convolution with filter-generating network (right). Figure 3: filter visualizations for the three interaction blocks.)
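A sketch of how an interaction block could wrap the cfconv layer, loosely following the middle panel of Figure 2 (atom-wise dense layers around the cfconv with a residual connection). The exact layer arrangement and sizes here are assumptions; `cfconv` and `ssp` stand for the functions sketched on the previous slide.

```python
import numpy as np

d_feat = 64
W_in  = np.random.randn(d_feat, d_feat) * 0.01   # atom-wise dense before cfconv
W_mid = np.random.randn(d_feat, d_feat) * 0.01   # atom-wise dense + ssp after cfconv
W_out = np.random.randn(d_feat, d_feat) * 0.01   # final atom-wise dense

def interaction_block(x, positions, cfconv, ssp):
    """x: [n_atoms, d_feat]; returns x + v (residual update of the atom features)."""
    v = x @ W_in.T                       # atom-wise (applied independently per atom)
    v = cfconv(v, positions)             # continuous-filter convolution over all atoms
    v = ssp(v @ W_mid.T)                 # atom-wise + shifted softplus
    v = v @ W_out.T                      # atom-wise
    return x + v                         # residual connection

# Toy usage with dummy stand-ins for the earlier sketches
x = np.random.randn(3, d_feat)
pos = np.random.randn(3, 3)
dummy_cfconv = lambda v, p: v
dummy_ssp = lambda z: np.log(0.5 * np.exp(z) + 0.5)
print(interaction_block(x, pos, dummy_cfconv, dummy_ssp).shape)
```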
Proposed method: SchNet
• Stacks interaction and atom-wise layers, finally outputting a one-dimensional scalar value for each atom
• The output scalars are summed over all atoms to obtain the prediction for the whole molecule (see the sketch below)
26
Figure 2 (from Schütt+, NIPS2017): Illustration of SchNet with an architectural overview (left), the interaction block (middle) and the continuous-filter convolution with filter-generating network (right). The shifted softplus is defined as $\mathrm{ssp}(x) = \ln(0.5e^x + 0.5)$.
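A sketch of the overall forward pass implied by the slide and Figure 2: embed the atomic numbers, apply a few interaction blocks, map each atom to a scalar with atom-wise layers, and sum over atoms to get the molecular energy. The embedding size, number of blocks, and output layer widths are assumptions, and `interaction_block` stands for the block sketched above (an identity stand-in is used in the toy call).

```python
import numpy as np

d_feat, n_blocks, max_z = 64, 3, 100

embedding = np.random.randn(max_z, d_feat) * 0.1   # one vector per atomic number Z
W_o1 = np.random.randn(32, d_feat) * 0.01          # atom-wise output layers
W_o2 = np.random.randn(1, 32) * 0.01

def ssp(x):
    return np.log(0.5 * np.exp(x) + 0.5)

def schnet_energy(Z, positions, interaction_block):
    """Z: [n_atoms] atomic numbers, positions: [n_atoms, 3] -> scalar energy."""
    x = embedding[Z]                                # initial atom representations
    for _ in range(n_blocks):                       # stacked interaction blocks
        x = interaction_block(x, positions)
    atom_energies = ssp(x @ W_o1.T) @ W_o2.T        # one scalar per atom
    return atom_energies.sum()                      # sum pooling over atoms

# Toy usage with an identity interaction block as a stand-in
Z = np.array([6, 1, 8])                             # C, H, O
pos = np.random.randn(3, 3)
print(schnet_energy(Z, pos, lambda x, p: x))
```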
Proposed method: Loss
• The loss function is defined as the sum of two terms:
– $\lVert E - \hat{E} \rVert^2$
• the squared error of the energy prediction
– $\dfrac{\rho}{n} \sum_{i=0}^{n} \left\lVert F_i - \left(-\dfrac{\partial \hat{E}}{\partial R_i}\right) \right\rVert^2$
• the squared error of the interatomic force prediction, computed per atom and summed
• $\rho$: a hyperparameter controlling how much weight is placed on the forces
• The predicted forces are obtained by differentiating the energy model [Chmiela+, 2017]:
$\hat{F}_i(Z_1, \dots, Z_n, \mathbf{r}_1, \dots, \mathbf{r}_n) = -\dfrac{\partial \hat{E}}{\partial \mathbf{r}_i}(Z_1, \dots, Z_n, \mathbf{r}_1, \dots, \mathbf{r}_n)$
27
(Excerpt from Schütt+, NIPS2017: the combined loss above is Eq. 5 in the paper; $\rho = 0$ is used for pure energy-based training and $\rho = 100$ for combined energy and force training. Because the forces are obtained as gradients of the energy, the force model is energy-conserving by construction, and the model must be at least twice differentiable, which motivates the shifted softplus non-linearity. A code sketch of the loss follows below.)
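A sketch of the combined loss in Eq. (5) above, assuming the predicted energy and the predicted forces (the negative gradients of the energy with respect to the atom positions, which would come from automatic differentiation in a real implementation) are already available as arrays.

```python
import numpy as np

def energy_force_loss(E_true, E_pred, F_true, F_pred, rho=100.0):
    """l = ||E - E_hat||^2 + (rho / n) * sum_i ||F_i - F_hat_i||^2

    E_true, E_pred: scalars; F_true, F_pred: [n_atoms, 3] arrays.
    F_pred should equal -dE_hat/dR_i, obtained via autodiff in practice.
    """
    n = len(F_true)
    energy_term = (E_true - E_pred) ** 2
    force_term = (rho / n) * np.sum((F_true - F_pred) ** 2)
    return energy_term + force_term

# rho = 0 recovers energy-only training; the paper reports rho = 100 for
# combined energy and force training.
print(energy_force_loss(1.0, 0.9, np.zeros((3, 3)), 0.01 * np.ones((3, 3))))
```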
Outline
• Background
• Related work
– Message Passing Neural Networks and their variants
• Proposed method
– Continuous-filter convolutional layer
– Interaction block
– SchNet
• Experiments & results
28
Experiments: QM9
• A dataset containing 17 molecular properties computed with DFT
– Only one of them, U0 (the total energy of the molecule at absolute zero), is used as the prediction target here
– The molecules are in equilibrium, so the interatomic forces are zero and need not be predicted
• Baselines
– DTNN [Schütt+, Nature2017], enn-s2s [Gilmer+, ICML2017],
enn-s2s-ens5 (an ensemble of enn-s2s models)
• Results
– SchNet consistently achieved state-of-the-art results
– With 110k training examples, the mean absolute error was 0.31 kcal/mol
29
Table 1: Mean absolute errors for energy predictions in kcal/mol on the QM9 data set with given training set size N.
N        | SchNet | DTNN [18] | enn-s2s [19] | enn-s2s-ens5 [19]
50,000   | 0.59   | 0.94      | –            | –
100,000  | 0.34   | 0.84      | –            | –
110,462  | 0.31   | –         | 0.45         | 0.33
Experiments: MD17
• A dataset generated by molecular dynamics (MD) simulations
– Trajectory data for a single molecule (e.g., benzene)
• Data are collected for eight molecules, and each molecule is trained as a separate task
• Even for the same molecule, the atomic positions, energy, and interatomic forces differ from sample to sample
– The total energy and the interatomic forces are each predicted and evaluated with mean absolute error
• Baselines
– DTNN [Schütt+, Nature2017], GDML [Chmiela+, 2017]
30
• Results
– N = 1,000
• GDML performed better on most tasks
• GDML is a kernel-regression-based model whose computational cost grows with the square of the number of samples / atoms in the molecule, so it could not be trained with N = 50,000
– N = 50,000
• SchNet outperformed DTNN on most tasks
• SchNet scales far better (than GDML), and its accuracy improved as the amount of training data increased
Table 2: Mean absolute errors for energy and force predictions in kcal/mol and kcal/mol/Å, respectively. GDML and SchNet test errors for training with 1,000 and 50,000 examples of molecular dynamics simulations of small, organic molecules are shown. SchNets were trained on energies only as well as on energies and forces combined.

                        N = 1,000                       N = 50,000
                GDML [17]  SchNet    SchNet    DTNN [18]  SchNet    SchNet
                (forces)   (energy)  (both)    (energy)   (energy)  (both)
Benzene  energy   0.07       1.19      0.08      0.04       0.08      0.07
         forces   0.23      14.12      0.31      –          1.23      0.17
Toluene  energy   0.12       2.95      0.12      0.18       0.16      0.09
         forces   0.24      22.31      0.57      –          1.79      0.09
Malonaldehyde
         energy   0.16       2.03      0.13      0.19       0.13      0.08
         forces   0.80      20.41      0.66      –          1.51      0.08
Salicylic acid
         energy   0.12       3.27      0.20      0.41       0.25      0.10
         forces   0.28      23.21      0.85      –          3.72      0.19
Aspirin  energy   0.27       4.20      0.37      –          0.25      0.12
         forces   0.99      23.54      1.35      –          7.36      0.33
Ethanol  energy   0.15       0.93      0.08      –          0.07      0.05
         forces   0.79       6.56      0.39      –          0.76      0.05
Uracil   energy   0.11       2.26      0.14      –          0.13      0.10
         forces   0.24      20.08      0.56      –          3.28      0.11
Naphtalene
         energy   0.12       3.58      0.16      –          0.20      0.11
         forces   0.23      25.36      0.58      –          2.58      0.11
Experiments: ISO17
• A dataset generated by molecular dynamics (MD) simulations
– Trajectory data for 129 isomers of C7O2H10
• Unlike MD17, data from different molecules are included in the same task
– Two evaluation settings are prepared:
• known molecules / unknown conformation:
– the test data use known molecules in unseen conformations
• unknown molecules / unknown conformation:
– the test data use unseen molecules in unseen conformations
– Baseline
• mean predictor (presumably the per-molecule mean over the training data)
31
• Results
– known molecules / unknown conformation
• energy+forces training reaches accuracy comparable to that achieved on QM9
– unknown molecules / unknown conformation
• energy+forces training outperformed energy-only training
– Adding the interatomic forces to the training objective does not merely fit a single molecule; the benefit generalizes across chemical compound space
– There is still a sizable accuracy gap relative to the known-molecules setting, so further improvement is needed
Table 3: Mean absolute errors on C7O2H10 isomers in kcal/mol.
                                   mean predictor | SchNet (energy) | SchNet (energy+forces)
known molecules /        energy         14.89     |      0.52       |        0.36
unknown conformation     forces         19.56     |      4.13       |        1.00
unknown molecules /      energy         15.54     |      3.11       |        2.40
unknown conformation     forces         19.15     |      5.71       |        2.18
Conclusion
• Proposed the continuous-filter convolutional (cfconv) layer
– A feature-extraction layer that is effective for graphs with irregular inter-node distances, such as the atoms in a molecule
• Proposed SchNet
– Uses cfconv layers to model the interactions of atoms located at arbitrary positions in 3D space
• Proposed ISO17 (a benchmark dataset)
• Experiments on QM9, MD17, and ISO17 confirmed the effectiveness of the method
– For energy prediction on non-equilibrium molecules, adding force prediction to the training objective improved performance
– Achieving robust training that predicts well for non-equilibrium and unseen molecules remains future work
32
References
• SchNet
– Schütt, Kristof, et al. "SchNet: A continuous-filter convolutional neural network for modeling
quantum interactions." Advances in Neural Information Processing Systems. 2017.
• MPNN variants
– Gilmer, Justin, et al. "Neural message passing for quantum chemistry." In Proceedings of the
34th International Conference on Machine Learning, pages 1263–1272, 2017.
– Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular
fingerprints." Advances in neural information processing systems. 2015.
– Li, Yujia, Tarlow, Daniel, Brockschmidt, Marc, and Zemel, Richard. Gated graph sequence
neural networks. ICLR, 2016.
– Schütt, Kristof T., et al. "Quantum-chemical insights from deep tensor neural networks." Nature
communications 8 (2017): 13890.
• Others
– Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. "Order matters: Sequence to sequence for
sets." ICLR, 2016.
– Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., & Müller, K. R. (2017).
Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3(5),
e1603015.
33

Más contenido relacionado

La actualidad más candente

論文紹介: "MolGAN: An implicit generative model for small molecular graphs"
論文紹介: "MolGAN: An implicit generative model for small molecular graphs"論文紹介: "MolGAN: An implicit generative model for small molecular graphs"
論文紹介: "MolGAN: An implicit generative model for small molecular graphs"Ryohei Suzuki
 
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...
[DL輪読会]Graph Convolutional Policy Network for Goal-Directed Molecular Graph G...Deep Learning JP
 
(2020.10) 分子のグラフ表現と機械学習: Graph Neural Networks (GNNs) とは?
(2020.10) 分子のグラフ表現と機械学習: Graph Neural Networks (GNNs) とは?(2020.10) 分子のグラフ表現と機械学習: Graph Neural Networks (GNNs) とは?
(2020.10) 分子のグラフ表現と機械学習: Graph Neural Networks (GNNs) とは?Ichigaku Takigawa
 
PFP:材料探索のための汎用Neural Network Potential_中郷_20220422POLセミナー
PFP:材料探索のための汎用Neural Network Potential_中郷_20220422POLセミナーPFP:材料探索のための汎用Neural Network Potential_中郷_20220422POLセミナー
PFP:材料探索のための汎用Neural Network Potential_中郷_20220422POLセミナーMatlantis
 
[DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent
 [DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent [DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient Descent
[DL輪読会]A Bayesian Perspective on Generalization and Stochastic Gradient DescentDeep Learning JP
 
グラフデータ分析 入門編
グラフデータ分析 入門編グラフデータ分析 入門編
グラフデータ分析 入門編順也 山口
 
Matlantisに込められた 技術・思想_高本_Matlantis User Conference
Matlantisに込められた 技術・思想_高本_Matlantis User ConferenceMatlantisに込められた 技術・思想_高本_Matlantis User Conference
Matlantisに込められた 技術・思想_高本_Matlantis User ConferenceMatlantis
 
ベイズ推定の概要@広島ベイズ塾
ベイズ推定の概要@広島ベイズ塾ベイズ推定の概要@広島ベイズ塾
ベイズ推定の概要@広島ベイズ塾Yoshitake Takebayashi
 
クラシックな機械学習入門:付録:よく使う線形代数の公式
クラシックな機械学習入門:付録:よく使う線形代数の公式クラシックな機械学習入門:付録:よく使う線形代数の公式
クラシックな機械学習入門:付録:よく使う線形代数の公式Hiroshi Nakagawa
 
【DL輪読会】Perceiver io a general architecture for structured inputs & outputs
【DL輪読会】Perceiver io  a general architecture for structured inputs & outputs 【DL輪読会】Perceiver io  a general architecture for structured inputs & outputs
【DL輪読会】Perceiver io a general architecture for structured inputs & outputs Deep Learning JP
 
[DL輪読会]Convolutional Conditional Neural Processesと Neural Processes Familyの紹介
[DL輪読会]Convolutional Conditional Neural Processesと Neural Processes Familyの紹介[DL輪読会]Convolutional Conditional Neural Processesと Neural Processes Familyの紹介

SchNet: A continuous-filter convolutional neural network for modeling quantum interactions

• 10. Related work • Message Passing Neural Networks (MPNN) [Gilmer+, ICML2017] – Readout phase • Readout function: $R(\{h_v^{(T)} \mid v \in G\})$ – Aggregates the node representations obtained through the message passing phase into a single representation for the whole graph
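To make the two phases concrete, here is a minimal Python sketch of the generic MPNN loop (my own illustration, not code from any of the cited papers); `message_fn`, `update_fn`, and `readout_fn` are placeholders for the concrete choices on the following slides.

```python
import numpy as np

def mpnn_forward(h0, edges, message_fn, update_fn, readout_fn, T=3):
    """Run T rounds of message passing, then a readout over all node states.

    h0: dict mapping node id -> initial feature vector (np.ndarray)
    edges: dict mapping directed pair (v, w) -> edge feature e_vw
    """
    h = dict(h0)
    for _ in range(T):
        # message passing phase: m_v = sum over neighbours w of M(h_v, h_w, e_vw)
        m = {v: np.zeros_like(h[v]) for v in h}
        for (v, w), e_vw in edges.items():
            m[v] = m[v] + message_fn(h[v], h[w], e_vw)
        # update phase: h_v <- U(h_v, m_v)
        h = {v: update_fn(h[v], m[v]) for v in h}
    # readout phase: permutation-invariant aggregation over node states
    return readout_fn(list(h.values()))

# toy usage: pass neighbour states through unchanged and sum-pool at the end
h0 = {0: np.ones(4), 1: np.zeros(4)}
edges = {(0, 1): None, (1, 0): None}
out = mpnn_forward(h0, edges,
                   message_fn=lambda hv, hw, e: hw,
                   update_fn=lambda hv, m: hv + m,
                   readout_fn=lambda hs: np.sum(hs, axis=0))
```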
• 11. Related work • CNN for Learning Molecular Fingerprints [Duvenaud+, NIPS2015] – Message passing phase • Message function: $M_t(h_v^t, h_w^t, e_{vw}) = \mathrm{concat}(h_w^t, e_{vw})$ • Update function: $U_t(h_v^t, m_v^{t+1}) = \sigma(H_t^{\deg(v)} m_v^{t+1})$ – $H_t^{\deg(v)}$: a weight matrix prepared for each step $t$ and vertex degree $\deg(v)$ (figure: message passing diagram and MPNN equations, from Gilmer+, ICML2017)
• 12. Related work • CNN for Learning Molecular Fingerprints [Duvenaud+, NIPS2015] – Readout phase • Readout function: $R(\{h_v^{(T)} \mid v \in G\}) = f\big(\sum_{v,t} \mathrm{softmax}(W_t h_v^t)\big)$
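A rough NumPy sketch of the neural-fingerprint choices on slides 11–12; the shapes of `H_deg` and `W_per_step`, and taking $f$ to be the identity, are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

def fp_message(h_w, e_vw):
    # M_t(h_v, h_w, e_vw) = concat(h_w, e_vw)
    return np.concatenate([h_w, e_vw])

def fp_update(m_v, H_deg):
    # U_t(h_v, m_v) = sigma(H_t^{deg(v)} m_v): one matrix per node degree
    return sigmoid(H_deg @ m_v)

def fp_readout(node_states_per_step, W_per_step):
    # R = f(sum_{v,t} softmax(W_t h_v^t)); f taken as the identity here
    fp = None
    for W_t, states in zip(W_per_step, node_states_per_step):
        for h_v in states:
            contrib = softmax(W_t @ h_v)
            fp = contrib if fp is None else fp + contrib
    return fp
```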
• 13. Related work • Gated Graph Neural Networks (GG-NN) [Li+, ICLR2016] – Message passing phase • Message function: $M_t(h_v^t, h_w^t, e_{vw}) = A_{e_{vw}} h_w^t$ – $A_{e_{vw}}$: a weight matrix defined for each edge type (single bond, double bond, etc.) • Update function: $U_t(h_v^t, m_v^{t+1}) = \mathrm{GRU}(h_v^t, m_v^{t+1})$ (figure: message passing diagram and MPNN equations, from Gilmer+, ICML2017)
• 14. Related work • Gated Graph Neural Networks (GG-NN) [Li+, ICLR2016] – Readout phase • Readout function: $R(\{h_v^{(T)} \mid v \in G\}) = \tanh\big(\sum_v \sigma(i(h_v^{(T)}, h_v^{(0)})) \odot \tanh(j(h_v^{(T)}, h_v^{(0)}))\big)$ – $i$, $j$: neural networks; $\sigma(i(h_v^{(T)}, h_v^{(0)}))$ plays the role of a soft attention
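A hedged PyTorch sketch of a GG-NN-style block following slides 13–14; the edge-type encoding, layer sizes, and initialization are my assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GGNNSketch(nn.Module):
    """One weight matrix per edge type for messages, a GRU cell for the
    update, and a gated sum for the graph-level readout."""
    def __init__(self, dim, num_edge_types, T=3):
        super().__init__()
        self.A = nn.Parameter(torch.randn(num_edge_types, dim, dim) * 0.01)
        self.gru = nn.GRUCell(dim, dim)
        self.i_net = nn.Linear(2 * dim, dim)   # gating network i(h_v^T, h_v^0)
        self.j_net = nn.Linear(2 * dim, dim)   # feature network j(h_v^T, h_v^0)
        self.T = T

    def forward(self, h0, edge_index, edge_type):
        # h0: (N, dim); edge_index: (E, 2) long tensor of (v, w); edge_type: (E,)
        h = h0
        for _ in range(self.T):
            # M_t(h_v, h_w, e_vw) = A_{e_vw} h_w, summed over neighbours w of v
            msgs = torch.einsum('eij,ej->ei', self.A[edge_type], h[edge_index[:, 1]])
            m = torch.zeros_like(h)
            m.index_add_(0, edge_index[:, 0], msgs)
            h = self.gru(m, h)                 # U_t = GRU(h_v^t, m_v^{t+1})
        hh = torch.cat([h, h0], dim=-1)
        # R = tanh( sum_v sigma(i(h^T, h^0)) * tanh(j(h^T, h^0)) )
        gated = torch.sigmoid(self.i_net(hh)) * torch.tanh(self.j_net(hh))
        return torch.tanh(gated.sum(dim=0))
```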
• 15. Related work • Deep Tensor Neural Networks (DTNN) [Schütt+, Nature2017] – Message passing phase • Message function: $M_t(h_v^t, h_w^t, e_{vw}) = \tanh\big(W^{fc}((W^{cf} h_w^t + b_1) \odot (W^{df} e_{vw} + b_2))\big)$ – $W^{fc}$, $W^{cf}$, $W^{df}$: shared weight matrices; $b_1$, $b_2$: bias terms • Update function: $U_t(h_v^t, m_v^{t+1}) = h_v^t + m_v^{t+1}$ (figure: message passing diagram and MPNN equations, from Gilmer+, ICML2017)
• 16. Related work • Deep Tensor Neural Networks (DTNN) [Schütt+, Nature2017] – Readout phase • Readout function: $R(\{h_v^{(T)} \mid v \in G\}) = \sum_v \mathrm{NN}(h_v^{(T)})$
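A short PyTorch sketch of the DTNN-style interaction on slides 15–16; the dimensions and the choice of edge feature are assumptions (in DTNN the edge feature is typically an expanded interatomic distance).

```python
import torch
import torch.nn as nn

class DTNNInteraction(nn.Module):
    """Message: tanh( W_fc((W_cf h_w + b1) * (W_df e_vw + b2)) ),
    followed by the additive update h_v <- h_v + sum_w message."""
    def __init__(self, dim, edge_dim):
        super().__init__()
        self.W_cf = nn.Linear(dim, dim)       # acts on the neighbour state h_w
        self.W_df = nn.Linear(edge_dim, dim)  # acts on the edge/distance feature e_vw
        self.W_fc = nn.Linear(dim, dim, bias=False)

    def message(self, h_w, e_vw):
        return torch.tanh(self.W_fc(self.W_cf(h_w) * self.W_df(e_vw)))

    def update(self, h_v, m_v):
        return h_v + m_v  # U_t(h, m) = h + m
```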
• 17. Related work • Edge Network + Set2Set (enn-s2s) [Gilmer+, ICML2017] – Message passing phase • Message function: $M_t(h_v^t, h_w^t, e_{vw}) = A(e_{vw}) h_w^t$ – $A(e_{vw})$: a neural network that transforms the edge vector $e_{vw}$ into a weight matrix • Update function: $U_t(h_v^t, m_v^{t+1}) = \mathrm{GRU}(h_v^t, m_v^{t+1})$ – Same as GG-NN [Li+, ICLR2016] (figure: message passing diagram and MPNN equations, from Gilmer+, ICML2017)
• 18. Related work • Edge Network + Set2Set (enn-s2s) [Gilmer+, ICML2017] – Readout phase • Readout function: $R(\{h_v^{(T)} \mid v \in G\}) = \mathrm{set2set}(\{h_v^{(T)} \mid v \in G\})$ – The vector $q_t^*$ produced by set2set [Vinyals+, ICLR2016] is fed into the downstream network – set2set iterates $q_t = \mathrm{LSTM}(q_{t-1}^*)$, $e_{i,t} = f(m_i, q_t)$, $a_{i,t} = \mathrm{softmax}_i(e_{i,t})$, $r_t = \sum_i a_{i,t} m_i$, $q_t^* = [q_t; r_t]$ – The paper also adds tricks such as how the input features are constructed (figure: the Read-Process-and-Write model, from Vinyals+, ICLR2016)
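A hedged PyTorch sketch of the set2set readout recurrence above, using a dot product for $f(m_i, q_t)$ and a fixed number of processing steps (both assumptions).

```python
import torch
import torch.nn as nn

class Set2SetSketch(nn.Module):
    """An LSTM with no external input repeatedly attends over the node
    states {h_v^T} and returns the final query/readout vector q_t^*."""
    def __init__(self, dim, steps=3):
        super().__init__()
        self.lstm = nn.LSTMCell(2 * dim, dim)  # input is q*_{t-1} = [q_{t-1}; r_{t-1}]
        self.steps = steps
        self.dim = dim

    def forward(self, memories):               # memories: (N, dim) node states
        q_star = memories.new_zeros(1, 2 * self.dim)
        state = (memories.new_zeros(1, self.dim), memories.new_zeros(1, self.dim))
        for _ in range(self.steps):
            q, c = self.lstm(q_star, state)     # q_t = LSTM(q*_{t-1})
            state = (q, c)
            e = memories @ q.squeeze(0)         # e_{i,t} = f(m_i, q_t), here a dot product
            a = torch.softmax(e, dim=0)         # a_{i,t}
            r = (a.unsqueeze(1) * memories).sum(0, keepdim=True)  # r_t
            q_star = torch.cat([q, r], dim=1)   # q*_t = [q_t; r_t]
        return q_star.squeeze(0)                # fed into the downstream network
```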
• 19. Related work • Limitations of the existing methods – Inter-node interactions are modeled only from distances on the graph • The same graph structure can take different spatial arrangements (conformers) whose target properties differ, which these models do not account for • Interactions between nodes that are far apart on the graph but close in real space may not be modeled appropriately (figure: conformational isomerism, from https://en.wikipedia.org/wiki/Conformational_isomerism)
• 20. Outline • Background • Related work – Message Passing Neural Networks and their variants • Proposed method – Continuous-filter convolutional layer – Interaction block – SchNet • Experiments and results
• 21. Proposed method: Continuous-filter convolution (cfconv) • A filter that weights neighbour contributions by the interatomic distance $d_{ij} = \lVert \mathbf{r}_i - \mathbf{r}_j \rVert$ – The "distances to emphasize" are learned from data (figures: cfconv with its filter-generating network, and Figure 3 of the paper: 10×10 Å cuts through the 64 radial filters of each interaction block of SchNet trained on molecular dynamics of ethanol)
• 22. Proposed method: Continuous-filter convolution (cfconv) • Filter-generating network – The interatomic distance $d_{ij}$ is expanded with radial basis functions $e_k(\mathbf{r}_i - \mathbf{r}_j) = \exp(-\gamma \lVert d_{ij} - \mu_k \rVert^2)$ – 300 RBF kernels are prepared with centres $\mu_1 = 0.1\,\text{Å}, \mu_2 = 0.2\,\text{Å}, \ldots, \mu_{300} = 30\,\text{Å}$ and $\gamma = 10\,\text{Å}$ – The kernel whose centre is closest to $d_{ij}$ approaches 1 while the others decay towards 0, yielding a soft one-hot representation of the distance
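A minimal NumPy sketch of this radial-basis expansion; the grid of centres and $\gamma$ follow the slide, everything else (function name, example distance) is illustrative.

```python
import numpy as np

def rbf_expand(d_ij, gamma=10.0):
    """Expand an interatomic distance d_ij (in Å) into radial basis features
    e_k(d_ij) = exp(-gamma * (d_ij - mu_k)^2), mu_k = 0.1, 0.2, ..., 30.0 Å.
    The kernel whose centre is closest to d_ij is near 1, the rest decay
    towards 0 -- a soft one-hot encoding of the distance."""
    mu = np.linspace(0.1, 30.0, 300)          # 300 centres, 0.1 Å apart
    return np.exp(-gamma * (d_ij - mu) ** 2)

# example: a bond length of ~1.1 Å mainly activates the kernel centred at 1.1 Å
features = rbf_expand(1.1)
print(features.shape, features.argmax())      # (300,), index ~10
```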
• 23. Proposed method: Continuous-filter convolution (cfconv) • The filter output is multiplied element-wise with the node's embedding vector – Each unit's activation thus gates (filters) the corresponding dimension of the node embedding
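Combining the two previous slides, a hedged PyTorch sketch of the cfconv operation; the layer sizes, edge indexing, and sum aggregation are assumptions rather than the paper's exact implementation. The shifted softplus follows the definition quoted on slide 26.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftedSoftplus(nn.Module):
    """ssp(x) = ln(0.5 * e^x + 0.5) = softplus(x) - ln(2)."""
    def forward(self, x):
        return F.softplus(x) - 0.6931471805599453

class CFConvSketch(nn.Module):
    """A small filter-generating network maps the RBF-expanded distance to a
    filter vector W(r_i - r_j), which gates the neighbour embedding
    element-wise before aggregation over neighbours."""
    def __init__(self, dim=64, n_rbf=300):
        super().__init__()
        self.filter_net = nn.Sequential(
            nn.Linear(n_rbf, dim), ShiftedSoftplus(),
            nn.Linear(dim, dim), ShiftedSoftplus(),
        )

    def forward(self, x, rbf, edge_index):
        # x: (N, dim) atom features, rbf: (E, n_rbf) expanded distances,
        # edge_index: (E, 2) long tensor of (i, j) pairs (i aggregates over j)
        W = self.filter_net(rbf)                   # (E, dim) continuous filters
        msgs = x[edge_index[:, 1]] * W             # element-wise filtering of x_j
        out = torch.zeros_like(x)
        out.index_add_(0, edge_index[:, 0], msgs)  # sum filtered neighbours per atom
        return out
```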
• 24. Proposed method: Continuous-filter convolution (cfconv)
• A filter that weights neighbor contributions by the distance information between nodes
– The "distances to emphasize" are learned from data
(Figure 2, right panel: the cfconv layer with its filter-generating network: embed Z_j, expand d_ij, two dense + shifted-softplus layers, element-wise product with the atom embeddings.)
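To make the element-wise filtering concrete, here is a minimal PyTorch-style sketch of a cfconv layer following the slide's description. The class name CFConv, the dense all-pairs formulation, and the layer widths are illustrative assumptions, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class ShiftedSoftplus(nn.Module):
    """ssp(x) = ln(0.5 * e^x + 0.5) = softplus(x) - ln 2, so that ssp(0) = 0."""
    def forward(self, x):
        return nn.functional.softplus(x) - math.log(2.0)

class CFConv(nn.Module):
    """Continuous-filter convolution: each neighbor embedding is gated
    element-wise by a filter generated from the RBF-expanded distance."""
    def __init__(self, n_features=64, n_rbf=300):
        super().__init__()
        self.filter_net = nn.Sequential(            # filter-generating network
            nn.Linear(n_rbf, n_features), ShiftedSoftplus(),
            nn.Linear(n_features, n_features), ShiftedSoftplus(),
        )

    def forward(self, x, rbf):
        # x:   (n_atoms, n_features)       atom-wise feature vectors
        # rbf: (n_atoms, n_atoms, n_rbf)   RBF-expanded pairwise distances d_ij
        w = self.filter_net(rbf)           # (n_atoms, n_atoms, n_features)
        # x_i <- sum_j x_j * W(r_i - r_j)  (element-wise product, summed over j)
        return (x.unsqueeze(0) * w).sum(dim=1)
```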
• 25. Proposed method: Interaction block
• A message passing layer built around the cfconv layer
– The cfconv layer updates each node's feature vector while taking the interactions between nodes into account
– Interactions can be represented with no restriction on the distance between nodes (a difference from DTNN and similar models)
(Figure 2, middle panel: the interaction block.)
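Continuing the sketch above (CFConv and ShiftedSoftplus as defined there), a rough illustration of how an interaction block could wrap the cfconv layer: atom-wise dense layers before and after the convolution, added back onto the input as a residual update. The layer sizes and the class name InteractionBlock are assumptions.

```python
class InteractionBlock(nn.Module):
    """One message-passing step: atom-wise dense -> cfconv -> atom-wise layers,
    with the result added back onto the input representation (residual update)."""
    def __init__(self, n_features=64, n_rbf=300):
        super().__init__()
        self.atom_wise_in = nn.Linear(n_features, n_features)
        self.cfconv = CFConv(n_features, n_rbf)
        self.atom_wise_out = nn.Sequential(
            nn.Linear(n_features, n_features), ShiftedSoftplus(),
            nn.Linear(n_features, n_features),
        )

    def forward(self, x, rbf):
        v = self.atom_wise_in(x)
        v = self.cfconv(v, rbf)
        v = self.atom_wise_out(v)
        return x + v   # refine, rather than replace, the atom features
```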
• 26. Proposed method: SchNet
• Interaction and atom-wise layers are stacked, and finally a one-dimensional scalar value is output for each atom
• The per-atom scalars are summed over all atoms to obtain the prediction for the whole molecule
(Figure 2: Illustration of SchNet with an architectural overview (left), the interaction block (middle) and the continuous-filter convolution with filter-generating network (right). The shifted softplus is defined as ssp(x) = ln(0.5eˣ + 0.5).)
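Putting the pieces together, a minimal end-to-end sketch built on the blocks above: an embedding of atomic numbers, a stack of interaction blocks, an atom-wise output head, and a sum over atoms. The number of blocks, the head width, and the class name SchNetSketch are illustrative assumptions.

```python
class SchNetSketch(nn.Module):
    """Embedding -> stacked interaction blocks -> per-atom scalar -> sum over atoms."""
    def __init__(self, n_features=64, n_interactions=3, n_rbf=300, max_z=100):
        super().__init__()
        self.embedding = nn.Embedding(max_z, n_features)   # atomic number -> feature vector
        self.interactions = nn.ModuleList(
            [InteractionBlock(n_features, n_rbf) for _ in range(n_interactions)]
        )
        self.output_head = nn.Sequential(                  # atom-wise output layers
            nn.Linear(n_features, n_features // 2), ShiftedSoftplus(),
            nn.Linear(n_features // 2, 1),
        )

    def forward(self, z, rbf):
        # z: (n_atoms,) atomic numbers;  rbf: (n_atoms, n_atoms, n_rbf)
        x = self.embedding(z)
        for block in self.interactions:
            x = block(x, rbf)
        per_atom = self.output_head(x)    # (n_atoms, 1) per-atom contributions
        return per_atom.sum()             # molecular prediction = sum over atoms
```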
• 27. Proposed method: Loss
• A loss function is defined as the sum of two terms:
  ℓ(Ê, (E, F_1, …, F_n)) = ‖E − Ê‖² + (ρ/n) Σ_{i=1..n} ‖F_i − (−∂Ê/∂R_i)‖²     (5)
– the first term is the squared error of the energy prediction
– the second term is the squared error of the force prediction, computed per atom and summed
– ρ: a hyperparameter controlling how strongly the interatomic forces are weighted (ρ = 0 for energy-only training, ρ = 100 for combined energy-and-force training in the paper)
• The predicted forces are obtained by differentiating the energy model w.r.t. the atom positions [Chmiela+, 2017]:
  F̂_i(Z_1, …, Z_n, r_1, …, r_n) = −∂Ê/∂r_i (Z_1, …, Z_n, r_1, …, r_n)     (4)
– This yields an energy-conserving force field by construction; since the energy prediction is rotationally invariant, the force predictions are rotationally equivariant
– Computing the forces requires a full forward and backward pass through the energy model, so the force model is effectively twice as deep and costs about twice the computation
– The model must be at least twice differentiable to allow gradient descent on the force loss, which motivates the shifted softplus ssp(x) = ln(0.5eˣ + 0.5) as the non-linearity
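A hedged sketch, continuing the PyTorch code above, of how this combined loss could be computed with automatic differentiation: the predicted force is the negative gradient of the predicted energy with respect to the positions. The helper rbf_fn (pairwise-distance expansion) and the single-molecule, unbatched setting are assumptions, not the authors' training code.

```python
def energy_force_loss(model, z, positions, rbf_fn, E_ref, F_ref, rho=100.0):
    """loss = ||E - E_hat||^2 + (rho / n) * sum_i ||F_i - (-dE_hat/dR_i)||^2"""
    positions = positions.clone().requires_grad_(True)   # (n_atoms, 3)
    rbf = rbf_fn(positions)         # hypothetical helper: expand pairwise distances
    E_hat = model(z, rbf)
    # predicted forces: negative gradient of the predicted energy w.r.t. atom positions
    F_hat = -torch.autograd.grad(E_hat, positions, create_graph=True)[0]
    n = positions.shape[0]
    return (E_ref - E_hat) ** 2 + (rho / n) * ((F_ref - F_hat) ** 2).sum()
```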
• 28. Outline
• Background
• Related work
– Message Passing Neural Networks and their variants
• Proposed method
– Continuous-filter convolutional layer
– Interaction block
– SchNet
• Experiments and results
• 29. Experiments: QM9
• A dataset of 17 molecular properties computed with DFT
– Only one of these properties is predicted here: U0, the total energy of the molecule at absolute zero
– The molecules are in equilibrium, so the interatomic forces are zero and do not need to be predicted
• Baselines
– DTNN [Schütt+, Nature2017], enn-s2s [Gilmer+, ICML2017], enn-s2s-ens5 (an ensemble of enn-s2s models)
• Results
– SchNet consistently achieved state-of-the-art results
– With 110k training examples, the mean absolute error was 0.31 kcal/mol

Table 1: Mean absolute errors for energy predictions in kcal/mol on the QM9 data set with given training set size N (best model in bold in the original).

  N         SchNet   DTNN [18]   enn-s2s [19]   enn-s2s-ens5 [19]
  50,000    0.59     0.94        –              –
  100,000   0.34     0.84        –              –
  110,462   0.31     –           0.45           0.33
• 30. Experiments: MD17
• A dataset of molecular dynamics (MD) simulation trajectories
– Trajectory data for one molecule at a time (benzene, etc.)
• Data are collected for 8 molecules, each learned as a separate task
• Even for the same molecule, the atom positions, energy, and interatomic forces differ from sample to sample
– The total energy and the interatomic forces are each predicted and evaluated by mean absolute error
• Baselines
– DTNN [Schütt+, Nature2017], GDML [Chmiela+, 2017]
• Results
– N = 1,000
• GDML was better on most tasks
• GDML is a kernel-regression model whose cost grows with the square of the number of samples / atoms per molecule, so it could not be trained with N = 50,000
– N = 50,000
• SchNet outperforms DTNN on most tasks
• SchNet scales better (than GDML), and its accuracy improved as the amount of data increased

Table 2: Mean absolute errors for energy and force predictions in kcal/mol and kcal/mol/Å, respectively; GDML and SchNet test errors for training with 1,000 and 50,000 examples of molecular dynamics simulations of small, organic molecules. SchNets were trained only on energies, or on energies and forces combined (best results in bold in the original).

                                 N = 1,000                       N = 50,000
                          GDML [17]  SchNet   SchNet     DTNN [18]  SchNet   SchNet
                          (forces)   (energy) (both)     (energy)   (energy) (both)
  Benzene         energy    0.07       1.19     0.08       0.04       0.08     0.07
                  forces    0.23      14.12     0.31       –          1.23     0.17
  Toluene         energy    0.12       2.95     0.12       0.18       0.16     0.09
                  forces    0.24      22.31     0.57       –          1.79     0.09
  Malonaldehyde   energy    0.16       2.03     0.13       0.19       0.13     0.08
                  forces    0.80      20.41     0.66       –          1.51     0.08
  Salicylic acid  energy    0.12       3.27     0.20       0.41       0.25     0.10
                  forces    0.28      23.21     0.85       –          3.72     0.19
  Aspirin         energy    0.27       4.20     0.37       –          0.25     0.12
                  forces    0.99      23.54     1.35       –          7.36     0.33
  Ethanol         energy    0.15       0.93     0.08       –          0.07     0.05
                  forces    0.79       6.56     0.39       –          0.76     0.05
  Uracil          energy    0.11       2.26     0.14       –          0.13     0.10
                  forces    0.24      20.08     0.56       –          3.28     0.11
  Naphtalene      energy    0.12       3.58     0.16       –          0.20     0.11
                  forces    0.23      25.36     0.58       –          2.58     0.11
• 31. Experiments: ISO17
• A dataset of molecular dynamics (MD) simulation trajectories
– Trajectory data for 129 isomers of C7O2H10
• Unlike MD17, data from different molecules are included in the same task
– Two tasks are prepared
• known molecules / unknown conformation: the test data contain known molecules in unseen conformations
• unknown molecules / unknown conformation: the test data contain unseen molecules in unseen conformations
– Baseline
• mean predictor (the per-molecule mean over the training data?)
• Results
– known molecules / unknown conformation
• energy+forces reaches an accuracy comparable to that on QM9
– unknown molecules / unknown conformation
• energy+forces outperformed training on energies alone
– adding the interatomic forces to the training objective does not merely fit a single molecule but generalizes across chemical compound space
– there is still a sizable accuracy gap to the known-molecules setting, so further improvement is needed

Table 3: Mean absolute errors on C7O2H10 isomers in kcal/mol.

                                      mean predictor   SchNet (energy)   SchNet (energy+forces)
  known molecules /       energy          14.89             0.52               0.36
  unknown conformation    forces          19.56             4.13               1.00
  unknown molecules /     energy          15.54             3.11               2.40
  unknown conformation    forces          19.15             5.71               2.18
• 32. Summary
• Proposed the continuous-filter convolutional (cfconv) layer
– A feature-extraction layer that is effective for graphs with irregular inter-node distances, such as the atoms in a molecule
• Proposed SchNet
– Using cfconv layers, it models the interactions of atoms located at arbitrary positions in 3D space
• Proposed ISO17, a new benchmark dataset
• Experiments on QM9, MD17, and ISO17 confirmed the effectiveness of the method
– On energy prediction for non-equilibrium molecules, adding force prediction to the training objective improved performance
– Making training robust enough to predict well for non-equilibrium and unseen molecules remains future work
• 33. References
• SchNet
– Schütt, Kristof T., et al. "SchNet: A continuous-filter convolutional neural network for modeling quantum interactions." Advances in Neural Information Processing Systems. 2017.
• MPNN variants
– Gilmer, Justin, et al. "Neural message passing for quantum chemistry." Proceedings of the 34th International Conference on Machine Learning, pages 1263–1272, 2017.
– Duvenaud, David K., et al. "Convolutional networks on graphs for learning molecular fingerprints." Advances in Neural Information Processing Systems. 2015.
– Li, Yujia, et al. "Gated graph sequence neural networks." ICLR, 2016.
– Schütt, Kristof T., et al. "Quantum-chemical insights from deep tensor neural networks." Nature Communications 8 (2017): 13890.
• Others
– Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. "Order matters: Sequence to sequence for sets." ICLR, 2016.
– Chmiela, Stefan, et al. "Machine learning of accurate energy-conserving molecular force fields." Science Advances, 3(5), e1603015, 2017.