ICLR 2018

Deep Learning For Physical Processes :
Incorporating Prior Scientiﬁc Knowledge
ICLR 2018読み会@PFN
15 Slides
arXiv:1711.07970

⾃⼰紹介 2
•  名前
–  吉永尊洸（@tekenuko）
•  職業
–  LINE株式会社 DataLabs Data Scientist
•  経歴
–  元素粒⼦論（博⼠）
–  データ分析系の会社(2015)→Web業界(2018)
•  業務
–  前職ではDeep Learningチョットやってた
–  最近はビジネスアナリスト的な感じ

論⽂概要 3
•  概要
–  Deep Learningの⾃然現象への応⽤
•  アトランティス海域の海⾯温度の予測
•  論⽂のアイデア
–  物理の知⾒を活かした機械学習モデルの構築
•  直接対象を予測するNNを作るのではなく、現象を記述する
⽅程式の解のパラメータをNNでモデル化する
•  実験結果
–  ⽐較⼿法のなかでは最も良い精度
•  所感
–  機械学習の応⽤範囲を広げる⼀助となる論⽂
•  ⾃然科学、機械学習がお互いに発展しあっていくことを期待

Introduction 4
•  物理モデルが主流だった⾃然現象のモデル化・
予測を機械学習で⾏う動きがある
–  素粒⼦実験の解析、相転移など
•  流体、気象などの複雑な現象を機械学習のみで
定量化するのは難しい
–  複雑な⾃然現象はinputとoutputの関係をデータから
学習するにはまだまだ複雑すぎる
•  本論⽂では、物理学の知⾒を活かしつつ、⾃然
現象に機械学習を応⽤する⽅法の⼀つを提供

問題設定 5
•  アトランティス海域の海⾯温度を予測する(*)
–  trainning : 2006~2015年
–  test : 2016~2017年, 17~21のregion
cean related engines. It is a primitive equation model adapted to the regional and global ocean
ulation problems. Historical data is accumulated in the model to generate a synthesized estimate
he states of the system using data analysis, a specific data assimilation scheme, which means that
data does follow the true temperatures. The resulting dataset is constituted of daily temperature
uisitions of 481 by 781 pixels, from 2006-12-28 to 2017-04-05 (3734 acquisitions).
extract 64 by 64 pixel sized sub-regions as indicated in figure 4.1. We use data from years 2006
015 for training and validation (94743 training examples), and years 2016 to 2017 for testing.
withhold 20% of the training data for validation, selected uniformly at random at the beginning
ach experiment. For the tests we used sub-regions enumerated 17 to 20 in figure 4.1, where the
ractions between hot and cold waters make the dynamics interesting to study. All the regions
mbered in figure 4.1, from 2006 to 2015 where used for training 3
. Each sequence of images
d for training or for evaluation corresponds to a specific numbered sub-region. We make the
plifying hypothesis that the data in a single sub-region contains enough information to forecast
future of the sub-region. As the forecast is for a small temporal horizon we can assume that the
uence from outside the region is small enough.
Figure 4: Sub regions extracted for the dataset. Test regions are regions 17 to 20.
normalize the daily SST acquisitions of each sub region using the mean and the standard devia-
(*) 実データは気象衛星画像だが、前処理が⼤変なため、モデリングフレームワーク+データ同化
で実際の現象を再現したもので代⽤（NEMO）
全体は481×781ピクセル
1~29はそれぞれ64×64

本論⽂のアイデア 6
•  機械学習のモデリングで物理学の背景を考慮
–  海⾯の表⾯温度変化は流体輸送問題と関連
•  移流拡散⽅程式 : Iは海⾯温度
•  第1項 : ⾮定常項
–  時間的に変化する効果
•  第2項 : 移流項
–  流れに沿って移動する効果
–  w = ⊿x/⊿tはMotion Field（移動の程度を決める量）
•  第3項 : 拡散項
–  空間的に広がる効果
–  Dは拡散係数（広がる程度を決める量）
@I
@t
+ (w.r)I = 0
enotes the gradient operator, and w the motion vector x
t . This equation des
volution of quantity I for displacement w. Note that this equation is also the
ional methods for Optical Flow. To retrieve the motion, numerical schemes ar
ulting system of equations, along with a an additional constraint on w is solv
n can then be used to forecast the future value of I.
@I
@t
+ (w.r)I = Dr2
I
the Laplacian operator and D the diffusion coefﬁcient. Note that when D
advection equation 2.
on describes a large family of physical processes (e.g. ﬂuid dynamics, heat co
mics, etc). Let us now state a result, characterizing the general solutions of equ
2

本論⽂のアイデア 7
•  機械学習のモデリングで物理学の背景を考慮
–  いったん移流拡散⽅程式で現象が記述されると仮定
すると、以下のような定理が成⽴する
–  要は初期条件とw, Dがわかってしまえば海⾯温度は
決定論的に予測できる
–  wを観測データを⽤いて機械学習のモデル化をするの
が本論⽂の⽅法
Published as a conference paper at ICLR 2018
Theorem 1. 1
For any initial condition I0 2 L1
(R2
) with I0(±1) = 0, there exists a unique global
solution I(x, t) to the advection-diffusion equation 3:
I(x, t) =
Z
R2
k( x w, y) I0(y) dy (4)
where k(u, v) = 1
4⇡Dt e
1
4Dt ku vk2
is a radial basis function kernel, or alternatively, a 2 dimen-
sional Gaussian probability density with mean u and variance 2Dt.
Equation 4 provides a principled way to calculate I(x, t) for any time t using the initial condition
I0, provided the motion w and the diffusion coefficient D are known. It states that quantity I(x, t)
can be computed from the initial condition I0 via a convolution with a Gaussian probability density
function. In other words, if I was used as a model for the evolution of the SST and the surface’s
underlying advecting mechanisms were known, future surface temperatures could be predicted from
previous ones. Unfortunately neither the initial conditions, the motion vector nor the diffusion co-
efficient are known. They have to be estimated from the data. Inspired from the general form of
solution 4, we propose a ML method, expressed as a Deep Learning architecture for predicting SST.
This model will learn to predict a motion field analog to the w in equation 4, which will be used to
predict future images.

Models 8
•  CDNN
–  CNNからMotion Fieldを評価
•  Warped sheme
–  移流拡散⽅程式の解から表⾯温度を予測
Equation 4 provides a principled way to calculate I(x, t) for any time t using the initial condition
I0, provided the motion w and the diffusion coefficient D are known. It states that quantity I(x, t)
can be computed from the initial condition I0 via a convolution with a Gaussian probability density
function. In other words, if I was used as a model for the evolution of the SST and the surface’s
underlying advecting mechanisms were known, future surface temperatures could be predicted from
previous ones. Unfortunately neither the initial conditions, the motion vector nor the diffusion co-
efficient are known. They have to be estimated from the data. Inspired from the general form of
solution 4, we propose a ML method, expressed as a Deep Learning architecture for predicting SST.
This model will learn to predict a motion field analog to the w in equation 4, which will be used to
predict future images.
3 MODEL
SupervisionModel
Warping
Scheme
It k 1:t
ˆwt
Ît+1 It+1
Motion Field
CDNN

CDNN 9
•  Input : 過去kステップの画像
•  Output : 時点tのMotion ﬁeld
•  Network : CNN
–  used batch normalization
–  activation : Leaky ReLU
3.1 MOTION ESTIMATION
It k 1:t
Convolution Deconvolution
64 64 k
64 64 2
64 32 32
128 16 16
256 8 8
512 4 4
386 8 8
194 16 16
98 32 32
Skip Connections
ˆwt

Warped Scheme 10
•  移流拡散⽅程式の解に代⼊
–  積分は離散化、初期条件はItとする
–  ガウシアンの重みで⾜し合わせ
motion vector for each pixel. As shown in figure 2, this network makes use of skip connections He
et al. (2015), allowing fine grained information from the first layers to flow through in a more direct
manner. We use a Batch Normalization layer between each convolution, and Leaky ReLU (with
parameter value set to 0.1) non-linearities between convolutions and transposed-convolutions. We
used k = 4 concatenated images It k 1:t as input for training. We have selected this architecture
experimentally, testing different state-of-the-art convolution-deconvolution network architectures.
Let ˆw 2 R2⇥W ⇥H
be the output of the network, where W and H are respectively the width and
height of the images, and ’2’ corresponds to the two components of the flow at each point of the
motion field.
Generally, and this is the case for our problem, we do not have a direct supervision on the motion
vector field, since the target motion is usually not available. Using the warping scheme introduced
below, we will nonetheless be able to (weakly) supervise w, based on the discrepancy of the warped
version of the It image and the target image It+1.
3.2 WARPING SCHEME
It It+1
x
x w
w
blished as a conference paper at ICLR 2018
cretizing the solution of the advection-diffusion equation in section 2 by replacing the integ
h a sum, and setting image It as the initial condition, we obtain a method to calculate the futu
ge, based on the motion field estimate ˆw. The latter is used as a warping scheme:
Ît+1(x) =
X
y2⌦
k( x ˆw(x), y) It(y)
ere k(x ˆw, y) = 1
4⇡D t e
1
4D t kx ˆw yk2
is a radial basis function kernel, as in equation
ameterized by the diffusion coefficient D and the time step value t between t and t + 1 and
he estimated value of the vector flow w. To calculate the temperature for time t + 1 at position
compute the scalar product between k(x ˆw, .), a Gaussian centered in x ˆw, and the previo
ge It. Simply put, it is a weighted average of the temperatures It, where the weight values a
er when the pixel’s positions that are closer to x ˆw. Informally, x ˆw corresponds to t
el’s previous position at time t. See figure 3.
seen by the relation with the solution of the advection-diffusion equation, the proposed wa
Discretizing the solution of the advection-diffusion equation in sec
with a sum, and setting image It as the initial condition, we obtain
image, based on the motion field estimate ˆw. The latter is used as a
Ît+1(x) =
X
y2⌦
k( x ˆw(x), y) It(y
where k(x ˆw, y) = 1
4⇡D t e
1
4D t kx ˆw yk2
is a radial basis fu
parameterized by the diffusion coefficient D and the time step value
is the estimated value of the vector flow w. To calculate the tempera
we compute the scalar product between k(x ˆw, .), a Gaussian cent
image It. Simply put, it is a weighted average of the temperatures
larger when the pixel’s positions that are closer to x ˆw. Inform
pixel’s previous position at time t. See figure 3.
As seen by the relation with the solution of the advection-diffusion
ing mechanism is then clearly adapted to the modeling of phenome

Loss Function 11
•  Charbonnier penalty function + 正則化
–  Loss Function
–  Charbonnier penalty function
•  ε→0, α→1/2でL2ノルムに帰着
•  外れ値の影響を緩和するために利⽤
–  Optimizer : Adam
–  Hyperparameter : validationで決める
term directly on the motion field. In our experiments, we tested the influence of different terms:
divergence r. wt(x)2
, magnitude kwt(x)k
2
and smoothness krwt(x)k
2
.
Lt =
X
x2⌦
⇢(Ît+1(x) It+1(x)) + div(r. wt(x))2
+ magn kwt(x)k
2
+ grad krwt(x)k
2
(6)
4 EXPERIMENTS
4.1 DATASET DESCRIPTION
Since 1982, high resolution SST data has been made available by the NOAA6 weather satellite,
Bernstein (1982). Dealing directly with these data requires a lot of preprocessing (e.g. some re-
gions are not available due to clouds, hindering temperature acquisition). In order to avoid such
complications which are beyond the scope of this work, we used synthetic but realistic SST data of
the Atlantic ocean generated by a sophisticated simulation engine: NEMO (Nucleus for European
Modeling of the Ocean) engine 2
, Madec (2008). NEMO is a state-of-the-art modelling framework
of ocean related engines. It is a primitive equation model adapted to the regional and global ocean
circulation problems. Historical data is accumulated in the model to generate a synthesized estimate
of the states of the system using data analysis, a specific data assimilation scheme, which means that
the data does follow the true temperatures. The resulting dataset is constituted of daily temperature
acquisitions of 481 by 781 pixels, from 2006-12-28 to 2017-04-05 (3734 acquisitions).
training set, i.e. the SST acquisition of sub-region 2 on date September 8th 2017 is st
ing the data of all the September 8th available in the dataset. This removes the seaso
from SST data.
4.2 BASELINE COMPARISON
We compare our model with several baselines. Each model is evaluated with a mea
metric, forecasting images on a horizon of 6 (we forecast from It+1 to It+6 and th
MSE). The hyperparameters are tuned using the validation set. Neural network bas
run on a Titan Xp GPU, and runtime is given for comparison purpose.
Concerning the constraints on the vector field w (equation 6. the regularization coeffi
via validation are div = 1, magn = 0.03 and grad = 0.4. The coefficient diffusio
0.45 by cross validation. We also compare the results with the model without any reg
et al. (2015), originally designed to be incorporated as a layer in a convolutional neural
architecture in order to gain invariance under geometric transformations. Using the nota
Jaderberg et al. (2015), when the inverse geometric transformation T✓ of the grid generato
set to T✓(x) = x ˆw(x), and the kernels k( . ; x) and k( . ; y) in the sampling step are rad
function kernels, we recover our warping scheme. The latter can be seen as a specific ca
STN, without the localization step. This result theoretically grounds the use of the STN for
Flow in many recent articles Zhu et al. (2017), Yu et al. (2016), Patraucean et al. (2015), F
(2016): in equation 3, when D ! 0, we recover the brightness constancy constraint equati
in the latter.
For training, supervision is provided at the output of the warping module. It consists in min
the discrepancy between the warped image Ît+1 and the target image It+1. The loss is meas
a differentiable function and the gradient is back propagated through the warping function
to adjust the parameters of the convolutional-deconvolutional module generating the vec
This is detailed in the next section.
3.3 LOSS FUNCTION
At each iteration, the model aims at forecasting the next observation, given the previous o
evaluate the discrepancy between the warped image Ît+1 and the target image It+1 using t
bonnier penalty function ⇢(x) = (x + ✏)
1
↵ , where ✏ and ↵ are parameters to be set. Note t
✏ = 0 and ↵ = 1
2 , we recover the `2 loss. The Charbonnier penalty function is known t
the influence of outliers compared to an l2 norm. We have also tested the Laplacian pyra
Ling & Okada (2006), where we enforce convolutions of all deconvolutional layers to be
down-sampled versions of the target image in the Charbonnier penalty sense, but we have o
an overall decrease in generalization performance.
The proposed NN model has been designed according to the intuition gained from gener
ground knowledge of a physical phenomenon, here advection-diffusion equations. Additio
knowledge – expressed as partial differential equations, or through constraints – can be e

Results 12
•  Code
–  https://github.com/emited/flow
•  最上段 (Baseline) : シミュレーション＋データ同化
•  海⾯温度を直接予測するモデル（ACNN, ConvLSTM）
はBaselineよりも低パフォーマンス
•  ACNN w/ GANはACNNよりもパフォーマンスが向上
したが、Baselineには届かず
•  提案⼿法（⾚枠）が最も⾼精度
et al. (2015), which is made available at https://github.com/dyelax/Adversarial_
Video_Generation. For Béréziat & Herlin (2015), the code has been provided by the authors
of the paper. We have implemented the ACNN and ConvLSTM models ourselves. The code for our
models, along with these baselines will be made available.
4.3 QUANTITATIVE RESULTS
Model Average Score (MSE) Average Time
Numerical model Béréziat & Herlin (2015) 1.99 4.8 s
ConvLSTM Shi et al. (2015) 5.76 0.018 s
ACNN 15.84 0.54 s
GAN Video Generation (Mathieu et al. (2015)) 4.73 0.096 s
Proposed model with regularization 1.42 0.040 s
Proposed model without regularization 2.01 0.040 s
Table 1: Average score and average time on test data. Average score is calculated using the mean
square error metric (MSE), time is in seconds. The regularization coefficients for our model have
been set using a validation set with div = 1, magn = 0.03 and grad = 0.4.
Quantitatively, our model performs well. The MSE score is better than any of the baselines.
The closest NN baseline is Mathieu et al. (2015) which regularizes a regression convolution-
deconvolution model with a GAN. The performance is however clearly below the proposed model
and it does not allow to easily incorporate prior constraints inspired from the physics of the phe-
nomenon. ACNN is a direct predictor of the image sequence, implemented via a CDNN module
identical to the one used in our model. Its performance is poor. Clearly, a straightforward use of
prediction models is not adapted to the complexity of the phenomenon. ConvLSTM performs better:
as opposed to the ACNN, it seems to be able to capture a dynamic, although not very accurately.
Overall, direct prediction models are not able to capture the complex underlying dynamics and they
produce blurry sequences of images. The GAN explicitly forces the network output to eliminate the
blurring effect and then makes it able to capture short term dynamics. The state of the art numerical

Results 13Published as a conference paper at ICLR 2018
It It+1 It+3 It+6
観測データ
提案⼿法
シミュレーション
ACNN
ConvLSTM
最も観測と整合的

本論⽂のポイント（私⾒） 14
•  海⾯温度の予測の問題を、流体の輸送現象でう
まく記述できるだろうと仮定したこと
–  ⼀般解は、初期値をデータから決まる少数の現象論
的パラメータを持った重み関数で⾜し合わせると
いった、理論計算と観測から決める部分がうまく分
離できている構造
–  逆に⾔うと、うまく構造化できないような現象はこ
の⽅法でもうまく予測できないと思われる
•  機械学習だけだと、上記のような構造をデータ
だけから抽出するのはまだ難しい（のだろう）

Summary 15
•  概要
–  Deep Learningの⾃然現象への応⽤
•  アトランティス海域の海⾯温度の予測
•  論⽂のアイデア
–  物理の知⾒を活かした機械学習モデルの構築
•  直接対象を予測するNNを作るのではなく、現象を記述する
⽅程式の解のパラメータをNNでモデル化する
•  実験結果
–  ⽐較⼿法のなかでは最も良い精度
•  所感
–  機械学習の応⽤範囲を広げる⼀助となる論⽂
•  ⾃然科学、機械学習がお互いに発展しあっていくことを期待

ICLR 2018

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a ICLR 2018

Similar a ICLR 2018 (20)

Más de Takahiro Yoshinaga

Más de Takahiro Yoshinaga (9)

Último

Último (20)

ICLR 2018