43. Operator Fusion: NNVM implementation
• Operators fall into four main groups
• Whether two operators can be fused is determined by the groups they belong to
NN op: convolution, conv transpose, pooling, fully connected
Injective: reshape, layout transform, transpose, concatenate
Elementwise: math op, relu, elemwise sum, sigmoid
Broadcast: broadcast_add, broadcast_mul, …, expand_dims
Elementwise or Broadcast ops can be fused into an NN op.
Ops other than NN ops can be fused with other ops in the same group, among other rules.
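The two rules above can be sketched as a small lookup. This is a toy model, not NNVM's actual fusion pass: the GROUPS table, group_of, and can_fuse are hypothetical names, the op lists are only the examples from this slide, and any further rules the slide leaves out are not modeled.

```python
# Toy model of the two fusion rules on this slide (not NNVM's real pass).
# Op names are the examples listed on the slide; the table is incomplete.
GROUPS = {
    "NN op": {"convolution", "conv transpose", "pooling", "fully connected"},
    "Injective": {"reshape", "layout transform", "transpose", "concatenate"},
    "Elementwise": {"relu", "elemwise sum", "sigmoid"},
    "Broadcast": {"broadcast_add", "broadcast_mul", "expand_dims"},
}


def group_of(op):
    """Return the name of the group that contains op."""
    for group, ops in GROUPS.items():
        if op in ops:
            return group
    raise KeyError("unknown op: " + op)


def can_fuse(producer, consumer):
    """Check whether consumer may be fused into producer under the two rules."""
    p, c = group_of(producer), group_of(consumer)
    if p == "NN op":
        # Rule 1: an NN op absorbs Elementwise or Broadcast ops that follow it.
        return c in ("Elementwise", "Broadcast")
    # Rule 2: any other op fuses only with ops from its own group.
    return p == c


print(can_fuse("convolution", "relu"))     # True  (NN op + Elementwise)
print(can_fuse("convolution", "pooling"))  # False (NN op + NN op)
print(can_fuse("reshape", "transpose"))    # True  (same group: Injective)
```

Keeping the rule at the group level means a new operator only needs a group assignment, not a fusion rule against every other operator.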
44. Operator Fusion: Example
• Convolution + Bias Add + Batch norm + ReLU
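import nnvm.symbol as sym
# conv2d (with bias) -> batch_norm -> relu, built as an NNVM symbol graph
data = sym.Variable(name="data")
data = sym.conv2d(data, kernel_size=(3,3), channels=8, use_bias=True)
data = sym.batch_norm(data)
data = sym.relu(data)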
Graph(%data, %conv2d0_weight, %conv2d0_bias, %batch_norm0_gamma_mul_div_expand, %batch_norm0_add_beta_expand) {
%5 = tvm_op(%data, %conv2d0_weight, %conv2d0_bias, %batch_norm0_gamma_mul_div_expand, %batch_norm0_add_beta_expand,
flatten_data='0', func_name='fuse_conv2d_broadcast_mul_broadcast_add_relu', num_inputs='5', num_outputs='1')
ret %5
}
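In the fused function name, batch_norm shows up as broadcast_mul followed by broadcast_add. A plausible reading, inferred from the parameter names rather than stated on the slide, is that inference-time batch norm is folded into one per-channel scale and one per-channel shift. The sketch below checks that identity on scalars; fold_batch_norm and the sample values are hypothetical.

```python
import math


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm for a single value of one channel."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Fold the batch norm constants into a (scale, shift) pair."""
    scale = gamma / math.sqrt(var + eps)  # operand of the multiply
    shift = beta - mean * scale           # operand of the add
    return scale, shift


# Arbitrary sample values, chosen only for illustration.
gamma, beta, mean, var = 1.5, 0.2, 0.4, 2.0
scale, shift = fold_batch_norm(gamma, beta, mean, var)

x = 3.0
# One multiply and one add reproduce the full batch norm expression.
assert math.isclose(batch_norm(x, gamma, beta, mean, var), x * scale + shift)
```

Because scale and shift depend only on constants, they can be computed ahead of time, leaving a multiply and an add that qualify for fusion into the convolution as Broadcast ops.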
68. Links
• TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, OSDI 2018
• https://arxiv.org/abs/1802.04799
• Learning to Optimize Tensor Programs, NeurIPS 2018
• The AutoTVM paper
• https://arxiv.org/abs/1805.08166
• Relay: A New IR for Machine Learning Frameworks
• https://arxiv.org/abs/1810.00952
• VTA: An Open Hardware-Software Stack for Deep Learning
• https://arxiv.org/abs/1807.04188
• Optimizing CNN Model Inference on CPUs
• A paper by Amazon engineers on optimizing TVM for Xeon CPUs
• https://arxiv.org/abs/1809.02697
• Discussion forum
• https://discuss.tvm.ai/
• Official documentation, with extensive tutorials
• https://docs.tvm.ai/
• Automatic Kernel Optimization for Deep Learning on All Hardware Platforms
• https://tvm.ai/2018/10/03/auto-opt-all.html