38. References
• [Allen-Zhu et al., 19] Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via
over-parameterization. Proceedings of the 36th International Conference on Machine Learning, volume 97 of
Proceedings of Machine Learning Research, pages 242–252. PMLR, 2019.
• [Alsentzer et al., 19] Alsentzer, E., Murphy, J., Boag, W., Weng, W.-H., Jindi, D., Naumann, T., and McDermott, M.
(2019). Publicly available clinical BERT embeddings. In Proceedings of the 2nd Clinical Natural Language
Processing Workshop, pages 72–78, Minneapolis, Minnesota, USA. Association for Computational Linguistics.
• [Arora et al., 18] Arora, S., Ge, R., Neyshabur, B., and Zhang, Y. (2018). Stronger generalization bounds for deep
nets via a compression approach. Proceedings of the 35th International Conference on Machine Learning, volume
80 of Proceedings of Machine Learning Research, pages 254–263, PMLR.
• [Banerjee et al., 20] Banerjee, A., Chen, T., and Zhou, Y. (2020). De-randomized PAC-Bayes margin bounds: Applications to non-convex and non-smooth predictors. arXiv preprint arXiv:2002.09956.
• [Belkin et al., 18] Belkin, M., Hsu, D., Ma, S., and Mandal, S. (2018). Reconciling modern machine learning practice and the bias-variance trade-off. arXiv preprint arXiv:1812.11118.
39. References
• [Cybenko, 89] George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control,
signals and systems, 2(4):303–314, 1989.
• [Devlin et al., 18] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
• [Du et al., 19] Simon S. Du, Xiyu Zhai, Barnabas Poczos, and Aarti Singh. Gradient descent provably optimizes
over-parameterized neural networks. In International Conference on Learning Representations, 2019.
• [Eldan and Shamir, 16] Ronen Eldan and Ohad Shamir. The power of depth for feedforward neural networks. In 29th
Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 907–940.
PMLR, 2016.
• [Gilmer et al., 17] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural
message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning,
volume 70, pages 1263–1272. PMLR, 2017.
40. References
• [Hardt et al., 16] Moritz Hardt, Ben Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic
gradient descent. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of
Proceedings of Machine Learning Research, pages 1225–1234. PMLR, 2016.
• [Jacot et al., 18] Arthur Jacot, Franck Gabriel, and Clement Hongler. Neural tangent kernel: Convergence and
generalization in neural networks. In Advances in Neural Information Processing Systems 31, pages 8571–8580.
Curran Associates, Inc., 2018.
• [Jin et al., 18] Wengong Jin, Regina Barzilay, and Tommi Jaakkola. Junction tree variational autoencoder for
molecular graph generation. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International
Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 2323–2332.
PMLR, 2018.
• [Jin et al., 19] Wengong Jin, Kevin Yang, Regina Barzilay, and Tommi Jaakkola. Learning multimodal graph-to-graph translation for molecule optimization. In International Conference on Learning Representations, 2019.
• [Kipf and Welling, 17] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional
networks. In International Conference on Learning Representations, 2017.
41. References
• [Lan et al., 20] Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2020). ALBERT: A lite BERT for self-supervised learning of language representations. In International Conference on Learning Representations.
• [LeCun et al., 98] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al. (1998). Gradient-based learning applied to
document recognition. Proceedings of the IEEE, 86(11):2278–2324.
• [Li et al., 18] Li, Q., Han, Z., and Wu, X.-M. (2018). Deeper insights into graph convolutional networks for semi-
supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
• [Li et al., 18] Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, and Tom Goldstein. Visualizing the loss landscape of neural nets. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 6389–6399. Curran Associates, Inc., 2018.
• [Liu et al., 18] Qi Liu, Miltiadis Allamanis, Marc Brockschmidt, and Alexander Gaunt. Constrained graph variational
autoencoders for molecule design. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R.
Garnett, editors, Advances in Neural Information Processing Systems 31, pages 7795–7804. Curran Associates,
Inc., 2018.
42. References
• [Qi et al., 19] Mengshi Qi, Weijian Li, Zhengyuan Yang, Yunhong Wang, and Jiebo Luo. Attentive relational networks for mapping images to scene graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3957–3966, 2019.
• [Nagarajan and Kolter, 19] Nagarajan, V. and Kolter, J. Z. (2019). Uniform convergence may be unable to explain
generalization in deep learning. In Advances in Neural Information Processing Systems 32, pages 11615–11626.
Curran Associates, Inc.
• [Nakkiran et al., 20] Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., and Sutskever, I. (2020). Deep double
descent: Where bigger models and more data hurt. In International Conference on Learning Representations.
• [Oono and Suzuki, 20] Oono, K. and Suzuki, T. (2020). Graph neural networks exponentially lose expressive power for node classification. In International Conference on Learning Representations.
• [Shchur et al., 18] Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868, 2018.
43. References
• [Sonoda and Murata, 17] Sho Sonoda and Noboru Murata. Neural network with unbounded activation functions is
universal approximator. Applied and Computational Harmonic Analysis, 43(2):233–268, 2017.
• [Suzuki et al., 2020] Suzuki, T., Abe, H., and Nishimura, T. (2020). Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network. In International Conference on Learning Representations.
• [Telgarsky, 16] Matus Telgarsky. Benefits of depth in neural networks. In 29th Annual Conference on Learning Theory, volume 49 of Proceedings of Machine Learning Research, pages 1517–1539. PMLR, 2016.
• [Vaswani et al., 2017] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L. u., and
Polosukhin, I. (2017). Attention is all you need. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 5998–
6008. Curran Associates, Inc.
• [Wang et al., 19] Lei Wang, Yuchun Huang, Yaolin Hou, Shenman Zhang, and Jie Shan. Graph attention convolution
for point cloud semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), June 2019.
44. References
• [Wesley et al., 2020] Maddox, W. J., Benton, G., and Wilson, A. G. (2020). Rethinking parameter counting in deep models: Effective dimensionality revisited. arXiv preprint arXiv:2003.02139.
• [Wu et al., 18] Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. MoleculeNet: a benchmark for molecular machine learning. Chemical Science, 9(2):513–530, 2018.
• [Xu et al., 17] Danfei Xu, Yuke Zhu, Christopher B. Choy, and Li Fei-Fei. Scene graph generation by iterative
message passing. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
• [Yang et al., 18] Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, and Devi Parikh. Graph R-CNN for scene graph
generation. In The European Conference on Computer Vision (ECCV), September 2018.
• [Yang et al., 19] Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, et al. Are learned molecular representations ready for prime time? arXiv preprint arXiv:1904.01561, 2019.
45. References
• [Zhang and Meng, 2019] Zhang, J. and Meng, L. (2019). GResNet: Graph residuals for reviving deep graph neural nets from suspended animation. arXiv preprint arXiv:1909.05729.
• [Zhao and Akoglu, 2020] Zhao, L. and Akoglu, L. (2020). PairNorm: Tackling oversmoothing in GNNs. In International Conference on Learning Representations.
• [Zitnik et al., 18] Marinka Zitnik, Monica Agrawal, and Jure Leskovec. Modeling polypharmacy side effects with
graph convolutional networks. Bioinformatics, 34(13):i457–i466, 2018.