nlp deep learning imagenet roberta albert bert architecture search deeplearning proxylessnas efficientnet mnasnet nasnet neural machine translation self-attention transformer attention mechanism vietnam ai community in japan separable convolution cnn computer vision mobilenet cvpr 2018 negative sampling noise contrastive estimation hierarchical softmax word2vec word embedding machine learning adam batch gradient descent stochastic gradient descent rmsprop adagrad gradient descent flat region optimization algorithms momentum