Deep learning yields impressive results across many fields, from speech recognition and image classification to translation. But for each problem, getting a deep model to work well requires research into the architecture and a lengthy period of tuning. We present a single model that yields good results across multiple domains: it is trained concurrently on ImageNet, several translation tasks, image captioning, a speech recognition corpus, and an English parsing task. We achieve state-of-the-art performance while training substantially faster and generating long, coherent pieces of text, even on the scale of full Wikipedia articles. Our new architectures improve the ability to generate both text and images.
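The concurrent multi-domain training described above can be sketched as interleaving batches from several task-specific data streams into one shared model's training loop. The sketch below is illustrative only: the task names and the simple round-robin schedule are assumptions, not the paper's actual sampling strategy.

```python
# Hypothetical sketch of concurrent multi-task training: cycle through
# task-specific data streams so one shared model sees every domain.
# Task names and round-robin order are illustrative assumptions.
TASKS = ["imagenet", "translation", "captioning", "speech", "parsing"]

def round_robin_schedule(num_steps, tasks=TASKS):
    """Return (step, task) pairs, cycling through the task list.

    In a real setup, each step would draw a batch from the named
    task's dataset and apply one gradient update to the shared model.
    """
    return [(step, tasks[step % len(tasks)]) for step in range(num_steps)]

schedule = round_robin_schedule(10)
```

With five tasks and ten steps, every task contributes exactly two batches, so no domain is starved during training.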