. Because these images contain many distracting and uninformative regions, a model which is better able to ignore these patterns should generalize better to longer sequences. Additionally many details in these images are difficult to understand without having good top-down priors.