This document discusses optimizations for deep neural network (DNN) hardware architectures. It describes common DNN layer types, such as convolutional and recurrent layers, and evaluates DNN accelerators against metrics including accuracy, throughput, latency, energy efficiency, and hardware cost. It introduces the roofline model for analyzing arithmetic intensity and determining whether a workload is compute-bound or memory-bound. Several techniques for optimizing the multiply-accumulate (MAC) operations at the core of DNN computation are covered, including parallel processing elements, quantization, pruning, and reordering computations to increase data reuse. The MIT Eyeriss architecture is highlighted as an example of an optimized hardware design that maps DNN computations efficiently onto a configurable array of processing elements.
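To make the roofline idea concrete, the sketch below estimates the MAC count and off-chip traffic of a convolutional layer and applies the roofline bound. All layer shapes, data widths, and hardware numbers (peak throughput, DRAM bandwidth) are illustrative assumptions, not figures from the document, and the traffic model is deliberately naive (each tensor moved once).

```python
# Hypothetical roofline sketch for a convolutional layer's MAC workload.
# Shapes and hardware parameters below are assumed for illustration.

def conv_macs(out_h, out_w, out_c, in_c, k):
    """MACs in a dense convolution: one MAC per (output element, filter tap)."""
    return out_h * out_w * out_c * in_c * k * k

def conv_bytes(out_h, out_w, out_c, in_h, in_w, in_c, k, bytes_per_el=2):
    """Naive off-chip traffic: inputs and weights read once, outputs written once."""
    inputs = in_h * in_w * in_c
    weights = k * k * in_c * out_c
    outputs = out_h * out_w * out_c
    return (inputs + weights + outputs) * bytes_per_el

def roofline(intensity, peak_ops_per_s, bandwidth_bytes_per_s):
    """Attainable performance = min(compute roof, memory roof)."""
    return min(peak_ops_per_s, intensity * bandwidth_bytes_per_s)

# Assumed layer: 3x3 convolution, 56x56 output, 64 -> 64 channels.
macs = conv_macs(56, 56, 64, 64, 3)
traffic = conv_bytes(56, 56, 64, 58, 58, 64, 3)
ops = 2 * macs              # each MAC = 1 multiply + 1 add
intensity = ops / traffic   # arithmetic intensity in ops/byte

# Assumed accelerator: 2 TOP/s peak compute, 25 GB/s DRAM bandwidth.
attainable = roofline(intensity, 2e12, 25e9)
```

At high arithmetic intensity the bound flattens at the compute roof; at low intensity it falls on the bandwidth slope, which is exactly the distinction the roofline model uses to locate performance bottlenecks.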