Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference accelerators. Despite their potential, designing multi-pod systems to maximize effective throughput/Watt (i.e., throughput/Watt adjusted to account for array utilization) poses a unique set of challenges. In this work, we study three key pillars of multi-pod design, namely granularity, interconnect, and tiling. We identify optima...