This paper presents a set of compiler optimizations and their application strategies for a common class of data parallel loop nests. The arrays updated in the body of the loop nests are assumed to be partitioned into blocks (rectangular, rows, or columns) where each block is assigned to a processor. These optimizations are demonstrated in the context of a FORTRAN-90 compiler with very encouragi...