Using Non-canonical Array Layouts in Dense Matrix Operations

Authors

  • José R. Herrero
  • Juan J. Navarro
Abstract

We present two implementations of dense matrix multiplication based on two different non-canonical array layouts: one based on a hypermatrix data structure (HM) where data submatrices are stored using a recursive layout; the other based on a simple block data layout with square blocks (SB) where blocks are arranged in column-major order. We show that the iterative code using SB outperforms a recursive code using HM and obtains competitive results on a variety of platforms.
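To make the SB layout concrete, the following is a minimal sketch of how a square-block layout with blocks arranged in column-major order could be indexed and used in an iterative multiplication. It is not the authors' implementation. The function names (`sb_index`, `to_sb`, `sb_matmul`), the column-major ordering of elements inside each block, and the requirement that the block size `b` divides the matrix order `n` are all assumptions made for illustration.

```python
def sb_index(i, j, n, b):
    """Offset of element (i, j) of an n x n matrix in a square-block layout.

    Blocks of size b x b are laid out in column-major order. Elements inside
    a block are also stored column-major (an assumption; the abstract does
    not specify the intra-block order). Assumes b divides n.
    """
    nb = n // b                      # blocks per dimension
    bi, bj = i // b, j // b          # block row, block column
    ii, jj = i % b, j % b            # position inside the block
    block = bj * nb + bi             # column-major block number
    return block * b * b + jj * b + ii


def to_sb(a, n, b):
    """Pack a list-of-rows matrix into a flat list using the SB layout."""
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[sb_index(i, j, n, b)] = a[i][j]
    return out


def sb_matmul(a_sb, b_sb, n, b):
    """Iterative C = A * B where A, B and C are all stored in SB layout."""
    nb, bb = n // b, b * b
    c_sb = [0] * (n * n)
    for bj in range(nb):             # block column of C and B
        for bk in range(nb):         # block column of A, block row of B
            for bi in range(nb):     # block row of C and A
                a0 = (bk * nb + bi) * bb   # start of block A(bi, bk)
                b0 = (bj * nb + bk) * bb   # start of block B(bk, bj)
                c0 = (bj * nb + bi) * bb   # start of block C(bi, bj)
                # Each block is contiguous, so this inner kernel touches
                # only three small dense b x b arrays.
                for jj in range(b):
                    for kk in range(b):
                        bkj = b_sb[b0 + jj * b + kk]
                        for ii in range(b):
                            c_sb[c0 + jj * b + ii] += a_sb[a0 + kk * b + ii] * bkj
    return c_sb
```

Because every block occupies a contiguous region of memory, the three innermost loops work on small dense operands with unit stride. This is the general motivation for block data layouts, although the kernel the paper actually uses may differ.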


Similar resources

Algorithmic Redistribution Methods for Block-Cyclic Decompositions

This research aims at creating and providing a framework to describe algorithmic redistribution methods for various block cyclic decompositions. To do so properties of this data distribution scheme are formally exhibited. The examination of a number of basic dense linear algebra operations illustrates the application of those properties. This study analyzes the extent to which the general two-d...


Is Morton layout competitive for large two-dimensional arrays yet?

Two-dimensional arrays are generally arranged in memory in row-major order or column-major order. Traversing a row-major array in column-major order, or vice-versa, leads to poor spatial locality. With large arrays the performance loss can be a factor of 10 or more. This paper explores the Morton storage layout, which has substantial spatial locality whether traversed in row-major or column-maj...


Reducing Overhead in Sparse Hypermatrix Cholesky Factorization

The sparse hypermatrix storage scheme produces a recursive 2D partitioning of a sparse matrix. Data subblocks are stored as dense matrices. Since we are dealing with sparse matrices some zeros can be stored in those dense blocks. The overhead introduced by the operations on zeros can become really large and considerably degrade performance. In this paper, we present several techniques for reduc...


Recursive Array Layouts and Fast Matrix Multiplication

The performance of both serial and parallel implementations of matrix multiplication is highly sensitive to memory system behavior. False sharing and cache conflicts cause traditional column-major or row-major array layouts to incur high variability in memory system performance as matrix size varies. This paper investigates the use of recursive array layouts to improve performance and reduce vari...


Phased array ultrasonic imaging using an improved beamforming based total focusing method for non destructive test

One of the novel ultrasonic phased array based scanning methods for ultrasonic imaging in non-destructive test is total focusing method (TFM). This method employs maximum available information of the phased array elements and leads to an improved defect detection accuracy compared to conventional scanning methods. Despite its high detection accuracy, TFM behaves weak in distinguishing the real ...




Publication date: 2006