A High Performance Parallel Strassen Implementation

نویسندگان

  • Brian Grayson
  • Robert A. van de Geijn
چکیده

In this paper, we give what we believe to be the rst high performance parallel implementation of Strassen's algorithm for matrix multiplication. We show how under restricted conditions, this algorithm can be implemented plug compatible with standard parallel matrix multiplication algorithms. Results obtained on a large Intel Paragon system show a 10-20% reduction in execution time compared to what we believe to be the fastest standard parallel matrix multiplication implementation available at this time.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generalizing of a High Performance Parallel Strassen Implementation on Distributed Memory MIMD Architectures

Strassen’s algorithm to multiply two n×n matrices reduces the asymptotic operation count from O(n) of the traditional algorithm to O(n), thus designing efficient parallelizing for this algorithm becomes essential. In this paper, we present our generalizing of a parallel Strassen implementation which obtained a very nice performance on an Intel Paragon: faster 20% for n ≈ 1000 and more than 100%...

متن کامل

High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. The area many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task due to mainly the noise, low contrast, and steep...

متن کامل

High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. The area many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task due to mainly the noise, low contrast, and steep...

متن کامل

Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)

Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...

متن کامل

FRPA: A Framework for Recursive Parallel Algorithms

Recursion continues to play an important role in high-performance computing. However, parallelizing recursive algorithms while achieving high performance is nontrivial and can result in complex, hardto-maintain code. In particular, assigning processors to subproblems is complicated by recent observations that communication costs often dominate computation costs. Previous work [1]–[3] demonstrat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Parallel Processing Letters

دوره 6  شماره 

صفحات  -

تاریخ انتشار 1996