Algorithm of Automatic Parallelization of Generalized Matrix Multiplication

Authors

Elena N. Akimova

Roman A. Gareev

Abstract

Parallelizing generalized matrix-matrix multiplication is crucial for achieving the high performance required in many settings. Parallelization performed by contemporary compilers is not sufficient to replace expert-tuned multi-threaded implementations or to come close to their performance. All competitive solutions rely on previously optimized external implementations, which may not be available for a given data type and hardware architecture. In this paper, we introduce an automatic compiler transformation that requires neither external code nor automatic tuning to attain more than 85% of the performance of an optimized BLAS library. Our optimization shows competitive performance across various hardware architectures and for different forms of generalized matrix-matrix multiplication. We believe that the availability of multi-threaded implementations of generalized matrix-matrix multiplication can save time when no optimized library is available.
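For orientation, the kind of computation the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' compiler transformation: it assumes that "generalized" means the BLAS-style update C = beta*C + alpha*A*B with the addition and multiplication operators made pluggable, and it parallelizes over rows of the result, which are independent of one another. The names `generalized_gemm`, `gemm_row`, `add`, `mul`, and `workers` are hypothetical. Python threads are used only to show the row-level independence; they are not expected to yield the speedups discussed in the abstract.

```python
from concurrent.futures import ThreadPoolExecutor
import operator


def gemm_row(i, A, B, C, alpha, beta, add, mul):
    """Compute row i of add(mul(beta, C), mul(alpha, A (x) B)).

    `add` and `mul` are the (assumed) generalized operators; with the
    defaults used below they are ordinary + and *.
    """
    n_inner = len(B)
    n_cols = len(B[0])
    row = []
    for j in range(n_cols):
        # Reduce over the inner dimension using the generalized operators.
        acc = mul(A[i][0], B[0][j])
        for k in range(1, n_inner):
            acc = add(acc, mul(A[i][k], B[k][j]))
        row.append(add(mul(beta, C[i][j]), mul(alpha, acc)))
    return row


def generalized_gemm(A, B, C, alpha=1, beta=1,
                     add=operator.add, mul=operator.mul, workers=4):
    """Return the updated matrix; each output row is computed independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda i: gemm_row(i, A, B, C, alpha, beta, add, mul),
            range(len(A)),
        ))


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    C = [[1, 1], [1, 1]]
    # Ordinary arithmetic: 2 * (A @ B) + 3 * C
    print(generalized_gemm(A, B, C, alpha=2, beta=3))
```

Passing `add=min` and `mul=operator.add` turns the same loop nest into a min-plus product, which is one example of a non-standard form such a loop nest can take.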


Similar resources

Experimental Evaluation of Affine Schedules for Matrix Multiplication on the MasPar Architecture

This paper reports an experimental study on the suitability of systolic algorithm scheduling methods for the automatic parallelization of algorithms on SIMD computers. We consider matrix multiplication on the MasPar MP-1 architecture. We comparatively study different scheduling methods and the blocking of the best resulting algorithms.


Loop Parallelization for a Grid Master-Worker Framework

Despite the evolution in Grid middleware, the development and execution of Grid applications is still not simple. We propose an approach to parallelizing applications straight to the Grid. Both the parallelization and application execution processes should be as simple as possible. We present a software architecture that combines loop parallelization with Higher-Order Components. We develop a H...


A Solution for Automatic Parallelization of Sequential Assembly Code

Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes a new software tool for parallelizing sequential assembly code. The main goal of this paper is to develop the parallelizer which reads sequential assembler co...


A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...


Parallelization of the deMon2k code

The parallelization of the LCGTO-KS-DFT code deMon2k is presented. The parallelization of the three-center electron repulsion integrals, the numerical integration using a direct grid algorithm and the matrix multiplication and diagonalization are described. The efficiency of the parallelization is analyzed by selected benchmark calculations. It is shown that geometry optimizations of systems wi...



Publication date: 2017
