Algorithm of Automatic Parallelization of Generalized Matrix Multiplication
Abstract
Parallelizing generalized matrix-matrix multiplication is crucial for achieving the high performance required in many applications. Parallelization performed by contemporary compilers is not sufficient to replace expert-tuned multi-threaded implementations or to come close to their performance. All competitive solutions require previously optimized external implementations, which may not be available for a given data type and hardware architecture. In this paper, we introduce an automatic compiler transformation that requires neither external code nor automatic tuning to attain more than 85% of the performance of an optimized BLAS library. Our optimization shows competitive performance across various hardware architectures and for different forms of generalized matrix-matrix multiplication. We believe that the availability of multi-threaded implementations of generalized matrix-matrix multiplication can save time when no optimized libraries are available.
Similar resources
Experimental Evaluation of Affine Schedules for Matrix Multiplication on the MasPar Architecture
This paper reports an experimental study on the suitability of systolic-algorithm scheduling methods for the automatic parallelization of algorithms on SIMD computers. We consider matrix multiplication on the MasPar MP-1 architecture. We comparatively study different scheduling methods and the blocking of the best resulting algorithms.
Loop Parallelization for a Grid Master-Worker Framework
Despite the evolution in Grid middleware, the development and execution of Grid applications is still not simple. We propose an approach to parallelizing applications straight to the Grid. Both the parallelization and application execution processes should be as simple as possible. We present a software architecture that combines loop parallelization with Higher-Order Components. We develop a H...
A Solution for Automatic Parallelization of Sequential Assembly Code
Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes a new software tool for parallelizing sequential assembly code. The main goal of this paper is to develop the parallelizer which reads sequential assembler co...
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...
Parallelization of the deMon2k code
The parallelization of the LCGTO-KS-DFT code deMon2k is presented. The parallelization of the three-center electron repulsion integrals, the numerical integration using a direct grid algorithm and the matrix multiplication and diagonalization are described. The efficiency of the parallelization is analyzed by selected benchmark calculations. It is shown that geometry optimizations of systems wi...
Publication date: 2017