Comparison of Scalable Parallel Matrix Multiplication Libraries
نویسندگان
چکیده
This paper compares two general library routines for performing parallel distributed matrix multiplication. The PUMMA algorithm utilizes block scattered data layout, whereas BiMMeR utilizes virtual 2-D torus wrap. The algorithmic diierences resulting from these diierent layouts are discussed as well as the general issues associated with diierent data layouts for library routines. Results on the Intel Delta for the two matrix multiplication algorithms are presented.
منابع مشابه
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication which could run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication can not run on Fibonacci Hypercube structure, therefore giving a method that can be run on all structures especially Fibonacci Hypercube structure is necessary for parallel matr...
متن کاملParallel Matrix Multiplication: A Systematic Journey
We expose a systematic approach for developing distributed memory parallel matrix matrix multiplication algorithms. The journey starts with a description of how matrices are distributed to meshes of nodes (e.g., MPI processes), relates these distributions to scalable parallel implementation of matrix-vector multiplication and rank-1 update, continues on to reveal a family of matrix-matrix multi...
متن کاملComparative Study of Parallel Programming Models to Compute Complex Algorithm
The main goal of this research is to use OpenMP, Posix Threads and Microsoft Parallel Patterns libraries to design an algorithm to compute Matrix Multiplication effectively. By using the libraries of OpenMP, Posix Threads and Microsoft Parallel Patterns Libraries, one can optimize the speedup of the algorithm. First step is to write simple program which calculates a predetermined Matrix and giv...
متن کاملMatrix Multiplication Specialization in STAPL
The Standard Template Adaptive Parallel Library (STAPL) is a superset of C++’s Standard Template Library (STL) which allows highproductivity parallel programming in both distributed and shared memory environments. This framework provides parallel equivalents of STL containers and algorithms enabling ease of development for parallel systems. In this paper, we will discuss our methodology for imp...
متن کاملToward Scalable Matrix Multiply on Multithreaded Architectures
We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for shared memory architectures with many simultaneous threads of execution, including SMP architectures and future multicore processors. The always-important matrix-matrix multiplication is used to demonstrate that a simple...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1993