Parallelizing a level 3 BLAS library for LAN-connected workstations

Authors

Kuo-Chan Huang

Pei-Chi Wu

Feng-Jian Wang

Abstract

shared by more than one user at a time, and the number of users on a workstation may change over time. Each workstation therefore provides unequal and time-varying computing power to a parallel application. This computing model differs from that of conventional parallel computers, where a user may allocate a group of processors for dedicated use during a period of time. Dynamic load balancing mechanisms are therefore necessary for parallel applications to run efficiently on LAN-connected workstations. However, most existing parallel implementations of BLAS were designed for conventional parallel computers and do not address the need for dynamic load balancing on LAN-connected workstations. This paper explores the parallelization of Level 3 BLAS for LAN-connected workstations, taking dynamic load balancing into consideration.

Dynamic load balancing has received a great deal of attention in the literature [5, 6, 18, 19]. The basic idea is to balance the workloads on all processors in order to increase system throughput or reduce the wall-clock execution time of an application. One approach is to migrate load from heavily loaded nodes to lightly loaded ones. Our approach instead divides a task into more subtasks than there are processors and assigns each processor one subtask at a time; a processor is assigned a new subtask only when it completes its current one. The approach achieves dynamic load balancing through proper data partitioning and task assignment.

We have conducted several experiments to investigate various data partitioning methods. Based on the experimental results, which are described in detail in later sections, we propose the dynamic column-blocking method as the best data partitioning method for running parallel Level 3 BLAS efficiently on LAN-connected workstations.
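The task-assignment idea described in the abstract (more subtasks than processors, one subtask handed out at a time) can be illustrated with a minimal sketch. It is not the authors' implementation: threads stand in for workstations, the column blocks have a fixed width, and the names `column_blocks`, `multiply_block`, and `parallel_matmul` are hypothetical. The paper's dynamic column-blocking method may choose block sizes differently.

```python
import queue
import threading


def column_blocks(n_cols, block_width):
    """Split column indices 0..n_cols-1 into contiguous (start, stop) blocks."""
    return [(s, min(s + block_width, n_cols)) for s in range(0, n_cols, block_width)]


def multiply_block(a, b, c, start, stop):
    """Compute columns start..stop-1 of C = A * B (one GEMM-like subtask)."""
    inner = len(b)
    for i, row in enumerate(a):
        for j in range(start, stop):
            c[i][j] = sum(row[k] * b[k][j] for k in range(inner))


def parallel_matmul(a, b, n_workers=3, block_width=2):
    """Self-scheduling multiply: each worker pulls one column block at a time."""
    n_rows, n_cols = len(a), len(b[0])
    c = [[0.0] * n_cols for _ in range(n_rows)]

    # Assumption: a shared queue plays the role of the task dispenser.
    tasks = queue.Queue()
    for block in column_blocks(n_cols, block_width):
        tasks.put(block)

    done_by = [0] * n_workers  # subtasks completed by each worker

    def worker(worker_id):
        while True:
            try:
                start, stop = tasks.get_nowait()
            except queue.Empty:
                return  # no subtasks left
            # Workers write disjoint columns of c, so no lock is needed here.
            multiply_block(a, b, c, start, stop)
            done_by[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c, done_by
```

Calling `parallel_matmul(a, b, n_workers=3, block_width=2)` returns the product together with a per-worker count of completed blocks. A worker that runs slowly simply requests fewer blocks, which is the load-balancing effect the abstract describes; a smaller `block_width` gives finer balancing at the cost of more assignment overhead.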
We have implemented a parallel LU factorization routine using our parallel Level 3 BLAS to show the effectiveness of our parallelization method and to discuss the issue of the parallel library interface. The next section gives high-level descriptions of parallel implementations of Level 3 BLAS. Section 3 discusses the importance, advantages, and issues of LAN-connected workstations. Section 4 presents various data distribution methods, our experiments, and a comparative analysis. Section 5 discusses the implementation of LU

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 38, 28–36 (1996) ARTICLE NO. 0126

Related resources

Porting and Measuring the Linpack Benchmark on Gamma

GAMMA (Genoa Active Message Machine) is a high performance communication layer implemented at kernel level as an extension of the Linux operating system. It is based on Active Ports, a communication mechanism derived from Active Messages. On low-cost clusters of Personal Computers (PCs) connected by Fast Ethernet, GAMMA achieves much better communication performance compared to MPI and PVM. W...

ScaLAPACK Tutorial

ScaLAPACK is a library of high performance linear algebra routines for distributed memory MIMD computers. It is a continuation of the LAPACK project, which designed and produced analogous software for workstations, vector supercomputers, and shared memory parallel computers. The goals of the project are efficiency (to run as fast as possible), scalability (as the problem size and number of proces...

Illinois-Intel Multithreading Library: Multithreading Support for Intel Architecture Based Multiprocessor Systems

Powerful desktop multiprocessor systems based on the Intel Architecture (iA) offer a formidable alternative to traditional scientific/engineering workstations for commercial application developers at an attractive cost-performance ratio. However, the lack of adequate compiler and runtime library support for multithreading and parallel processing on Windows NT* makes it difficult or impossible to...

BLIS: A Modern Alternative to the BLAS

We propose the portable BLAS-like Interface Software (BLIS) framework which addresses a number of shortcomings in both the original BLAS interface and present-day BLAS implementations. The framework allows developers to rapidly instantiate high-performance BLAS-like libraries on existing and new architectures with relatively little effort. The key to this achievement is the observation that vir...

Level 2 and 3 BLAS in the NAG C Library

This report describes a set of matrix-vector routines (Level 2 BLAS) and matrix-matrix routines (Level 3 BLAS) written in C. These routines have been included in Mark of the NAG C Library and are used by other library routines in that library. Details are given of the implementation, testing, and use of the routines, and a complete listing of all the ANSI C function prototypes is included in the Appendix. Th...

Publication date: 1995
