Matrix-vector multiplication: Parallel algorithms and architectures

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Parallel Graph Algorithms with In-database Matrix-Vector Multiplication

Graph problems are significantly harder to solve with large graphs residing on disk compared to main memory only. In this work, we study how to solve four important graph problems: reachability from a source vertex, single source shortest path, weakly connected components, and PageRank. It is well known that the aforementioned algorithms can be expressed as an iteration of matrix-vector multipl...

متن کامل

Fast Matrix Multiplication Algorithms on Mimd Architectures

Sequential fast matrix multiplication algorithms of Strassen and Winograd are studied; the complexity bound given by Strassen is improved. These algorithms are parallelized on MIMD distributed memory architectures of ring and torus topologies; a generalization to a hyper-torus is also given. Complexity and efficiency are analyzed and good asymptotic behaviour is proved. These new parallel algor...

متن کامل

Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures

Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallel...

متن کامل

SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision

We accelerate a double precision sparse matrix and DD vector multiplication (DD-SpMV), and its transposition and DD vector multiplication (DD-TSpMV) by using SIMD AVX2 for Krylov subspace methods. We compare some storage formats of DD-SpMV and DDTSpMV for AVX2 to eliminate performance degradation factors in CRS. Our experience indicates that BCRS4x1, with fitting block size to the SIMD register...

متن کامل

Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms

Extra memory allows parallel matrix multiplication to be done with asymptotically less communication than Cannon’s algorithm and be faster in practice. “3D” algorithms arrange the p processors in a 3D array, and store redundant copies of the matrices on each of p layers. ‘2D” algorithms such as Cannon’s algorithm store a single copy of the matrices on a 2D array of processors. We generalize the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Computers & Mathematics with Applications

سال: 1988

ISSN: 0898-1221

DOI: 10.1016/0898-1221(88)90262-3