Synthesis of a unidirectional systolic array for matrix–vector multiplication
Authors
Abstract
Similar resources
Synthesis of a Systolic Array Genetic Algorithm
The paper presents the design of a hardware genetic algorithm that uses a pipeline of systolic arrays. It demonstrates a design methodology in which a simple genetic algorithm expressed in C source code is progressively rewritten into a recurrence form from which systolic structures can be deduced. The paper extends previous work by the authors by introducing a simplification to a previous syst...
2D matrix multiplication on a 3D systolic array
The introduction of systolic arrays in the late 1970s had an enormous impact on the area of special-purpose computing. However, most of the work so far has been done with one-dimensional and two-dimensional (2D) systolic arrays. Recent advances in three-dimensional VLSI (3D VLSI) and 3D packaging of 2D VLSI components have made the idea of 3D systolic arrays feasible in the near future. In this ...
A Systolic Architecture for Modulo Multiplication
With the current advances in VLSI technology, traditional algorithms for Residue Number System (RNS) based architectures should be re-evaluated to explore the new technology dimensions. In this brief, we introduce a Θ(log n) algorithm for large-moduli multiplication in RNS-based architectures. A systolic array has been designed to perform the modulo multiplication algorithm. The proposed modul...
Hyper-Systolic Matrix Multiplication
A novel parallel algorithm for matrix multiplication is presented. The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems. It can handle matrix-vector multiplications as well as transposed matrix products.
Two systolic architectures for modular multiplication
This article presents two systolic architectures to speed up the computation of modular multiplication in RSA cryptosystems. In the double-layer architecture, the main operation of Montgomery's algorithm is partitioned into two parallel operations after using the precomputation of the quotient bit. In the non-interlaced architecture, we eliminate the one-clock-cycle gap between iterations by pa...
Journal
Journal title: Mathematical and Computer Modelling
Year: 2006
ISSN: 0895-7177
DOI: 10.1016/j.mcm.2005.11.009