operands

GEMM-Like Convolution for Deep Learning Inference on the Xilinx Versal

Journal: Lecture Notes in Computer Science, 2023

We revisit a blocked formulation of the direct convolution algorithm that mimics modern realizations of the general matrix multiplication (gemm), demonstrating that the same approach can be adapted to deliver high performance for deep learning inference tasks on the AI Engine (AIE) tile embedded in Xilinx Versal platforms. Our experimental results on the VCK190 show an arithmetic throughput close to 70% of the theoretical peak AI...

Systolic Hardware Implementation for the Montgomery Modular Multiplication

2002

Nadia Nedjah

Modular multiplication is a cornerstone computation in public-key cryptography systems such as the RSA cryptosystem. The operation is time-consuming for large operands. This paper describes the characteristics of a systolic array-based architecture that implements modular multiplication using the fast Montgomery algorithm. The paper evaluates the prototype using the classic time×area factor. Key-Words...

FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers

2000

Bogdan Matasaru, Tudor Jebelean

We present the FPGA implementation of an extension of the binary plus-minus systolic algorithm, which computes the GCD (greatest common divisor) and also the normal form of a rational number without using division. A sample array for 8-bit operands consumes 83.4% of an Atmel 40K10 chip and operates at 25 MHz.

Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling

Journal: IEEE Access, 2022

Processing-in-Memory (PIM) has been actively studied to overcome the memory bottleneck by placing computing units near or in memory, especially for efficiently processing low-locality, data-intensive applications. We can categorize in-DRAM PIMs by how many banks perform the PIM computation with one DRAM command: per-bank and all-bank. The per-bank PIM operates only one bank, delivering low performance but preservin...

Compiler-assisted Hybrid Operand Communication

2009

Dong Li, Behnam Robatmili, Madhu Saravana Sibi Govindan, Aaron Smith, Steve Keckler, Doug Burger

Communication of operands among in-flight instructions can be power intensive, especially in superscalar processors where all result tags are broadcast to a small number of consumers through a multi-entry CAM. Token-based point-to-point communication of operands in dataflow architectures is highly efficient when each produced token has only one consumer, but inefficient when there are many cons...

Combined Multiplication and Sum-of-Squares Units

2003

Michael J. Schulte, Louis Marquette, Shankar Krithivasan, E. George Walters, C. John Glossner

Multiplication and squaring are important operations in digital signal processing and multimedia applications. This paper presents designs for units that implement either multiplication, A × B, or sum-of-squares computations, A² + B², based on an input control signal. Compared to conventional parallel multipliers, these units have a modest increase in area and delay, but allow either multiplica...

An experimental Investigation of Single and Multiple Issue ILP speedup for stack-based code

2000

Chris Bailey, Mike Weeks

Stack processors, which abandon register files and instead work directly on stack-resident operands, have recently enjoyed a resurgence of interest in conjunction with developments such as Java, and have always retained an interest for FORTH users, especially in real-time systems. However, the opportunity to enhance throughput of stack-based architectures has received insignificant attent...

Discrete particle swarm optimization for the team orienteering problem

Journal: Memetic Computing, 2011

Shanthi Muthuswamy, Sarah S. Lam

In this paper, a novel discrete particle swarm optimization (PSO) algorithm is proposed to solve the team orienteering problem (TOP). Discrete evaluation is achieved by redefining all operators and operands used in PSO. To obtain better results, a strengthened PSO, which improves both exploration and exploitation during the search process, is employed. Our algorithm achieves the best known solu...

Implementation of Fault Attacks on Elliptic Curve Cryptosystems

2015

Anubhav Saxena, Varun Prakash Saxena, Sandip Mal

The main motivation behind elliptic curve cryptography is to find a public-key family that provides the same level of security as discrete-log systems or RSA but with shorter operands. Through fault attacks, the adversary disturbs the computation of a cryptographic device to obtain information about the secret key. This paper uses Elliptic Curve Point Multiplication Algorithm based on a binary seque...

Faster Modulo 2^n + 1 Multipliers without Booth Recoding

2005

Ricardo Chaves, Leonel Sousa

This paper proposes an improvement to the fastest modulo 2^n + 1 multiplier already published, without Booth recoding. Results show that by manipulating the partial products and modulo reduction terms and by inserting them adequately in the multiplication matrix, the performance of multiplication units can be improved by more than 20%. This improvement is obtained at the expense of some extra circui...