Search results for: operands
Number of results: 843
We revisit a blocked formulation of the direct convolution algorithm that mimics modern realizations of the general matrix multiplication (GEMM), demonstrating that the same approach can be adapted to deliver high performance for deep learning inference tasks on the AI Engine (AIE) tile embedded in Xilinx Versal platforms. Our experimental results on the VCK190 show an arithmetic throughput close to 70% of the theoretical peak AI...
Modular multiplication is a cornerstone computation in public-key cryptography systems such as the RSA cryptosystem. The operation is time-consuming for large operands. This paper describes the characteristics of a systolic array-based architecture that implements modular multiplication using the fast Montgomery algorithm. The paper evaluates the prototype using the classic time×area factor. Key-Words...
We present the FPGA implementation of an extension of the binary plus–minus systolic algorithm which computes the GCD (greatest common divisor) and also the normal form of a rational number, without using division. A sample array for 8-bit operands consumes 83.4% of an Atmel 40K10 chip and operates at 25 MHz.
Processing-in-Memory (PIM) has been actively studied to overcome the memory bottleneck by placing computing units near or in memory, especially for efficiently processing low-locality, data-intensive applications. We can categorize in-DRAM PIMs by how many banks perform the PIM computation with one DRAM command: per-bank and all-bank. The former operates on only one bank, delivering performance but preservin...
Communication of operands among in-flight instructions can be power intensive, especially in superscalar processors where all result tags are broadcast to a small number of consumers through a multi-entry CAM. Token-based point-to-point communication of operands in dataflow architectures is highly efficient when each produced token has only one consumer, but inefficient when there are many cons...
Multiplication and squaring are important operations in digital signal processing and multimedia applications. This paper presents designs for units that implement either multiplication, A × B, or sum-of-squares computations, A² + B², based on an input control signal. Compared to conventional parallel multipliers, these units have a modest increase in area and delay, but allow either multiplica...
Stack processors, which abandon register files and instead work directly on stack-resident operands, have recently enjoyed a resurgence of interest in conjunction with developments such as Java, and have always retained an interest for FORTH users, especially in real-time systems. However, the opportunity to enhance throughput of stack-based architectures has received insignificant attent...
In this paper, a novel discrete particle swarm optimization (PSO) algorithm is proposed to solve the team orienteering problem (TOP). Discrete evaluation is achieved by redefining all operators and operands used in PSO. To obtain better results, a strengthened PSO, which improves both exploration and exploitation during the search process, is employed. Our algorithm achieves the best known solu...
The main motivation behind elliptic curve cryptography is to find a public-key family that provides the same level of security as discrete-log systems or RSA but with shorter operands. Through fault attacks, the adversary disturbs the computation of a cryptographic device to obtain information about the secret key. This paper uses an elliptic curve point multiplication algorithm based on a binary seque...
This paper proposes an improvement to the fastest modulo 2ⁿ + 1 multiplier already published, without Booth recoding. Results show that by manipulating the partial products and modulo reduction terms and by inserting them adequately in the multiplication matrix, the performance of multiplication units can be improved by more than 20%. This improvement is obtained at the expense of some extra circui...