Search results for: operands

Number of results: 843

Journal: :The Journal of Supercomputing 2021

We introduce a high performance, multi-threaded realization of the gemm kernel for the ARMv8.2 architecture that operates with 16-bit (half precision) floating point operands. Our code is especially designed for efficient machine learning inference (and, to a certain extent, also training) with deep neural networks. The results on...

Journal: :Electronics 2021

Multiplication is an essential image processing operation commonly implemented in hardware DSP cores. To improve the cores’ area, speed, or energy efficiency, we can approximate multiplication. We present a multiplier that generates two partial products using hybrid radix-4 and logarithmic encoding of the input operands. It uses exact to generate product from three most significant bits approximation...

Journal: :IET Information Security 2007
Haining Fan Jia-Guang Sun Ming Gu Kwok-Yan Lam

We describe how a simple way to split input operands allows for fast VLSI implementations of subquadratic GF(2)[x] Karatsuba-Ofman multipliers. The theoretical XOR gate delay of the resulting multipliers is reduced significantly. For example, it is reduced by about 33% and 25% for n = 2^t and n = 3^t (t > 1), respectively. To the best of our knowledge, this parameter has never been improved since ...
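For readers unfamiliar with Karatsuba-Ofman multiplication over GF(2)[x], the following is a minimal software sketch of the textbook recursion. It uses the conventional high/low operand split, not the splitting scheme proposed in the paper, and it says nothing about gate delay. Polynomials are stored as Python integers with bit i holding the coefficient of x^i; the function names and the base-case threshold of 8 are assumptions made for illustration.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, n: int) -> int:
    """Multiply two polynomials of degree < n over GF(2) with three half-size products."""
    if n <= 8:  # illustrative threshold: fall back to the quadratic method
        return clmul_schoolbook(a, b)
    half = (n + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba_gf2(a_lo, b_lo, half)
    hi = karatsuba_gf2(a_hi, b_hi, n - half)
    # (a_lo + a_hi)(b_lo + b_hi) + lo + hi yields the middle coefficient block
    mid = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, half) ^ lo ^ hi
    return lo ^ (mid << half) ^ (hi << (2 * half))
```

Because subtraction and addition coincide in GF(2), the usual Karatsuba correction terms reduce to XORs, which is why the hardware cost is typically counted in XOR gates.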

2004
Eduardo Bonelli Adriana Compagnoni Ricardo Medel

2 SIFTAL; 2.1 Syntax of SIFTAL; 2.2 Type System; 2.2.1 Typing Basic Blocks; 2.2.2 Typing Operands, Word Values and Heap Values ...

2010
Miloslav Trmac Adam Husár Jan Hranac Tomás Hruska Karel Masarík

We describe an automated way to generate data for a practical LLVM instruction selector based on a machine-generated description of the target architecture at register transfer level. The generated instruction selector can handle arbitrarily complex machine instructions with no internal control flow, and can automatically find and take advantage of arithmetic properties of an instruction, specia...

1996
Olav Beckmann Paul H J Kelly

This short paper describes a matrix-vector library implementation running on the Fujitsu AP1000. The library optimises data distribution at run-time, taking advantage of information about how operands and results are used by delaying evaluation where possible. The work extends our earlier paper on the subject [5] by giving a general methodology for representing data distributions, which is then ...

2004
Alkis Evlogimenos

Linear scan register allocation is a fast global register allocation first presented in [PS99] as an alternative to the more widely used graph coloring approach. In this paper, I apply the linear scan register allocation algorithm in a system with SSA form and show how to improve the algorithm by taking advantage of lifetime holes and memory operands, and also eliminate the need for reserving r...

2003
Hiroshi Takamura Koji Inoue Vasily G. Moshnyaga

This paper proposes an approach for reducing access count to register-files based on operand data reuse. The key idea is to compare source and destination operands of the current and previous instructions and if they are the same, omit the corresponding register file activation during operand fetch, thus saving energy consumption. Simulations show that using this technique we can decrease the t...
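The comparison described in the abstract above can be illustrated with a small counting model. This is a sketch under stated assumptions, not the paper's simulator: instructions are hypothetical `(dest, sources)` tuples, and a source read is counted as skipped whenever the register appeared as a source or destination of the immediately preceding instruction.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction format: (destination register or None, source registers)
Instr = Tuple[Optional[str], Tuple[str, ...]]


def count_register_reads(program: List[Instr]) -> Tuple[int, int]:
    """Return (reads without reuse, reads with previous-instruction operand reuse)."""
    baseline = 0
    with_reuse = 0
    prev_regs = set()  # operands touched by the previous instruction
    for dest, sources in program:
        for reg in sources:
            baseline += 1
            if reg not in prev_regs:
                with_reuse += 1  # register file must be activated
        prev_regs = set(sources)
        if dest is not None:
            prev_regs.add(dest)
    return baseline, with_reuse


program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r2")),  # both sources match operands of the previous instruction
    ("r5", ("r6", "r7")),
    ("r8", ("r5", "r9")),  # r5 was the previous destination
]
print(count_register_reads(program))  # (8, 5)
```

In this toy trace, three of the eight source reads match an operand of the preceding instruction, so the model counts five register-file activations instead of eight.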

Journal: :CoRR 2013
Paul Tarau

The tree based representation described in this paper, hereditarily binary numbers, applies recursively a run-length compression mechanism that enables computations limited by the structural complexity of their operands rather than by their bitsizes. While within constant factors from their traditional counterparts for their worst case behavior, our arithmetic operations open the doors for inte...

Journal: :J. Inf. Sci. Eng. 2001
Yu-Wei Chen Kuo-Liang Chung

Consider an n-dimensional SIMD hypercube Hn with 3n/2 − 1 faulty nodes. Given 2 operands, this paper presents an efficient algorithm for prefix computation on the faulty Hn. Employing the newly proposed delay-update technique and the subcube partition scheme, the proposed algorithm takes n + 5 log n + 7 steps, and it tolerates n/2 more faulty nodes than does Raghavendra and Sridhar’s algorithm [4...
