Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions
Abstract
Galois Field arithmetic forms the basis of Reed-Solomon and other erasure coding techniques that protect storage systems from failures. Most implementations of Galois Field arithmetic rely on multiplication tables or discrete logarithms to perform multiplication. However, the advent of 128-bit instructions, such as Intel's Streaming SIMD Extensions, allows us to perform Galois Field arithmetic much faster. This short paper details how to leverage these instructions for various field sizes, and demonstrates the significant performance improvements on commodity microprocessors. The techniques that we describe are available as open source software.
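The abstract contrasts table-based multiplication (multiplication tables, discrete logarithms) with 128-bit SIMD instructions but shows neither. The following is a minimal C sketch, assuming GF(2^8) reduced by the polynomial 0x11D (our choice; the abstract names no polynomial). It implements the discrete-logarithm method and a "split table" region multiply: each byte is split into two 4-bit halves, each half indexes a 16-entry table of products, and the two results are XORed. A 16-entry byte table fits in one 128-bit register, so the SSSE3 instruction PSHUFB (`_mm_shuffle_epi8`) can perform 16 such lookups at once. This is one well-known way to use the instructions the abstract mentions, not necessarily the paper's exact code; all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 (PSHUFB) */
#define HAVE_X86_SIMD 1
#endif

/* Assumption: GF(2^8) reduced by x^8+x^4+x^3+x^2+1 (0x11D), a common
   Reed-Solomon choice; the abstract does not name a polynomial. */
#define GF_POLY 0x11D

/* Reference multiply: carry-less shift-and-add with modular reduction. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;                 /* addition in GF(2^8) is XOR */
        a = (a & 0x80) ? (uint8_t)((a << 1) ^ GF_POLY) : (uint8_t)(a << 1);
        b >>= 1;
    }
    return p;
}

/* Discrete-logarithm method: a*b = exp[log a + log b] for nonzero a, b. */
static uint8_t gf_log[256], gf_exp[510];

static void gf_init_log_tables(void) {
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = gf_exp[i + 255] = x;   /* doubled so no "mod 255" is needed */
        gf_log[x] = (uint8_t)i;
        x = gf_mul(x, 2);                  /* 2 generates the nonzero elements under 0x11D */
    }
}

static uint8_t gf_mul_log(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) return 0;
    return gf_exp[gf_log[a] + gf_log[b]];
}

/* Split tables for a constant c: c*b = c*(b & 0x0F) ^ c*(b & 0xF0),
   because multiplication distributes over XOR. Two 16-entry tables suffice. */
static void gf_split_tables(uint8_t c, uint8_t lo[16], uint8_t hi[16]) {
    for (int i = 0; i < 16; i++) {
        lo[i] = gf_mul(c, (uint8_t)i);
        hi[i] = gf_mul(c, (uint8_t)(i << 4));
    }
}

/* Portable scalar version: one byte per iteration. */
static void gf_region_mul_scalar(uint8_t c, const uint8_t *src, uint8_t *dst, size_t n) {
    uint8_t lo[16], hi[16];
    gf_split_tables(c, lo, hi);
    for (size_t k = 0; k < n; k++)
        dst[k] = lo[src[k] & 0x0F] ^ hi[src[k] >> 4];
}

#ifdef HAVE_X86_SIMD
/* SSSE3 version: each PSHUFB performs 16 table lookups. */
__attribute__((target("ssse3")))
static void gf_region_mul_ssse3(uint8_t c, const uint8_t *src, uint8_t *dst, size_t n) {
    uint8_t lo[16], hi[16];
    gf_split_tables(c, lo, hi);
    const __m128i tlo = _mm_loadu_si128((const __m128i *)lo);
    const __m128i thi = _mm_loadu_si128((const __m128i *)hi);
    const __m128i mask = _mm_set1_epi8(0x0F);
    size_t k = 0;
    for (; k + 16 <= n; k += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + k));
        __m128i l = _mm_and_si128(v, mask);                      /* low nibbles  */
        __m128i h = _mm_and_si128(_mm_srli_epi64(v, 4), mask);   /* high nibbles */
        __m128i r = _mm_xor_si128(_mm_shuffle_epi8(tlo, l),
                                  _mm_shuffle_epi8(thi, h));
        _mm_storeu_si128((__m128i *)(dst + k), r);
    }
    for (; k < n; k++)                                           /* tail bytes   */
        dst[k] = lo[src[k] & 0x0F] ^ hi[src[k] >> 4];
}
#endif

/* Multiply a region by c, using SSSE3 when the CPU reports support for it. */
static void gf_region_mul(uint8_t c, const uint8_t *src, uint8_t *dst, size_t n) {
#ifdef HAVE_X86_SIMD
    if (__builtin_cpu_supports("ssse3")) {
        gf_region_mul_ssse3(c, src, dst, n);
        return;
    }
#endif
    gf_region_mul_scalar(c, src, dst, n);
}
```

With GCC or Clang on x86, the `target("ssse3")` attribute lets the SIMD path compile without extra flags, and `__builtin_cpu_supports` selects it at run time. Other platforms fall back to the scalar loop, which computes the same bytes with the same two tables.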
Related papers
Effective method for coding and decoding RS codes using SIMD instructions
A method is introduced for efficient encoding and decoding of Reed-Solomon codes based on the matrix formalism. Within this framework, a method is suggested for vectorizing the Berlekamp-Massey algorithm to detect and correct several silent data corruptions. The results of comparing the suggested method with other known ways of decoding RS codes are presented. This approach requires ...
An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors
In the present paper, an implementation of a parallel one-dimensional fast Fourier transform (FFT) using Streaming SIMD Extensions 3 (SSE3) instructions on dual-core processors is proposed. Combination of vectorization and the block six-step FFT algorithm is shown to effectively improve performance. The performance results for one-dimensional FFTs on dual-core Intel Xeon processors are reported...
FPL Implementation of a SIMD RISC RNS-Enabled DSP
VHDL synthesis and FPL implementation of an RNS-enabled RISC DSP are presented in this paper. Four parallel modular arithmetic units optimized for multiply-and-accumulate are used in a parallel SIMD architecture. The moduli 256, 251, 241 and 239 are selected to optimize area and performance. Thus, pipelined Galois Field multipliers are used for prime moduli while conventional adders and multipli...
Automatic Generation of Vectorized Fast Fourier Transform Libraries for the Larrabee and AVX Instruction Set Extension
The discrete Fourier transform (DFT) and its fast algorithms (fast Fourier transforms or FFTs) are among the most important computational building blocks in signal processing and scientific computing. Consequently, there are a number of high performance DFT libraries available including Intel's Integrated Performance Primitives (IPP), FFTW [6], and libraries generated by Spiral [9, ...
Faster Population Counts Using AVX2 Instructions
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g., popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated inst...
Publication date: 2013