Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW (Position)

Authors

  • Richard Vuduc
  • James W. Demmel
Abstract

Achieving peak performance in important numerical kernels such as dense matrix multiplication or sparse matrix-vector multiplication usually requires extensive, machine-dependent tuning by hand. In response, a number of automatic tuning systems have been developed, which typically operate by (1) generating multiple implementations of a kernel and (2) empirically selecting an optimal implementation. One such system is FFTW (Fastest Fourier Transform in the West) for the discrete Fourier transform. In this paper, we review FFTW's inner workings with an emphasis on its code generator, and report on our empirical evaluation of the system on two different hardware and compiler platforms. We then describe a number of our own extensions to the FFTW code generator that compute efficient discrete cosine transforms and show promising speed-ups over a vendor-tuned library. We also comment on current opportunities to develop tuning systems in the spirit of FFTW for other widely used kernels.
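The two-step scheme described in the abstract (generate several implementations, then pick one empirically) can be illustrated with a minimal Python sketch. This is not FFTW's code generator or planner; the function names dft_naive, fft_radix2, and select_fastest are hypothetical, and the two hand-written candidates merely stand in for generated variants. The power-of-two input length is also an assumption of the sketch.

```python
import cmath
import timeit


def dft_naive(x):
    """Candidate 1: direct O(n^2) evaluation of the DFT definition."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Candidate 2: recursive radix-2 Cooley-Tukey FFT (len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def select_fastest(candidates, sample, repeats=3):
    """Time every candidate on `sample` and return (best_name, timings).

    The best-of-`repeats` wall-clock time is used as the score, which is
    one simple choice among many possible measurement strategies.
    """
    timings = {}
    for name, fn in candidates.items():
        runs = timeit.repeat(lambda fn=fn: fn(sample), number=1, repeat=repeats)
        timings[name] = min(runs)
    best = min(timings, key=timings.get)
    return best, timings


if __name__ == "__main__":
    data = [complex(i % 7, 0) for i in range(64)]
    winner, scores = select_fastest(
        {"naive": dft_naive, "radix2": fft_radix2}, data
    )
    print("selected:", winner)
    for name, seconds in scores.items():
        print(f"  {name}: {seconds:.6f} s")
```

The selection is made by measurement on the target machine rather than by a cost model, so the winner may differ between platforms and input sizes.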

Similar resources

Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW (Position)

Achieving peak performance in important numerical kernels such as dense matrix multiplication or sparse matrix-vector multiplication usually requires extensive, machine-dependent tuning by hand. In response, a number of automatic tuning systems have been developed, which typically operate by (1) generating multiple implementations of a kernel and (2) empirically selecting an optimal implementation. One ...

CAS WAVELET METHOD FOR THE NUMERICAL SOLUTION OF BOUNDARY INTEGRAL EQUATIONS WITH LOGARITHMIC SINGULAR KERNELS

In this paper, we present a computational method for solving boundary integral equations with logarithmic singular kernels which occur as reformulations of a boundary value problem for the Laplace equation. The method is based on the use of the Galerkin method with CAS wavelets constructed on the unit interval as basis. This approach utilizes the non-uniform Gauss-Legendre quadrature rule for ...

Automatic Generation and Adaptation of Numerical Kernels

Designing software that achieves peak performance on modern architectures is a difficult, expensive, and often highly platform-specific task. In this paper we discuss recent automatic adaptive optimization approaches to high-performance programming: ATLAS, FFTW, and SPIRAL. They are designed to eliminate hand-coding and hand-tuning for various numerical kernels. Further, we describe our own work...

Automatic Generation of Sparse Tensor Kernels with Workspaces

Recent advances in compiler theory describe how to compile sparse tensor algebra. Prior work, however, does not describe how to generate efficient code that takes advantage of temporary workspaces. These are often used to hand-optimize important kernels such as sparse matrix multiplication and the matricized tensor times Khatri-Rao product. Without this capability, compilers and code generators...

Early Experiences Porting Scientific Applications to the Many Integrated Core (MIC) Platform

This paper presents experiences using an early development environment release of the forthcoming Intel MIC platform, focusing on porting of existing scientific applications and micro-kernels. Fortran and C++ applications are chosen from disciplines including quantum mechanics, hypersonics, rarefied gas dynamics, finite-element analysis, and FFT and linear algebra kernels used in the direct num...

Publication date: 2007