Code Generators for Automatic Tuningof Numerical Kernels : Experiences with FFTWPosition

نویسندگان

  • Richard Vuduc
  • James W. Demmel
چکیده

Achieving peak performance in important numerical kernels such as dense matrix multiply or sparse-matrix vector multiplication usually requires extensive, machine-dependent tuning by hand. In response, a number automatic tuning systems have been developed which typically operate by (1) generating multiple implementations of a kernel, and (2) empirically selecting an optimal implementation. One such system is FFTW (Fastest Fourier Transform in the West) for the discrete Fourier transform. In this paper, we review FFTW's inner workings with an emphasis on its code generator, and report on our empirical evaluation of the system on two diierent hardware and compiler platforms. We then describe a number of our own extensions to the FFTW code generator that compute eecient discrete cosine transforms and show promising speed-ups over a vendor-tuned library. We also comment on current opportunities to develop tuning systems in the spirit of FFTW for other widely-used kernels.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generators for Automatic Tuningof Numerical Kernels : Experiences with FFTWPosition

Achieving peak performance in important numerical kernels such as dense matrix multiply or sparse-matrix vector multiplication usually requires extensive, machine-dependent tuning by hand. In response, a number automatic tuning systems have been developed which typically operate by (1) generating multiple implementations of a kernel, and (2) empirically selecting an optimal implementation. One ...

متن کامل

Automatic Generation and Adaptation of Numerical Kernels

Designing software that achieves peak performance on modern architectures is a difficult, expensive and often highly platform specific task. In this paper we discuss recent automatic adaptive optimization approaches to high-performance programming: ATLAS, FFTW, and SPIRAL. They are designed to eliminate hand-coding and hand-tuning for various numerical kernels. Further, we describe our own work...

متن کامل

Automatic Generation of Sparse Tensor Kernels with Workspaces

Recent advances in compiler theory describe how to compile sparse tensor algebra. Prior work, however, does not describe how to generate efficient code that takes advantage of temporary workspaces. These are often used to hand-optimize important kernels such as sparse matrix multiplication and the matricized tensor times Khatri-Rao product. Without this capability, compilers and code generators...

متن کامل

Automatic Coarse-grain Partitioning and Automatic Code Generation for Heterogeneous Architectures

Real-time signal, image, and control applications have very important time constraints, involving the use of several powerful numerical calculation units. The aim of our work is to develop a fast and automatic prototyping process dedicated to parallel architectures made of both PC and several last generation Texas Instruments digital signal processors: TMS320C6X DSP. The process is based on Syn...

متن کامل

Adjoint Algorithmic Differentiation Tool Support for Typical Numerical Patterns in Computational Finance

We demonstrate the flexibility and ease of use of C++ algorithmic differentiation (AD) tools based on overloading to numerical patterns (kernels) arising in computational finance. While adjoint methods and AD have been known in the finance literature for some time, there are few tools capable of handling and integrating with the C++ codes found in production. Adjoint methods are also known to b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006