Mapping Streaming Applications to OpenCL

Author

  • Abhishek Ray
Abstract

Graphics processing units (GPUs) have been gaining popularity in general-purpose and high-performance computing. A GPU is made up of a number of streaming multiprocessors (SMs), each of which consists of many processing cores. A large number of general-purpose applications have been mapped onto GPUs efficiently. Stream processing applications, however, exhibit properties such as unfavorable data movement patterns and a low computation-to-communication ratio that can lead to poor performance on a GPU. We describe a previously developed automated framework that maps most stream processing applications onto NVIDIA GPUs efficiently by taking the GPU's architectural characteristics into account. We then discuss the implementation details of porting the mapping framework to OpenCL running on AMD GPUs and evaluate its performance by running several benchmarks. The performance of the generated CUDA and OpenCL code is compared under different heuristics.

Similar resources

Workload distribution and balancing in FPGAs and CPUs with OpenCL and TBB

In this paper we evaluate the performance and energy effectiveness of FPGA and CPU devices for a class of parallel computing applications in which the workload can be distributed in a way that enables simultaneous computing in addition to simple offloading. The FPGA device is programmed via OpenCL using the recent availability of commercial tools and hardware while Threading Building Blocks (TB...

Methods for Optimizing OpenCL Applications on Heterogeneous Multicore Architectures

Heterogeneous multicore architectures with CPU and add-on GPUs or streaming processors are now widely used in computer systems. These GPUs provide substantially more computation capability and memory bandwidth compared to traditional multi-cores. Also, because they are highly programmable, they provide the computational performance needed for realistic graphics rendering. Applications with gene...

Operating System Support for Fine-grained Pipeline Parallelism on Heterogeneous Multicore Accelerators

On-chip special-purpose accelerators are a promising technique in the achievement of high-performance and energy-efficient computing. In particular, fine-grained pipelined execution with multicore accelerators is suitable for streaming applications such as JPEG decoders, which consist of a series of different tasks and process streaming data. CPUs that assign each task to appropriate accelerato...

Comparison of OpenMP & OpenCL Parallel Processing Technologies

This paper presents a comparison of OpenMP and OpenCL based on the parallel implementation of algorithms from various fields of computer applications. The focus of our study is on benchmark performance, comparing OpenMP and OpenCL. We observed that the OpenCL programming model is a good option for mapping threads onto different processing cores. Balancing all available cores and allocating suff...

ADAS on COTS with OpenCL: A Case Study with Lane Detection

The concept of autonomous cars is driving a boost for car electronics, and the size of the automotive semiconductor market is foreseen to double by 2025. How to benefit from this boost is an interesting question. This article presents a case study to test the feasibility of using OpenCL as the programming language and COTS components as the underlying platforms for ADAS development. For representati...

Publication date: 2012