OpenCL

Mapping Streaming Applications to OpenCL

2012

Abhishek Ray

Graphics processing units (GPUs) have been gaining popularity in general-purpose and high-performance computing. A GPU is made up of a number of streaming multiprocessors (SMs), each of which consists of many processing cores. A large number of general-purpose applications have been mapped onto GPUs efficiently. Stream processing applications, however, exhibit properties such as unfavorable data ...

Raising the Level of Abstraction of GPU-programming

2010

Ferosh Jacob, Ritu Arora, Purushotham Bangalore, Marjan Mernik, Jeffrey G. Gray

General-purpose computing on GPUs (graphics processing units) has received much attention lately due to the benefits of stream processing to exploit limitations of parallel processing. However, programming GPUs has several challenges with respect to the amount of effort spent in combining the kernel functional code of an application with the parallel concerns offered by APIs from various GPUs. ...

GPU Accelerated Real Time Rotation, Scale and Translation Invariant Image Registration Method

2012

Sudhakar Sah, Jan Vanek, YoungJun Roh, Ratul Wasnik

This paper presents a highly optimized implementation of an image registration method that is invariant to rotation, scale, and translation. The FFT-based image registration method achieves accuracy comparable to that of similar methods proposed in the literature, but practical applications seldom use it because of its high computational requirements. However, this method is highly parallelizable and off...
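
The abstract does not detail the FFT-based method. The step such methods are commonly built on is phase correlation: by the Fourier shift theorem a translation becomes a linear phase ramp, so the inverse transform of the normalized cross-power spectrum peaks at the shift. Rotation and scale are usually reduced to translations by log-polar resampling of the Fourier magnitude spectrum, which is not shown here. The sketch below is a minimal illustration under stated assumptions: 1D signals, circular shifts, and a naive O(n^2) DFT standing in for an FFT. The helper names are hypothetical, and this is not the authors' GPU implementation.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_correlation_shift(a, b):
    """Estimate the circular shift s such that b[t] == a[(t - s) % n]."""
    A, B = dft(a), dft(b)
    cross = []
    for fa, fb in zip(A, B):
        c = fb * fa.conjugate()
        mag = abs(c)
        # Keep only the phase; drop frequencies with (near) zero energy.
        cross.append(c / mag if mag > 1e-12 else 0j)
    corr = idft(cross)
    # The correlation surface peaks at the shift.
    return max(range(len(corr)), key=lambda t: corr[t].real)


signal = [0, 1, 3, 7, 2, 0, 0, 5]
shifted = signal[-3:] + signal[:-3]   # circular shift right by 3
print(phase_correlation_shift(signal, shifted))  # 3
```

On a GPU, the two forward transforms, the per-frequency normalization, and the inverse transform are all data-parallel, which is the property the abstract points to.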

Teaching parallel programming to undergrads with hands-on experience

2013

Rainer Keller

This paper describes the didactic concept, the content, and the lessons learned from a lecture on parallel programming for undergraduate students held during the summer term 2013. The course’s focus was on providing hands-on experience; hence, students worked on real-life codes using an actual HPC cluster. The lecture’s aim was to provide an in-depth understanding of each parallel programming...

Realtime scheduling using GPUs - proof of feasibility

2011

Peter Fodrek

This paper reports our evaluation of OpenCL as a platform for hard real-time scheduling. Specifically, we evaluated which types of tasks run faster on a GPGPU than on a CPU. We investigated computational tasks, memory-intensive tasks (especially tasks using low-latency GDDR memory), and disk-intensive tasks. This study is the first part of a larger research program to design an inno...

Simulation of 1+1 dimensional surface growth and lattices gases using GPUs

Journal: Computer Physics Communications, 2011

Henrik Schulz, Géza Ódor, Gergely Ódor, Máté Ferenc Nagy

Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs with either CUDA or OpenCL programming. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1 dimensions, related to the Asymmetric Simple Exclusion Process, and show that for sizes that fit into the shared...
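
The mapping named in the abstract can be illustrated with the single-step model, a restricted solid-on-solid model in which neighbouring heights differ by exactly ±1. Encoding each slope +1 as a particle (1) and each slope -1 as a hole (0) turns deposition at a local minimum (pattern 01 becomes 10) and evaporation at a local maximum (10 becomes 01) into single particle hops; unequal rates give an asymmetric exclusion process. The serial sketch below assumes periodic boundaries, random-sequential updates, and hypothetical rates `p` and `q`; it only demonstrates the mapping, not the paper's CUDA/OpenCL kernels.

```python
import random


def heights_to_bits(h):
    """Map slopes h[i+1] - h[i] of a periodic single-step surface to bits (+1 -> 1, -1 -> 0)."""
    n = len(h)
    bits = []
    for i in range(n):
        slope = h[(i + 1) % n] - h[i]
        assert slope in (1, -1), "single-step constraint violated"
        bits.append(1 if slope == 1 else 0)
    return bits


def sweep(h, bits, p, q, rng):
    """One Monte Carlo sweep: len(bits) random-sequential update attempts."""
    n = len(bits)
    for _ in range(n):
        j = rng.randrange(n)
        left, right = (j - 1) % n, j
        pair = (bits[left], bits[right])
        if pair == (0, 1) and rng.random() < p:
            # Local minimum at j: deposit; the particle hops from j to j-1.
            h[j] += 2
            bits[left], bits[right] = 1, 0
        elif pair == (1, 0) and rng.random() < q:
            # Local maximum at j: evaporate; the particle hops from j-1 to j.
            h[j] -= 2
            bits[left], bits[right] = 0, 1


L = 16
h = [i % 2 for i in range(L)]          # flat zig-zag surface 0,1,0,1,...
bits = heights_to_bits(h)
rng = random.Random(1)
for _ in range(100):
    sweep(h, bits, p=0.75, q=0.25, rng=rng)
assert bits == heights_to_bits(h)      # the lattice gas tracks the surface exactly
print(sum(bits), "particles on", L, "sites")  # 8 particles on 16 sites
```

Because every update is a swap of neighbouring bits, the particle number is conserved, and the surface height can be recovered from the bit string alone, which is what makes a compact binary encoding attractive for GPU memory.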

Exploiting heterogeneous parallelism with the Heterogeneous Programming Library

Journal: J. Parallel Distrib. Comput., 2013

Moisés Viñas, Zeki Bozkus, Basilio B. Fraguela

While recognition of the advantages of heterogeneous computing is steadily growing, the issues of programmability and portability hinder its exploitation. The introduction of the OpenCL standard was a major step forward in that it provides code portability, but its interface is even more complex than that of other approaches. In this paper we present the Heterogeneous Programming Library (HPL),...

An investigation of the performance portability of OpenCL

Journal: J. Parallel Distrib. Comput., 2013

Simon J. Pennycook, Simon D. Hammond, Steven A. Wright, J. A. Herdman, I. Miller, Stephen A. Jarvis

This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device type...
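
The work-item/work-group distribution mentioned in the abstract rests on how OpenCL decomposes an NDRange: in one dimension with zero global offset, a work-item's global ID equals its group ID times the work-group size plus its local ID. In OpenCL 1.x the global size must also be a multiple of the work-group size, so a common pattern is to pad it and let surplus work-items return early. The sketch below models only that bookkeeping; the function names are hypothetical and it does not reproduce the paper's LU code.

```python
def padded_global_size(n, local_size):
    """Round the problem size up to a multiple of the work-group size."""
    return ((n + local_size - 1) // local_size) * local_size


def enumerate_work_items(n, local_size):
    """Yield (group_id, local_id, global_id) for a 1D NDRange with zero offset,
    skipping padding work-items that a kernel would mask with `if (gid < n)`."""
    global_size = padded_global_size(n, local_size)
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            global_id = group_id * local_size + local_id
            if global_id < n:
                yield group_id, local_id, global_id


items = list(enumerate_work_items(10, 4))
print(padded_global_size(10, 4))                  # 12
print(len(items))                                 # 10
print([lid for g, lid, _ in items if g == 2])     # [0, 1]
```

Changing `local_size` changes how many groups are launched and how many padded work-items sit idle, one reason the best choice can differ between CPU and GPU devices.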

GPU isosurface raycasting of FCC datasets

Journal: Graphical Models, 2013

Minho Kim

This paper presents an efficient and accurate isosurface rendering algorithm for the natural C splines on the face-centered cubic (FCC) lattice. Leveraging fast and accurate evaluation of a spline field and its gradient, accompanied by efficient empty-space skipping, the approach generates high-quality isosurfaces of FCC datasets at interactive speed (20–70 fps). The pre-processing computation (...

Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems

2013

Stefan Breuer, Michel Steuwer, Sergei Gorlatch

The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA, which makes it a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high level of programming abstractio...
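
To make "stencil computation" concrete: each output cell is computed from a fixed neighbourhood of input cells, here the 5-point Jacobi average. When the grid is split across several GPUs, each device also needs copies of its neighbours' boundary rows (halo rows) to update its own edge rows. The sketch below is a generic serial illustration under those assumptions; it is not SkelCL's API, and the function names are hypothetical.

```python
def jacobi_step(grid):
    """One 5-point Jacobi sweep; boundary cells are kept fixed."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def split_with_halo(grid, parts):
    """Partition rows into `parts` blocks, each padded with one halo row per side
    where a neighbouring block exists, as a multi-device stencil would need."""
    rows = len(grid)
    bounds = [rows * k // parts for k in range(parts + 1)]
    blocks = []
    for k in range(parts):
        lo, hi = bounds[k], bounds[k + 1]
        halo_rows = grid[max(lo - 1, 0):min(hi + 1, rows)]
        blocks.append((lo, hi, [r[:] for r in halo_rows]))
    return blocks


def distributed_step(grid, parts):
    """Apply jacobi_step to each halo-padded block, then stitch owned rows back."""
    result = []
    for lo, hi, block in split_with_halo(grid, parts):
        updated = jacobi_step(block)
        offset = 1 if lo > 0 else 0      # skip the top halo row, if present
        result.extend(updated[offset:offset + (hi - lo)])
    return result


grid = [[float((3 * i + 5 * j) % 7) for j in range(5)] for i in range(6)]
print(distributed_step(grid, 3) == jacobi_step(grid))  # True
```

In an actual multi-GPU run the halo rows would be exchanged between devices after every sweep; here the blocks are simply rebuilt from the full grid each step.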