Search results for: opencl

Number of results: 807

2012
Abhishek Ray

Graphics processing units (GPUs) have been gaining popularity in general-purpose and high-performance computing. A GPU is made up of a number of streaming multiprocessors (SMs), each of which consists of many processing cores. A large number of general-purpose applications have been mapped onto GPUs efficiently. Stream processing applications, however, exhibit properties such as unfavorable data ...

2010
Ferosh Jacob, Ritu Arora, Purushotham Bangalore, Marjan Mernik, Jeffrey G. Gray

General-purpose computing on GPUs (graphics processing units) has received much attention lately due to the benefits of stream processing to exploit limitations of parallel processing. However, programming GPUs has several challenges with respect to the amount of effort spent in combining the kernel functional code of an application with the parallel concerns offered by APIs from various GPUs. ...

2012
Sudhakar Sah, Jan Vanek, YoungJun Roh, Ratul Wasnik

This paper presents a highly optimized implementation of an image registration method that is invariant to rotation, scale, and translation. Image registration using FFT achieves accuracy comparable to similar methods proposed in the literature, but practical applications seldom use this technique because of its high computational requirements. However, this method is highly parallelizable and off...

2013
Rainer Keller

This paper describes the didactic concept, the content, and the lessons learned from a lecture on parallel programming for undergraduate students held during summer term 2013. The course’s focus was on providing hands-on experience, hence students were programming on real-life codes using an actual HPC cluster. The lecture’s aim was to provide an in-depth understanding of each parallel programming...

2011
Peter Fodrek

This paper reports our evaluation of OpenCL as a platform for hard real-time scheduling. Specifically, we have evaluated which types of tasks are faster on GPGPU than on CPU. We have investigated computational tasks, memory-intensive tasks (especially tasks using low-latency GDDR memory), and disk-intensive tasks. This study is the first part of a larger research program to design an inno...

Journal: Computer Physics Communications 2011
Henrik Schulz, Géza Ódor, Gergely Ódor, Máté Ferenc Nagy

Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs either by CUDA or by OpenCL programming. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1 dimensions related to the Asymmetric Simple Exclusion Process and show that for sizes that fit into the shared...

Journal: J. Parallel Distrib. Comput. 2013
Moisés Viñas, Zeki Bozkus, Basilio B. Fraguela

While recognition of the advantages of heterogeneous computing is steadily growing, the issues of programmability and portability hinder its exploitation. The introduction of the OpenCL standard was a major step forward in that it provides code portability, but its interface is even more complex than that of other approaches. In this paper we present the Heterogeneous Programming Library (HPL),...

Journal: J. Parallel Distrib. Comput. 2013
Simon J. Pennycook, Simon D. Hammond, Steven A. Wright, J. A. Herdman, I. Miller, Stephen A. Jarvis

This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device type...

Journal: Graphical Models 2013
Minho Kim

This paper presents an efficient and accurate isosurface rendering algorithm for the natural C splines on the face-centered cubic (FCC) lattice. Leveraging fast and accurate evaluation of a spline field and its gradient, accompanied by efficient empty-space skipping, the approach generates high-quality isosurfaces of FCC datasets at interactive speed (20–70 fps). The pre-processing computation (...

2013
Stefan Breuer, Michel Steuwer, Sergei Gorlatch

The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA, which makes it a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high level of programming abstractio...
