opencl

Automatic Generation of Optimized OpenCL Codes Using OCLoptimizer

Journal: Comput. J. 2015

Jorge F. Fabeiro, Diego Andrade, Basilio B. Fraguela, Ramón Doallo

The eruption of multicore processors and several kinds of accelerators has generalized the interest in parallel programming. The OpenCL standard is very appealing because it provides code portability across most of these platforms. It defines a programming model where a host code requests the execution of kernels in computational devices. Unfortunately, the host API of OpenCL is quite verbose, ...

Characterizing the challenges and evaluating the efficacy of a CUDA-to-OpenCL translator

Journal: Parallel Computing 2013

Mark K. Gardner, Paul Sathre, Wu-chun Feng, Gabriel Martinez

The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and their associated programming models. One of the most promising models for heterogeneous computing is the accelerator model, and one of the most cost-effective, high-performance accelerators currently available is the general-purpose, graphics processing unit (GPU). Two similar progra...

A New Compilation Path: From Python/NumPy to OpenCL

2011

Xunhao Li, Rahul Garg, José Nelson Amaral

Jit4OpenCL is a new compiler that converts scientific applications written in Python/NumPy into OpenCL code. This compiler is based on unPython, an ahead-of-time compiler from Python/NumPy to an intermediate form and OpenMP code, and on jit4GPU, a just-in-time compiler that converts that intermediate code into AMD CAL code that is specific to AMD GPUs. The targeting of OpenCL provides a new ev...

Implementation of Autoencoders with Systolic Arrays through OpenCL

Journal: Electronics 2021

In the world of algorithm acceleration and the implementation of deep neural networks' recall phase, OpenCL-based solutions have a clear tendency to produce perfectly adapted kernels in graphics processing unit (GPU) architectures. However, they fail to obtain the same results when applied to field-programmable gate array (FPGA) architectures. This situation, along with an enormous advance in new GPU architectures, makes it unfeas...

High-Level Manipulation of OpenCL-Based Subvectors and Submatrices

2012

Karl Rupp

High-level C++ proxies for the convenient manipulation of subvectors and submatrices on OpenCL-enabled devices are introduced. It is demonstrated that the programming convenience of standard host-based code can be retained using native C++ language features only, even if massively parallel computing architectures such as graphics processing units are employed. The required modifications of the ...

Parallelization and Optimization of Feature Detection Algorithms on Embedded GPU

2013

Seung Heon Kang, Seung-Jae Lee, Kyu Park

In this paper, we parallelize and optimize the popular feature detection algorithms, i.e. SIFT and SURF, on the latest embedded GPU. Using the conventional OpenGL shading language and the recently developed OpenCL as the GPGPU software platforms, we compare their implementation efficiency and speed against each other as well as between GPU and CPU. Experimental results show that implementatio...

OpenCL Implementation of NeuroIsing

Journal: Progress of Theoretical Physics Supplement 2012

OpenCL 2.0 for FPGAs using OCLAcc

Journal: CoRR 2015

Franz Richter-Gottfried, Alexander Ditter, Dietmar Fey

Designing hardware is a time-consuming and complex process. Realization of both embedded and high-performance applications can benefit from a design process on a higher level of abstraction. This helps to reduce development time and allows the hardware design to be tested and optimized iteratively during development, as is common in software development. We present our tool, OCLAcc, which allows the gen...

Efficient SIMD Vectorization for Hashing in OpenCL

2018

Tobias Behrens, Viktor Rosenfeld, Jonas Traub, Sebastian Breß, Volker Markl

Hashing is at the core of many efficient database operators such as hash-based joins and aggregations. Vectorization is a technique that uses Single Instruction Multiple Data (SIMD) instructions to process multiple data elements at once. Applying vectorization to hash tables results in promising speedups for build and probe operations. However, vectorization typically requires intrinsics – low-l...

Characterizing the Challenges and Evaluating the Efficacy of a CUDA-to-OpenCL Translator

2013

Mark Gardner, Paul Sathre, Wu-chun Feng, Gabriel Martinez

The proliferation of heterogeneous computing systems has led to increased interest in parallel architectures and their associated programming models. One of the most promising models for heterogeneous computing is the accelerator model, and one of the most cost-effective, high-performance accelerators currently available is the general-purpose, graphics processing unit (GPU). Two similar program...
