Search results for: opencl

Number of results: 807

Journal: CoRR 2017
Cedric Nugteren

This work demonstrates how to accelerate dense linear algebra computations using CLBlast, an open-source OpenCL BLAS library providing optimized routines for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, astrophysi...

2016
Anshuman Verma Ahmed E. Helal Konstantinos Krommydas Wu-Chun Feng

For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance comes at the expense of programmability. FPGA developers use hardware design languages (HDLs) to implement ...

2011
Peter Thoman Klaus Kofler Heiko Studt John Thomson Thomas Fahringer

The OpenCL standard allows targeting a large variety of CPU, GPU and accelerator architectures using a single unified programming interface and language. While the standard guarantees portability of functionality for complying applications and platforms, performance portability on such a diverse set of hardware is limited. Devices may vary significantly in memory architecture as well as type, n...

2015
Michael Haidl Bastian Hagedorn Sergei Gorlatch

Systems that comprise accelerators (e.g., GPUs) promise high performance, but their programming is still a challenge, mainly because of two reasons: 1) two distinct programming models have to be used within an application: one for the host CPU (e.g., C++), and one for the accelerator (e.g., OpenCL or CUDA); 2) using Just-In-Time (JIT) compilation and its optimization opportunities in both OpenC...

Journal: International Journal of Networked and Distributed Computing 2017

Journal: CoRR 2015
Shaodong Qin Mladen Berekovic

Modern SoC-FPGA that consists of FPGA with embedded ARM cores is being popularized as an embedded vision system platform. However, the design approach of SoC-FPGA applications still follows traditional hardware-software separate workflow, which becomes the barrier of rapid product design and iteration on SoC-FPGA. High-Level Synthesis (HLS) and OpenCL-based system-level design approaches provide...

Journal: Computer Physics Communications 2012
Ján Busa Shura Hayryan Ming-Chya Wu Ján Busa Chin-Kun Hu

Introduction of Graphical Processing Units (GPUs) and computing using GPUs in recent years opened possibilities for simple parallelization of programs. In this update, we present the modernized version of program ARVO [J. Buša, J. Dzurina, E. Hayryan, S. Hayryan, C.-K. Hu, J. Plavka, I. Pokorný, J. Skivánek, M.-C. Wu, Comput. Phys. Comm. 165 (2005) 59]. The whole package has been rewritten in t...

2016
Amir Momeni Hamed Tabkhi Gunar Schirner David Kaeli

OpenCL support across many heterogeneous nodes (FPGAs, GPUs, CPUs) has increased the programmability of these systems significantly. At the same time, it opens up new challenges and design choices for system designers and application programmers. While OpenCL offers a universal semantic to capture the parallel behavior of applications independent of the target architecture, some customization s...

2013
Lars Schor Andreas Tretter Lothar Thiele

Upcoming heterogeneous systems ask for new programming paradigms. Abstracting the underlying hardware architecture is desirable in order to support productive software development. This thesis proposes a design flow and runtime-system for executing process networks on heterogeneous systems using OpenCL. Process networks are a popular model of computation for deterministic parallel programming a...
