Multi-GPU

Multi-level parallelism for incompressible flow computations on GPU clusters

Journal: Parallel Computing 2013

Dana Jacobsen, Inanc Senocak

We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-g...

Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

2016

Fan Zhang, Guojun Li, Wei Li, Wei Hu, Yuxin Hu

With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data poses challenges for real-time imaging. Therefore, high-performance computing (HPC) methods, especially GPU-based ones, have been proposed to accelerate SAR imaging. In the classical GPU-based imaging algorithm, the GPU is employed to accelerate image proces...

Fastplay-A Parallelization Model and Implementation of SMC on CUDA based GPU Cluster Architecture

Journal: IACR Cryptology ePrint Archive 2011

Shi Pu, Pu Duan, Jyh-Charn Liu

We propose a four-tiered parallelization model for accelerating secure multiparty computation (SMC) on CUDA-based Graphics Processing Unit (GPU) cluster architectures. The specification layer is the top layer; it adopts the SFDL of Fairplay for the specification of secure computations. The SHDL file generated by the SFDL compiler of Fairplay is used as input to the function layer, for whic...

Multi-GPU maximum entropy image synthesis for radio astronomy

Journal: CoRR 2017

Miguel Cárcamo, Pablo E. Román, Simon Casassus, Victor Moral, Fernando R. Rannou

The maximum entropy method (MEM) is a well-known deconvolution technique in radio interferometry. This method solves a non-linear optimization problem with an entropy regularization term. Other heuristics such as CLEAN are faster but highly user-dependent. Nevertheless, MEM has the following advantages: it is unsupervised, it has a statistical basis, it has a better resolution and better image ...

Parallel Computations for Hierarchical Agglomerative Clustering using CUDA Fast and Scalable Computations on Graphics Processors

2014

S. A. Arul Shalom, Manoranjan Dash

Graphics Processing Units (GPUs) in today’s desktops can well be thought of as high-performance parallel processors. Traditionally, parallel computing is the use of multiple computing resources to execute computational problems simultaneously. Such computations are possible using multi-core CPUs, computers with multiple CPUs, or a network of computers working in parallel. Today’s GPUs are c...

Fast Cellular Automata Implementation on Graphic Processor Unit (GPU) for Salt and Pepper Noise Removal

Journal: Journal of Computer and Robotics 2014

Afsaneh Jalalian, Babak Karasfi, Khairulmizam Samsudin, M. Iqbal Saripan, Syamsiah Mashohor

Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks, because noise arises during acquisition or transmission. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt and pepper noise. This paper presents a Cellular Automata (CA) framework for removing noise from images distorted by the salt an...

GPU Acceleration of Particle-based Volume Rendering using CUDA

2008

Ding Zhongming, Naohisa Sakamoto, Yasuo Ebara, Koji Koyamada

In this paper, we apply the Particle-based Volume Rendering (PBVR) technique on a current programmable GPU architecture. Recently, the increasing programmability of GPUs has offered an efficient way to run SIMD parallel algorithms and address the speed problem. Because each point or pixel can be calculated independently, we use programmable graphics hardware to delegate all expensive rendering tasks to ...

Architecting the finite element method pipeline for the GPU

Journal: Journal of Computational and Applied Mathematics 2014

Zhisong Fu, T. James Lewis, Robert Michael Kirby, Ross T. Whitaker

The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core stream...

Extensions and Limitations of the Neural GPU

Journal: CoRR 2016

Eric Price, Wojciech Zaremba, Ilya Sutskever

The Neural GPU is a recent model that can learn algorithms such as multi-digit binary addition and binary multiplication in a way that generalizes to inputs of arbitrary length. We show that there are two simple ways of improving the performance of the Neural GPU: by carefully designing a curriculum, and by increasing model size. The latter requires a memory efficient implementation, as a naive...

High-accuracy Optimization by Parallel Iterative Discrete Approximation and GPU Cluster Computing

Journal: JSW 2014

Di Zhao

High-accuracy optimization is the key component of time-sensitive applications in computer science such as machine learning, and we developed single-GPU Iterative Discrete Approximation Monte Carlo Optimization (IDA-MCS) and multi-GPU IDA-MCS in our previous research. However, because of the memory capacity constraints of GPUs in a workstation, single-GPU IDA-MCS and multi-GPU IDA-MCS may be in lo...
