Search results for: multi gpu

Number of results: 473736

Journal: CoRR 2017
Adam Stooke Pieter Abbeel

We present Synkhronos, an extension to Theano for multi-GPU computations leveraging data parallelism. Our framework provides automated execution and synchronization across devices, allowing users to continue to write serial programs without risk of race conditions. The NVIDIA Collective Communication Library is used for high-bandwidth inter-GPU communication. Further enhancements to the Theano ...

2013
Marlon Arce Acuna Takayuki Aoki

Tsunamis are natural disasters that represent a real and dangerous threat, especially to countries with coasts along the Pacific Ocean. In light of the tragic events of the 2011 Earthquake and Tsunami in Japan, the importance of predicting this phenomenon has gained great relevance. To simulate a tsunami, the Shallow Water Equations (SWE) are used; these equations, although reliable, can...

2011
Sahil Suneja Elliott Baron Ryan Johnson

Heterogeneous multiprocessors that combine multiple CPUs and GPUs on a single die are poised to become commonplace in the market. As seen recently in the high performance computing community, leveraging a GPU can yield performance increases of several orders of magnitude. We propose using GPU acceleration to greatly speed up cloud management tasks in VMMs. This is only becoming possible now th...

2014
Sadaf Alam Ugo Varetto

This report introduces a hybrid implementation of the GROMACS application and provides instructions on building and executing it on PRACE prototype platforms with Graphical Processing Unit (GPU) and Many Integrated Core (MIC) accelerator technologies. GROMACS currently employs message-passing MPI parallelism and multi-threading using OpenMP, and contains kernels for non-bonded interactions that are ...

2011
Pablo Quesada-Barriuso Julián Lamas-Rodríguez Dora B. Heras Montserrat Bóo Francisco Argüello

Nowadays multicore processors and graphics cards are commodity hardware that can be found in personal computers. Both CPU and GPU are capable of performing high-end computations. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli’s algorithm, as a coarse-g...

2014
Michael Benguigui Françoise Baude

This article presents a multi-GPU adaptation of a specific Monte Carlo and classification-based method for pricing American basket options, due to Picazo [1]. The first part describes how to combine fine- and coarse-grained parallelization to price American basket options. To benefit from different GPU devices, a dynamic strategy of kernel calibration is proposed, and contributes to the d...

2009
Wangda Zuo Qingyan Chen

Computational fluid dynamics (CFD) can provide detailed information on flow motion, temperature distributions and species dispersion in buildings. However, it may take hours, days, or even weeks to simulate airflow in a building using CFD on a single central processing unit (CPU) computer. Parallel computing on a multi-CPU supercomputer or computer cluster can reduce the computing time, but t...

Journal: Journal of Computer and Robotics 0
Afsaneh Jalalian (Department of Computer and Communication Systems Engineering, Faculty of Engineering, Universiti Putra, Malaysia); Babak Karasfi (Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran); Khairulmizam Samsudin (Department of Computer and Communication Systems Engineering, Faculty of Engineering, Universiti Putra, Malaysia); M. Iqbal Saripan (Department of Computer and Communication Systems Engineering, Faculty of Engineering, Universiti Putra, Malaysia); Syamsiah Mashohor (Department of Computer and Communication Systems Engineering, Faculty of Engineering, Universiti Putra, Malaysia)

Noise removal is commonly applied as a pre-processing step before subsequent image processing tasks, because noise occurs during the acquisition or transmission process. A common problem in imaging systems that use CMOS or CCD sensors is the appearance of salt-and-pepper noise. This paper presents a cellular automata (CA) framework for noise removal of an image distorted by the salt an...

2012
Glenn A. Elliott Bryan C. Ward James H. Anderson

The integration of graphics processing units (GPUs) into real-time systems has recently become an active area of research. However, prior research on this topic has failed to produce real-time GPU allocation methods that fully exploit the available parallelism in GPU-enabled systems. In this paper, a GPU management framework called GPUSync is described that was designed with the goal of increas...

2014
Hyunsu Cho Peter A. Yoon

The divide-and-conquer algorithm is a numerically stable and efficient algorithm that computes the eigenvalues and eigenvectors of a symmetric tridiagonal matrix. We often face the situation where the input matrix fits into the main memory but not into the on-chip memory of a GPU device. We present an out-of-core implementation where only part of the input matrix is resident in GPU memory at any po...
