Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism
Similar resources
Multi-level parallelism for incompressible flow computations on GPU clusters
We investigate multi-level parallelism on GPU clusters with MPI-CUDA and hybrid MPI-OpenMP-CUDA parallel implementations, in which all computations are done on the GPU using CUDA. We explore efficiency and scalability of incompressible flow computations using up to 256 GPUs on a problem with approximately 17.2 billion cells. Our work addresses some of the unique issues faced when merging fine-g...
Cluster-SkePU: A Multi-Backend Skeleton Programming Library for GPU Clusters
SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. SkePU provides programmability,...
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters
Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing...
Multi-GPU acceleration of direct pore-scale modeling of fluid flow in natural porous media
Modified Moving Particle Semi-implicit (MMPS) is a particle-based method used to simulate pore-scale fluid flow through disordered porous media. We present a multi-GPU implementation of MMPS for hybrid CPU–GPU clusters using NVIDIA’s Compute Unified Device Architecture (CUDA). Message Passing Interface (MPI) functions are used to communicate between different nodes of the cluster and hence thei...
Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous multicore and multi-GPU systems to support dense matrix computations efficiently. The main idea is that we treat a heterogeneous system as a distributed-memory machine, and use a heterogeneous multi-level block cyclic distribution method to allocate data to the host and multiple GPUs to minimize communication. We ...
Publication date: 2015