Parallel Fourier Transformations using shared memory nodes
Abstract
The Fast Fourier Transform (FFT) is of great importance for various scientific applications used in High Performance Computing (HPC). However, a detailed performance analysis shows that the FFT routines used in these applications prevent them from scaling to large processor counts. The limiting factor appears to be the All-to-All communication required inside these transformation routines, which becomes extremely costly when large processor counts are involved. In this dissertation, we focus mainly on whether and how the performance of the parallel two-dimensional (2D) FFT can be improved by exploiting access to the shared memory nodes of HPCx, a cluster of POWER5 SMP nodes. In particular, we investigate how to transfer data efficiently between the processing elements involved in the parallel 2D FFT. Different OpenMP strategies are proposed for the parallelisation of the 2D FFT. The results demonstrate that, for certain problem sizes between 16 and 8192, access to the shared memory of an HPCx node (16 processors) can produce gains in performance compared to the MPI implementation. In addition, for large processor counts, we use our results from the 2D case to optimise the parallelisation of the three-dimensional (3D) FFT with the Hybrid model, a mixed-mode programming model that combines shared memory programming and message passing. Our implementation uses the Master-only style, a version of the Hybrid model in which MPI communication is handled only by the master thread, outside the OpenMP parallel regions. The results demonstrate good scaling of the code for problem sizes between 64 and 512 on up to 1024 processors. The performance comparisons illustrate that, in certain cases, the Hybrid model can prove beneficial compared to the 2D data decomposition with pure MPI. Subject area: High Performance Computing
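To make the All-to-All step in the abstract concrete, the following is a minimal serial sketch in Python with NumPy. It is not the dissertation's OpenMP or MPI code: the function name `slab_fft2` and the parameter `nprocs` are hypothetical, and the "processes" are simulated as list entries, each owning a slab of rows. It assumes a square input whose size is divisible by `nprocs`.

```python
import numpy as np

def slab_fft2(a, nprocs):
    """2D FFT of a square array via local row FFTs and a simulated all-to-all.

    Hypothetical illustration only: each list entry stands in for the data
    owned by one process in a row-slab decomposition.
    """
    n = a.shape[0]
    assert a.shape == (n, n) and n % nprocs == 0
    rows = n // nprocs

    # Each simulated process owns a contiguous slab of rows.
    slabs = [a[p * rows:(p + 1) * rows, :] for p in range(nprocs)]

    # Step 1: local 1D FFTs along the rows each process owns.
    slabs = [np.fft.fft(s, axis=1) for s in slabs]

    # Step 2: all-to-all exchange. Process q receives, from every process p,
    # the block of columns q*rows:(q+1)*rows and stores it transposed, so the
    # former columns become its local rows.
    exchanged = [
        np.hstack([slabs[p][:, q * rows:(q + 1) * rows].T for p in range(nprocs)])
        for q in range(nprocs)
    ]

    # Step 3: local 1D FFTs along what used to be the columns.
    exchanged = [np.fft.fft(s, axis=1) for s in exchanged]

    # The distributed result is the transpose of the 2D FFT; gather and undo it.
    return np.vstack(exchanged).T

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
matches_reference = np.allclose(slab_fft2(x, 4), np.fft.fft2(x))
```

In Step 2 every simulated process exchanges a block with every other process, which is the communication pattern the abstract identifies as the scaling bottleneck. On a shared memory node the same exchange could in principle be done with direct reads and writes instead of messages.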
Similar resources
Multigrain Shared Memory
Parallel workstations, each comprising a 10-100 processor shared memory machine, promise cost-effective general-purpose multiprocessing. This thesis explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. Multiprocessors built in this fashion are called Distributed Scalable Shared memo...
"Slow Is Fast" for Wireless Sensor Networks in the Presence of Message Losses
Transformations from shared memory model to wireless sensor networks (WSNs) quickly become inefficient in the presence of prevalent message losses in WSNs, and this prohibits their wider adoption. To address this problem, we propose a variation of the shared memory model, the SF shared memory model, where the actions of each node are partitioned into slow actions and fast actions. The tradition...
Shaman: A Distributed Simulator for Shared Memory Multiprocessors
This paper describes our distributed architectural simulator of shared memory multiprocessors named Shaman. The simulator runs on a PC cluster that consists of multiple front-end nodes to simulate the instruction level behavior of a target multiprocessor in parallel and a back-end node to simulate the target memory system. The front-end also simulates the logical behavior of the shared memory u...
Compiling MPI for Many-Core Systems
Processors with multiple (or many) cores and shared memory are becoming ubiquitous across the computing spectrum. MPI, the current de facto programming model for scalable parallel applications, enforces copies between source and target processes and thus cannot fully utilize shared memory and cache architectures of modern machines. To enable MPI-based programs to more fully exploit features of...
Parallel Fourier-Motzkin Elimination
Fourier-Motzkin elimination is a computationally expensive but powerful method to solve a system of linear inequalities for real and integer solution spaces. Because it yields an explicit representation of the solution set, in contrast to other methods such as Simplex, one may, in some cases, take its longer run time into account. We show in this paper that it is possible to considerably speed ...
Publication date: 2008