Distributed-Memory Multiprocessors in FPGAs
Author
Abstract
The exploitation of parallelism in general-purpose soft-core processors has been increasingly considered an efficient approach to accelerate embedded applications. It is therefore important to use standard parallel programming paradigms that facilitate the development of parallel applications, abstracting the user from architectural details. The Message Passing Interface (MPI) is a standard library for developing message-passing programs on distributed-memory processing systems. This work proposes a Message Passing Interface for FPGA soft-processors and Zynq heterogeneous systems. The work included the definition of a fully functional set of MPI functions, developed as a portable C library, and the design of a set of configurable hardware components to support communication among all the processors. Considering the specifics of the target devices, namely their resource limitations in comparison with supercomputers or clusters of workstations, the design emphasized low resource utilization as well as hardware scalability and software reliability. A set of benchmarks covering a wide range of algorithms was used to evaluate the work. The experimental results fully validated the implemented designs and showed that standard MPI applications can be easily ported to the target platforms. Maximum efficiencies (up to 100%) were achieved for the algorithms with lower communication overheads, such as the cpi benchmark for computing π.
Keywords—Parallel Computing, High-Performance Computing, Embedded Systems, Soft-Processors, FPGAs, MicroBlaze, Zynq, MPI
Related references
Parallelization and Locality Analysis for Adaptive Computing Systems
This paper presents a strategy for compiling to adaptive computing architectures systems that incorporate configurable logic devices such as FPGAs. As compared to conventional instruction set architectures, adaptive computing systems offer the opportunity to customize the logic according to the requirements of each application. In this paper, we focus on a particular aspect of customizing the l...
Experiences with Data Distribution on NUMA Shared Memory Multiprocessors
The choice of a good data distribution scheme is critical to performance of data-parallel applications on both distributed memory multiprocessors and NUMA shared memory multiprocessors. The high cost of interprocessor communication in distributed memory multiprocessors makes the minimization of communications the predominant issue in selecting data distributionschemes. However, on NUMA multipro...
Computation and Data Partitioning on Scalable Shared Memory Multiprocessors
In this paper we identify the factors that affect the derivation of computation and data partitions on scalable shared memory multiprocessors (SSMMs). We show that these factors necessitate an SSMM-conscious approach. In addition to remote memory access, which is the sole factor on distributed memory multiprocessors, cache affinity, memory contention and false sharing are important factors that...
Automatic Localization for Distributed-Memory Multiprocessors Using a Shared-Memory Compilation Framework
In this paper, we outline an approach for compiling for distributed-memory multiprocessors that is inherited from compiler technologies for shared-memory multiprocessors. We believe that this approach to compiling for distributed-memory machines is promising because it is a logical extension of the shared-memory parallel programming model, a model that is easier for programmers to work with, an...
Scheduling to Reduce Memory Coherence Overhead on Coarse-grain Multiprocessors
Some Distributed Shared Memory (DSM) and Cache-Only Memory Architecture (COMA) multiprocessors keep processes near the data they reference by transparently replicating remote data in the processes' local memories. This automatic replication of data can impose substantial memory system overhead on an application since all replicated data must be kept coherent. We examine the effect of task schedu...
Publication date: 2015