Distributed-Memory Multiprocessors in FPGAs
Author
Abstract
The exploitation of parallelism in general-purpose soft-core processors has been increasingly considered an efficient approach to accelerate embedded applications. It is therefore important to use standard parallel programming paradigms that facilitate the development of parallel applications, abstracting the user from architectural details. The Message Passing Interface (MPI) is a standard library for developing message-passing programs on distributed-memory processing systems. This work proposes a Message Passing Interface for FPGA soft-processors and Zynq heterogeneous systems. The work included the definition of a fully functional set of MPI functions, developed as a portable C library, and the design of a set of configurable hardware components to support communication among all the processors. Considering the specifics of the target devices, namely their resource limitations in comparison with supercomputers or clusters of workstations, the design emphasized low resource utilization as well as hardware scalability and software reliability. A set of benchmarks covering a wide range of algorithms was used to evaluate the work. The experimental results fully validated the implemented designs and showed that standard MPI applications can be easily ported to the target platforms. Maximum efficiencies (up to 100%) were achieved for the algorithms with lower communication overheads, such as the cpi benchmark for computing π.
Keywords—Parallel Computing, High-Performance Computing, Embedded Systems, Soft-Processors, FPGAs, MicroBlaze, Zynq, MPI
Related references
Parallelization and Locality Analysis for Adaptive Computing Systems
This paper presents a strategy for compiling to adaptive computing architectures systems that incorporate configurable logic devices such as FPGAs. As compared to conventional instruction set architectures, adaptive computing systems offer the opportunity to customize the logic according to the requirements of each application. In this paper, we focus on a particular aspect of customizing the l...
Experiences with Data Distribution on NUMA Shared Memory Multiprocessors
The choice of a good data distribution scheme is critical to performance of data-parallel applications on both distributed memory multiprocessors and NUMA shared memory multiprocessors. The high cost of interprocessor communication in distributed memory multiprocessors makes the minimization of communications the predominant issue in selecting data distributionschemes. However, on NUMA multipro...
Computation and Data Partitioning on Scalable Shared Memory Multiprocessors
In this paper we identify the factors that affect the derivation of computation and data partitions on scalable shared memory multiprocessors (SSMMs). We show that these factors necessitate an SSMM-conscious approach. In addition to remote memory access, which is the sole factor on distributed memory multiprocessors, cache affinity, memory contention and false sharing are important factors that...
Automatic Localization for Distributed-Memory Multiprocessors Using a Shared-Memory Compilation Framework
In this paper, we outline an approach for compiling for distributed-memory multiprocessors that is inherited from compiler technologies for shared-memory multiprocessors. We believe that this approach to compiling for distributed-memory machines is promising because it is a logical extension of the shared-memory parallel programming model, a model that is easier for programmers to work with, an...
Scheduling to Reduce Memory Coherence Overhead on Coarse-grain Multiprocessors
Some Distributed Shared Memory (DSM) and Cache-Only Memory Architecture (COMA) multiprocessors keep processes near the data they reference by transparently replicating remote data in the processes' local memories. This automatic replication of data can impose substantial memory system overhead on an application since all replicated data must be kept coherent. We examine the effect of task schedu...
Publication date: 2015