JACEP2P-V2: A Fully Decentralized and Fault Tolerant Environment for Executing Parallel Iterative Asynchronous Applications on Volatile Distributed Architectures
Authors
Abstract
This article presents JACEP2P-V2, a Java environment dedicated to designing parallel iterative asynchronous algorithms (with direct communication between nodes) and executing them on global computing architectures or distributed clusters composed of a large number of volatile, heterogeneous, distant computing nodes. The platform is fault tolerant, multi-threaded and completely decentralized. In this paper, we describe the different components of JACEP2P-V2 and the various mechanisms it uses to achieve scalability and fault tolerance. The performance of this improved platform is evaluated in a series of experiments that compare it to JACEP2P when solving a 3D advection-diffusion system of equations over a volatile distributed architecture. We also test the scalability of JACEP2P-V2 and its compatibility with various kinds of problems by solving a large instance of the 3D advection-diffusion problem using more than 1000 cores, and by solving the CG (Conjugate Gradient) kernel of the NAS Parallel Benchmarks.
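The abstract does not describe JACEP2P-V2's programming interface, so the following is only a hypothetical sketch of the asynchronous iteration model the platform targets, not code from JACEP2P-V2. The class `AsyncJacobiSketch`, the method `solve` and its `period` parameter are invented for illustration. It also assumes a small diagonally dominant tridiagonal system in place of the 3D advection-diffusion problem. Instead of real threads and peer-to-peer messages, it runs a deterministic, sequential simulation. Each simulated node owns a block of unknowns, iterates at its own pace, and reads whatever neighbour values were last published, with no barrier between iterations.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not JACEP2P-V2 code): a sequential simulation of
 * asynchronous block-Jacobi iterations. Each simulated node owns a
 * contiguous block of unknowns, iterates at its own pace, and reads the most
 * recently published neighbour values, with no synchronization barrier.
 */
class AsyncJacobiSketch {

    /**
     * Solves 4*x[i] - x[i-1] - x[i+1] = b[i] (out-of-range neighbours are 0).
     * The matrix is strictly diagonally dominant, a classical sufficient
     * condition for asynchronous Jacobi iterations to converge.
     *
     * @param period period[k] = ticks between two iterations of node k
     *               (models heterogeneous node speeds; assumption of this sketch)
     */
    static double[] solve(double[] b, int nodes, int[] period, double tol, int maxTicks) {
        int n = b.length;
        if (n % nodes != 0) throw new IllegalArgumentException("n must be divisible by nodes");
        int blockSize = n / nodes;
        double[] published = new double[n];            // latest values sent by each node
        boolean[] localConvergence = new boolean[nodes];

        for (int tick = 1; tick <= maxTicks; tick++) {
            for (int k = 0; k < nodes; k++) {
                if (tick % period[k] != 0) continue;    // node k is still busy
                int lo = k * blockSize, hi = lo + blockSize;
                double[] view = published.clone();      // neighbour data may be several ticks old
                double maxDelta = 0.0;
                for (int i = lo; i < hi; i++) {
                    double left = (i > 0) ? view[i - 1] : 0.0;
                    double right = (i < n - 1) ? view[i + 1] : 0.0;
                    double next = (b[i] + left + right) / 4.0;
                    maxDelta = Math.max(maxDelta, Math.abs(next - view[i]));
                    published[i] = next;                // "send" the new block value
                }
                localConvergence[k] = maxDelta < tol;
            }
            boolean all = true;
            for (boolean c : localConvergence) all &= c;
            if (all) return published;                  // every node reports local convergence
        }
        return published;                               // maxTicks reached
    }

    public static void main(String[] args) {
        // Right-hand side chosen so that the exact solution is x = (1, ..., 1).
        double[] b = {3, 2, 2, 2, 2, 2, 2, 3};
        // Two nodes; node 1 iterates three times less often than node 0.
        double[] x = solve(b, 2, new int[] {1, 3}, 1e-12, 100_000);
        System.out.println(Arrays.toString(x));
    }
}
```

In this example, node 0 runs several iterations on boundary values that node 1 has not yet refreshed. Convergence still holds because the Jacobi operator here is a contraction in the max norm. The stopping rule ("every node has locally converged") is a simplification. A real decentralized, fault-tolerant environment would need a distributed global convergence-detection mechanism, which this sketch does not model.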
Similar resources
MAHEVE: An Efficient Reliable Mapping of Asynchronous Iterative Applications on Volatile and Heterogeneous Environments
The asynchronous iteration model, called AIAC, has been proven to be an efficient solution for heterogeneous and distributed architectures. An efficient mapping of application tasks is essential to reduce their execution time. In this paper we present a new mapping algorithm, called MAHEVE (Mapping Algorithm for HEterogeneous and Volatile Environments) which is efficient on such architectures a...
The Design of Sampa
In this paper we present some of the design goals and the architecture of Sampa. Sampa stands for System for Availability Management of Process-based Applications. The goal of Sampa is to provide high-level support for the management of distributed applications and services implemented on top of the OSF Distributed Computing Environment (DCE). Sampa is fully decentralized and gives support...
A Leader Election Protocol for Fault Recovery in Asynchronous Fully-Connected Networks
We introduce a new algorithm for consistent failure detection in asynchronous systems. Informally, consistent failure detection requires processes in a distributed system to distinguish between two different populations: a fault-free population and a faulty one. The major contribution of this paper is in combining ideas from group membership and leader election, in order to have an election prot...
Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)
Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...
A New Asynchronous Parallel Evolutionary Algorithm for Function Optimization
This paper introduces a new asynchronous parallel evolutionary algorithm (APEA) based on the island model for solving function optimization problems. Our fully distributed APEA overlaps the communication and computation efficiently and is inherently fault-tolerant in a large-scale distributed computing environment. For the scalable BUMP problem, our APEA algorithm achieves the best solution for...
Publication date: 2009