Search results for: checkpointing

Number of results: 2665

2003
Xinyu Chen Michael R. Lyu

Many mathematical models have been proposed to evaluate the execution performance of an application with and without checkpointing in the presence of failures. They assume that the total program execution time without failure is known in advance, under which condition the optimal checkpointing interval can be determined. In mobile environments, application components are distributed and tasks a...

2000
Angkul Kongmunvattana Santipong Tanchatchawal Nian-Feng Tzeng

Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel computing environments on clusters of workstations. In this paper, we propose a new, efficient coordinated checkpointing technique, called coherence-based coordinated checkpointing (CCC), for SDSM. Our CCC minimizes both th...

Journal: :IJNC 2014
Jack J. Dongarra Thomas Hérault Yves Robert

Fast checkpointing algorithms require distributed access to stable storage. This paper revisits the approach based upon double checkpointing, and compares the blocking algorithm of Zheng, Shi and Kalé [23] with the non-blocking algorithm of Ni, Meneses and Kalé [15] in terms of both performance and risk. We also extend the model proposed in [23, 15] to assess the...

Journal: :CoRR 2007
Jason Ansel Michael Rieker Gene Cooperman

We present a preliminary description of a user-level checkpointing package, DMTCP, for Linux. The socket-based approach presents a novel method for checkpointing distributed processes. This includes checkpointing of any dynamically created POSIX threads and forked child processes. It also includes checkpointing of remotely spawned processes via ssh and other mechanisms. As with all user-level c...

Journal: :IJDST 2011
Parveen Kumar Rachit Garg

Minimum-process coordinated checkpointing is a suitable approach to introduce fault tolerance in mobile distributed systems transparently. In order to balance the checkpointing overhead and the loss of computation on recovery, the authors propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the execution of minimum-process coordinated checkpoint...

2007
Jörg Preißinger Mark Pflüger

There exist mainly three different approaches of checkpoint-based recovery mechanisms for distributed systems: coordinated checkpointing, uncoordinated checkpointing and communication induced checkpointing. It can be shown that communication induced checkpointing theoretically has the least minimum overhead, but also that the effective overhead depends on the communication behaviour and the res...

2005
Thomas Huining Feng

In this project, incremental checkpointing is developed specifically for Java programs. This checkpointing scheme has a flavor of source code refactoring: it performs almost all of the (rule-based) transformation automatically, requiring little (or, in many cases, no) interaction with the programmer. Incremental checkpointing is based on a logging technique that records the change in states instead of ...

1998
James S. Plank Michael G. Thomason

Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. This paper makes three small contributions to this research area. First, we show how to apply the concept of availability from reliability theory as a useful metric for checkpointing systems. Second, we study the average availability of uniprocessor checkpointing systems, using the libck...

2008
Ch. D. V. Subba Rao M. M. Naidu

Checkpointing schemes facilitate fault recovery in distributed systems. The two-level fault recovery scheme of distributed system inherits the merits of both disk-based and diskless checkpointing schemes. The present work extends James S Plank’s Diskless checkpointing scheme (N+1 Parity) by introducing ‘Timeout’ to checkpoint programs with high locality of reference. This mechanism enables ap...

2010
Subba Rao Sai Krishna

Checkpointing and message logging are the popular and general-purpose tools for providing fault tolerance in distributed systems. Diskless checkpointing schemes enable frequent checkpointing without a performance penalty. The present work extends James S Plank’s Diskless checkpointing scheme (N+1 Parity) by introducing ‘Timeout’ mechanism to checkpoint programs with high locality of reference. T...
