checkpointing

Search results for: checkpointing

Number of results: 2665

A Variational Calculus Approach to Optimal Checkpoint Placement

Journal: IEEE Trans. Computers 2001

Yibei Ling, Jie Mi, Xiaola Lin

Checkpointing is an effective fault-tolerant technique for improving system availability and reliability. However, a blind checkpointing placement can result in either performance degradation or expensive recovery cost. By means of the calculus of variations, we derive an explicit formula that links the optimal checkpointing frequency with a general failure rate, with the objective of globally...

A Novel Multi-core System for Fault Tolerance and Security with Diskless Checkpointing Capability

2012

Vaithiyanathan Sundaram

Improving fault tolerance within the clusters has become vital because of the drastic decrease in the Mean Time Between Failures (MTBF) in complex clusters. Checkpointing is one of the robust ways to improve fault tolerance by rolling back from a saved state in the event of a failure in the cluster. Since checkpointing primarily relies on storage devices for storing the states at regular inter...

Checkpointing algorithms and fault prediction

Journal: Journal of Parallel and Distributed Computing 2014

An Enhanced MSS-Based Checkpointing Scheme for Mobile Computing Environment

Journal: Journal of Computer and Robotics

Mobile computing systems are made up of different components, among which mobile support stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs, and as a result the workload of mobile ...

Communication Pattern Based Checkpointing Coordination for Fault-tolerant Distributed Computing Systems

2007

Taesoon Park, Heon Y. Yeom

This paper presents a new checkpointing coordination scheme which utilizes the communication pattern of the cooperating processes. In the proposed scheme, the checkpointing is coordinated for the limited number of processes based on the information regarding the communication pattern of the target program. Unlike the previous solutions which do not utilize the communication pattern, it is possi...

Scheduling for fault-tolerance: an introduction

2015

Guillaume Aupy, Yves Robert

In this chapter, we present scheduling algorithms to cope with faults on large-scale parallel platforms. We study checkpointing and show how to derive the optimal checkpointing period. Then we explain how to combine checkpointing with fault prediction, and discuss how the optimal period is modified when this combination is used. And finally we follow the very same approach for the combination o...
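As a rough illustration of what an "optimal checkpointing period" means, the sketch below uses the classical first-order approximation usually attributed to Young, T = sqrt(2 · C · MTBF), where C is the time to write one checkpoint. This is a well-known textbook formula and is not necessarily the derivation presented in the chapter; the function names and the simplified waste model are assumptions made for this example.

```python
import math


def young_period(checkpoint_cost: float, mtbf: float) -> float:
    """First-order optimal checkpointing period (Young's approximation).

    checkpoint_cost: time to write one checkpoint (hypothetical parameter name).
    mtbf: mean time between failures, in the same time unit.
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def expected_waste(period: float, checkpoint_cost: float, mtbf: float) -> float:
    """Simplified first-order fraction of time wasted.

    The first term is the checkpoint overhead per period; the second is the
    expected rework after a failure (half a period on average). Downtime and
    recovery time are ignored in this sketch.
    """
    return checkpoint_cost / period + period / (2.0 * mtbf)


if __name__ == "__main__":
    cost, mtbf = 2.0, 100.0
    best = young_period(cost, mtbf)
    print(best, expected_waste(best, cost, mtbf))
```

Under this simplified model, periods shorter or longer than the value returned by `young_period` both increase the wasted fraction, which is the trade-off the abstract alludes to.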

Adaptive Checkpointing

2010

Zizhong Chen

Checkpointing is a typical approach to tolerate failures in today’s supercomputing clusters and computational grids. Checkpoint data can be saved either in central stable storage, or in processor memory (as in diskless checkpointing), or local disk space (replacing memory with local disk in diskless checkpointing). But where to save the checkpoint data has a great impact on the performance of a...

Providing Fault-Tolerance in Unreliable Grid Systems Through Adaptive Checkpointing and Replication

2007

Maria Chtepen, Filip H. A. Claeys, Bart Dhoedt, Filip De Turck, Peter A. Vanrolleghem, Piet Demeester

As grids typically consist of autonomously managed subsystems with strongly varying resources, fault-tolerance forms an important aspect of the scheduling process of applications. Two well-known techniques for providing fault-tolerance in grids are periodic task checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce ...

Binomial Checkpointing for Arbitrary Programs with No User Annotation

Journal: CoRR 2016

Jeffrey Mark Siskind, Barak A. Pearlmutter

Heretofore, automatic checkpointing at procedure-call boundaries [1], to reduce the space complexity of reverse mode, has been provided by systems like Tapenade [2]. However, binomial checkpointing, or treeverse [3], has only been provided in AD systems in special cases, e.g., through user-provided pragmas on DO loops in Tapenade, or as the nested taping mechanism in ADOL-C for time integration...
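The name "binomial checkpointing" comes from a counting result in Griewank's analysis of treeverse: with s stored snapshots and at most r repeated forward evaluations of any step, up to C(s + r, s) computation steps can be reversed. The sketch below only evaluates that bound; it does not implement the schedule itself or the approach of this paper, and the function names are hypothetical.

```python
import math


def max_reversible_steps(snapshots: int, repetitions: int) -> int:
    """Binomial bound C(s + r, s) on the number of reversible steps."""
    if snapshots < 0 or repetitions < 0:
        raise ValueError("snapshots and repetitions must be non-negative")
    return math.comb(snapshots + repetitions, snapshots)


def min_repetitions(steps: int, snapshots: int) -> int:
    """Smallest r such that C(s + r, s) >= steps, for s >= 1 snapshots."""
    if steps < 1 or snapshots < 1:
        raise ValueError("steps and snapshots must be at least 1")
    r = 0
    while max_reversible_steps(snapshots, r) < steps:
        r += 1
    return r


if __name__ == "__main__":
    print(max_reversible_steps(3, 2))
    print(min_repetitions(1000, 10))
```

Because the bound grows binomially in both arguments, a modest number of snapshots and repetitions covers very long computations, which is why the technique is attractive for reducing the memory footprint of reverse mode.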

Self-stabilizing algorithm for checkpointing in a distributed system

Journal: J. Parallel Distrib. Comput. 2007

Partha Sarathi Mandal, Krishnendu Mukhopadhyaya

If the variables used for a checkpointing algorithm have data faults, the existing checkpointing and recovery algorithms may fail. In this paper, self-stabilizing data fault detecting and correcting, checkpointing, and recovery algorithms are proposed in a ring topology. The proposed data fault detection and correction algorithms can handle data faults; at most one per process, but in any numbe...
