Search results for: checkpointing

Number of results: 2665

Journal: CoRR 2017
Ismail Akturk Ulya R. Karpuzcu

Systematic checkpointing of the machine state makes restart of execution from a safe state possible upon detection of an error. The time and energy overhead of checkpointing, however, grows with the frequency of checkpointing. Amortizing this overhead becomes especially challenging, considering the growth of expected error rates, as checkpointing frequency tends to increase with increasing erro...

2014
Manoj Kumar

Checkpoint and recovery protocols are commonly used in distributed applications for providing fault tolerance. A distributed system may require taking checkpoints from time to time to keep it free of arbitrary failures. In case of failure, the system will roll back to checkpoints where global consistency is preserved. Checkpointing is one of the fault-tolerant techniques to restore faults and to...

Journal: MONET 2003
Chi-Yi Lin Szu-Chi Wang Sy-Yen Kuo

Time-based coordinated checkpointing protocols are well suited for mobile computing systems because no explicit coordination message is needed while the advantages of coordinated checkpointing are kept. However, without coordination, every process has to take a checkpoint during a checkpointing process. In this paper, an efficient time-based coordinated checkpointing protocol for mobile computi...

2010
Hui Jin

Checkpointing is a widely used mechanism for supporting fault tolerance in high performance computing (HPC), but it is notorious for its expensive disk access. Parallel file systems such as Lustre, GPFS, and PVFS are widely deployed on supercomputers to provide fast I/O bandwidth for general data-intensive applications. However, the unique feature of checkpointing makes it impossible to benefit from the ...

2000
S. K. Woo M. H. Kim Y. J. Lee

In main memory databases, fuzzy checkpointing incurs less transaction overhead due to its asynchronous backup feature. However, until now, fuzzy checkpointing has considered only physical logging schemes. The size of the physical log records is very large, and hence it incurs space and recovery processing overhead. In this paper, we propose a recovery method based on a hybrid logging scheme, whic...

Journal: J. Parallel Distrib. Comput. 2006
Partha Sarathi Mandal Krishnendu Mukhopadhyaya

Several schemes for checkpointing and rollback recovery have been reported in the literature. In this paper, we analyze some of these schemes under a stochastic model. We have derived expressions for average cost of checkpointing, rollback recovery, message logging and piggybacking with application messages in synchronous as well as asynchronous checkpointing. For quasi-synchronous checkpointin...

1996
Jiandong Huang Peng-Jun Wan Vicraj Thomas

This study investigates real-time checkpointing techniques in the context of distributed process control applications where checkpointing and recovery operations must meet timing constraints, such as process deadline and plant state validity. We introduce the notion of quasidurability, which allows one to make tradeoffs between storage device reliability and the process control and recovery ti...

Journal: J. Parallel Distrib. Comput. 2001
James S. Plank Michael G. Thomason

Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. While the literature abounds with performance models of checkpointing systems, none address the issue of selecting runtime parameters other than the optimal checkpointing interval. In particular, the issue of processor allocation is typically ignored. In this paper, we present a performa...

2006
John Paul Walters Vipin Chaudhary

In its simplest form, checkpointing is the act of saving a program’s computation state in a form external to the running program, e.g., the computation state is saved to a filesystem. The checkpoint files can then be used to resume computation upon failure of the original process(es), hopefully with minimal loss of computing work. A checkpoint can be taken using a variety of techniques in every l...
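
The idea in the abstract above (save the computation state to a file, then resume from it after a failure) can be illustrated with a minimal application-level sketch. This is not the method of the listed paper; the function names `save_checkpoint`, `load_checkpoint`, and `run_sum`, the use of `pickle`, and the `fail_at` parameter for simulating a crash are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically: dump to a temp file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the rename
    os.replace(tmp_path, path)  # a reader sees the old file or the new one


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint file exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run_sum(n, path, checkpoint_every=100, fail_at=None):
    """Sum 0..n-1, checkpointing periodically (hypothetical workload).

    fail_at simulates a crash at a given iteration; it is an assumption
    of this sketch, not part of any real checkpointing API.
    """
    state = load_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < n:
        if fail_at is not None and state["i"] == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state["total"]
```

If a run fails partway, calling `run_sum` again with the same `path` resumes from the last saved state, so only the work done since that checkpoint is repeated. The write-then-rename step is one common way to avoid leaving a half-written checkpoint behind if the failure happens during the save itself.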

2000
Francesco QUAGLIA Bruno CICIANI Roberto BALDONI

Many communication induced checkpointing algorithms have been proposed for asynchronous cooperating processes. All of them suffer from overhead due both to the exchange of control information and to the insertion of local checkpoints additional to the basic ones. In this paper we propose a low overhead checkpointing-recovery scheme. It consists of a domino-free checkpointing algorithm plus an a...
