coordinated checkpointing

Search results for: coordinated checkpointing

Number of results: 48092

A User-triggered Checkpointing Library for Computation-intensive Applications

1995

Geert Deconinck Johan Vounckx Rudy Lauwereins Jean A. Peperstraete

We propose a method to incorporate coordinated checkpointing and rollback in high performance computing applications on massively parallel computers. A library allows the user to specify which data-items (including files) belong to the contents of the checkpoint, and to trigger the checkpointing in the application. The recovery-line management on the distributed disk system takes care of which ...
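
As a rough illustration of the user-triggered idea in this abstract, the sketch below lets an application register which data items belong to a checkpoint and then trigger saving explicitly. The class `CheckpointRegistry` and its methods are hypothetical names invented for this example, not the API of the library described; coordination across processes and recovery-line management are left out.

```python
import os
import pickle
import tempfile


class CheckpointRegistry:
    """Hypothetical single-process sketch: the user decides what is saved and when."""

    def __init__(self, path):
        self.path = path
        self._getters = {}  # item name -> callable returning the current value
        self._setters = {}  # item name -> callable restoring a saved value

    def register(self, name, getter, setter):
        """Declare that a data item belongs to the checkpoint contents."""
        self._getters[name] = getter
        self._setters[name] = setter

    def checkpoint(self):
        """Save all registered items; called by the application at a point it chooses."""
        state = {name: get() for name, get in self._getters.items()}
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # Atomic rename, so a crash never leaves a half-written checkpoint.
        os.replace(tmp, self.path)

    def rollback(self):
        """Restore every registered item from the last saved checkpoint."""
        with open(self.path, "rb") as f:
            state = pickle.load(f)
        for name, value in state.items():
            self._setters[name](value)
```

Writing to a temporary file and renaming it is one common way to keep the previous checkpoint valid until the new one is complete; the abstract does not say whether the library works this way.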


An Efficient Computing-Checkpoint Based Coordinated Checkpoint Algorithm

2006

Chaoguang Men Dongsheng Wang Yunlong Zhao

In this paper, the concept of “computing checkpoint” is introduced, and then an efficient coordinated checkpoint algorithm is proposed. The algorithm combines two approaches to reducing the overhead associated with coordinated checkpointing: one is to minimize the number of processes that take checkpoints, and the other is to make the checkpointing process non-blocking. Through piggybacking the...


A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems

Journal: Mobile Information Systems, 2008


Reliability-Aware Speedup Models for Parallel Applications with Coordinated Checkpointing/Restart

Journal: IEEE Transactions on Computers, 2015


Review of Some Checkpointing Schemes for Distributed and Mobile Computing Environments

2015

Mr Raman Kumar Parveen Kumar

Raman Kumar, Mewar University, Chittorgargh (Raj); Parveen Kumar, Amity University, Gurgaon (Haryana). Abstract: Fault tolerance techniques enable systems to carry out tasks in the presence of faults. A checkpoint is a...


Three Phase Coordinated Checkpointing Scheme for Mobile Distributed Systems

2014

Monika Nagpal

In this paper, a three phase minimum-process coordinated checkpointing algorithm for nondeterministic mobile distributed systems is proposed, where no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and synchronization message overhead and to capture the partial transitive dependencies during the normal execution by piggybacking dependency vectors on...
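
The piggybacking idea mentioned in this abstract can be sketched with a generic bit-vector scheme: each process attaches its dependency vector to every outgoing message, and the receiver merges it into its own. This is a minimal sketch under assumptions, not the paper's algorithm; the `Process` class, the message layout, and the reset-on-checkpoint rule are all hypothetical choices made for illustration.

```python
class Process:
    """Hypothetical process that tracks which peers its current state depends on."""

    def __init__(self, pid, n):
        self.pid = pid
        self.n = n
        self._reset()

    def _reset(self):
        # A process trivially depends on itself.
        self.deps = [i == self.pid for i in range(self.n)]

    def send(self, payload):
        # Piggyback a copy of the current dependency vector on the message.
        return {"src": self.pid, "deps": list(self.deps), "payload": payload}

    def receive(self, msg):
        # Direct dependency on the sender, plus everything the sender
        # already depended on when it sent the message.
        self.deps[msg["src"]] = True
        self.deps = [mine or theirs for mine, theirs in zip(self.deps, msg["deps"])]
        return msg["payload"]

    def take_checkpoint(self):
        # Assumption: dependencies are counted since the last checkpoint only.
        self._reset()
```

With such vectors, a checkpoint initiator can ask only the processes marked in its vector to checkpoint, which is the usual motivation for minimum-process schemes.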


Using Message Semantics for Fast-Output Commit in Checkpointing-and-Rollback Recovery

1999

Luís Moura Silva João Gabriel Silva

Checkpointing is a very effective technique to ensure the continuity of long-running applications in the occurrence of failures. However, one of the handicaps of coordinated checkpointing is the high latency for committing output from the application to the external world. Enhancing the checkpointing scheme with a message logging protocol is a good solution to reduce the output latency. The ide...


Fine-Grained Checkpointing in Distributed Object Systems

1994

Thomas Eirich

The paper discusses problems of checkpointing in distributed object systems and presents an algorithm suited optimally to their fine-grained structure. Usually, checkpoint algorithms assume nodes or processes as system units. This assumption results in a coarse-grained structure of checkpointing. We will show that this difference in granularity makes usual checkpoint algorithms inadequate. The ...


A Non-blocking Checkpointing Algorithm for Distributed Systems

2011

Liu Guoliang Chen Shuyu Zhang Xiaoqin

Checkpointing and rollback recovery, as an effective method of fault tolerance, has been widely used in parallel and distributed computer systems. We present a nonblocking coordinated checkpointing algorithm for distributed systems, which differs from the conventional approach of first taking temporary checkpoints and then converting them to permanent ones by proce...
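
For context, the conventional approach that this abstract contrasts itself with (tentative checkpoints that are later made permanent) can be sketched as a simple two-phase exchange. This illustrates only that baseline, not the nonblocking algorithm the paper proposes; `Participant`, `coordinated_checkpoint`, and the `healthy` flag are hypothetical names, and real message passing is replaced by direct method calls.

```python
class Participant:
    """Hypothetical process holding application state and two checkpoint slots."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.tentative = None
        self.permanent = None
        self.healthy = True  # set to False to simulate a process that cannot checkpoint

    def take_tentative(self):
        # Phase 1: snapshot the state without discarding the old permanent checkpoint.
        if not self.healthy:
            return False
        self.tentative = dict(self.state)
        return True

    def commit(self):
        # Phase 2 (success): the tentative checkpoint becomes the permanent one.
        self.permanent, self.tentative = self.tentative, None

    def abort(self):
        # Phase 2 (failure): drop the tentative checkpoint, keep the old permanent one.
        self.tentative = None


def coordinated_checkpoint(participants):
    """Commit only if every participant managed to take a tentative checkpoint."""
    votes = [p.take_tentative() for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

Because no participant replaces its permanent checkpoint until all have voted, the set of permanent checkpoints always comes from a single round.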


Time-Based Coordinated Checkpointing

1998

Nuno F. Neves Ravishankar Iyer Jane Liu Laxmikant Kale

Distributed systems are being used to support the execution of applications ranging from long-running scientific simulators to e-commerce on the Internet. In this type of environment, the failure of one of its components, either a computer or the network, may prevent other components from completing their tasks. Since the probability of failure increases with the number of computers and executi...

