Search results for: coordinated checkpointing
Number of results: 48092
We propose a method to incorporate coordinated checkpointing and rollback in high performance computing applications on massively parallel computers. A library allows the user to specify which data-items (including files) belong to the contents of the checkpoint, and to trigger the checkpointing in the application. The recovery-line management on the distributed disk system takes care of which ...
In this paper, the concept of “computing checkpoint” is introduced, and then an efficient coordinated checkpoint algorithm is proposed. The algorithm combines two approaches to reducing the overhead associated with coordinated checkpointing: one is to minimize the number of processes that take checkpoints, and the other is to make the checkpointing process non-blocking. Through piggybacking the...
Mr Raman Kumar, Mewar University, Chittorgargh (Raj); Dr Parveen Kumar, Amity University, Gurgaon (Haryana). ABSTRACT: Fault tolerance techniques enable systems to carry out tasks in the presence of faults. A checkpoint is a...
In this paper, a three phase minimum-process coordinated checkpointing algorithm for nondeterministic mobile distributed systems is proposed, where no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and synchronization message overhead and to capture the partial transitive dependencies during the normal execution by piggybacking dependency vectors on...
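The piggybacking idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's three-phase algorithm (the snippet is truncated before the details), and the names `Node`, `send`, `receive`, and `minimum_set` are hypothetical. Each process keeps a boolean dependency vector, attaches a copy to every outgoing message, and merges the received vector on delivery, so an initiator can read off which processes it depends on without extra synchronization messages.

```python
from typing import List, Set


class Node:
    """Hypothetical process that tracks dependencies via piggybacked vectors."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.dep: List[bool] = [False] * n
        self.dep[pid] = True  # a process always depends on itself

    def send(self) -> List[bool]:
        # Piggyback a copy of the current dependency vector on the message.
        return list(self.dep)

    def receive(self, sender: int, piggyback: List[bool]) -> None:
        # Direct dependency on the sender, plus whatever the sender
        # already depended on when it sent the message.
        self.dep[sender] = True
        self.dep = [a or b for a, b in zip(self.dep, piggyback)]

    def minimum_set(self) -> Set[int]:
        # Processes that would need to checkpoint if this node initiates.
        return {j for j, d in enumerate(self.dep) if d}


# Usage: P2 -> P1, then P1 -> P0; P3 never communicates.
nodes = [Node(i, 4) for i in range(4)]
nodes[1].receive(2, nodes[2].send())
nodes[0].receive(1, nodes[1].send())
print(sorted(nodes[0].minimum_set()))
```

In this sketch the vector captures only dependencies known to the sender at send time, which is why the abstract speaks of partial transitive dependencies; a real protocol would also reset the vector after each committed checkpoint.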
Checkpointing is a very effective technique to ensure the continuity of long-running applications in the presence of failures. However, one of the drawbacks of coordinated checkpointing is the high latency for committing output from the application to the external world. Enhancing the checkpointing scheme with a message logging protocol is a good solution to reduce the output latency. The ide...
The paper discusses problems of checkpointing in distributed object systems and presents an algorithm suited optimally to their fine-grained structure. Usually, checkpoint algorithms assume nodes or processes as system units. This assumption results in a coarse-grained structure of checkpointing. We will show that this difference in granularity makes usual checkpoint algorithms inadequate. The ...
Checkpointing and rollback recovery, an effective method of fault tolerance, has been widely used on parallel and distributed computer systems. We present a nonblocking coordinated checkpointing algorithm for distributed systems, which differs from the conventional approach of first taking temporary checkpoints and then converting them to permanent ones by proce...
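The conventional approach this abstract contrasts itself with (temporary checkpoints first, then conversion to permanent ones) can be sketched as a two-phase commit over in-memory processes. This is an illustrative, blocking, single-threaded sketch under simplifying assumptions, not the nonblocking algorithm the paper proposes; `Process`, `healthy`, and `coordinated_checkpoint` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Process:
    """Hypothetical process with one integer of application state."""
    pid: int
    state: int = 0
    tentative: Optional[int] = None  # phase-1 checkpoint, not yet committed
    permanent: Optional[int] = None  # last committed checkpoint
    healthy: bool = True             # False simulates a failed phase 1

    def take_tentative(self) -> bool:
        if not self.healthy:
            return False
        self.tentative = self.state
        return True

    def commit(self) -> None:
        self.permanent = self.tentative
        self.tentative = None

    def abort(self) -> None:
        self.tentative = None

    def rollback(self) -> None:
        if self.permanent is not None:
            self.state = self.permanent


def coordinated_checkpoint(processes: List[Process]) -> bool:
    # Phase 1: every process tries to take a temporary checkpoint.
    votes = [p.take_tentative() for p in processes]
    # Phase 2: convert to permanent only if all succeeded, else discard.
    if all(votes):
        for p in processes:
            p.commit()
        return True
    for p in processes:
        p.abort()
    return False


procs = [Process(0, state=5), Process(1, state=7)]
first_ok = coordinated_checkpoint(procs)      # commits states 5 and 7
procs[0].state, procs[1].state = 9, 11
procs[1].healthy = False
second_ok = coordinated_checkpoint(procs)     # aborted, old line kept
for p in procs:
    p.rollback()                              # back to 5 and 7
```

The all-or-nothing commit is what keeps the recovery line consistent: a failed round leaves every process with its previous permanent checkpoint.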
Distributed systems are being used to support the execution of applications ranging from long-running scientific simulators to e-commerce on the Internet. In this type of environment, the failure of one of its components, either a computer or the network, may prevent other components from completing their tasks. Since the probability of failure increases with the number of computers and executi...