Search results for: coordinated checkpointing
Number of results: 48092
As more computation moves into the highly dynamic and distributed cloud, applications are becoming more vulnerable to diverse failures. This paper presents a unified analytical model to study the cost-performance tradeoffs of fault tolerance in cloud applications. We compare four main checkpoint and recovery techniques, namely coordinated checkpointing and uncoordinated checkpointing such as...
A message is in-transit with respect to a global state if its sending is recorded in this global state, while its receipt is not. Checkpointing algorithms have to log such in-transit messages in order to restore the state of channels when a computation has to be resumed from a consistent global state after a failure has occurred. Coordinated checkpointing algorithms log those in-transit message...
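The in-transit definition in the abstract above can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the `LocalCheckpoint` record, the message ids, and both helper functions are hypothetical names introduced only for illustration, and the sketch assumes each process's checkpoint records the ids of the messages it has sent and received.

```python
from dataclasses import dataclass, field


@dataclass
class LocalCheckpoint:
    """Hypothetical per-process checkpoint record (an assumption, not from the paper)."""
    sent: set = field(default_factory=set)      # ids of messages whose sending is recorded
    received: set = field(default_factory=set)  # ids of messages whose receipt is recorded


def in_transit(global_state):
    """Return ids of messages recorded as sent but not recorded as received.

    `global_state` maps a process name to its LocalCheckpoint.
    """
    sent = set().union(*(cp.sent for cp in global_state.values()))
    received = set().union(*(cp.received for cp in global_state.values()))
    return sent - received


def is_consistent(global_state):
    """A global state is consistent if no receipt is recorded without its send."""
    sent = set().union(*(cp.sent for cp in global_state.values()))
    received = set().union(*(cp.received for cp in global_state.values()))
    return received <= sent


# Process p has sent m1 and m2; process q has received only m1.
state = {
    "p": LocalCheckpoint(sent={"m1", "m2"}),
    "q": LocalCheckpoint(received={"m1"}),
}
pending = in_transit(state)  # m2 must be logged to restore the channel p -> q
```

In this example the state is consistent, and `m2` is the only message a checkpointing algorithm would have to log in order to restore the channel contents on recovery.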
Mobile computing systems are made up of different components, among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environments. In the suggested scheme, nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
Mobile distributed systems (MDSs) are made up of Mobile Hosts (MHs), Base Stations (BSs), and Mobile Support Stations (MSSs), among which MSSs play a key role in the mobile environment. This paper presents a low-overhead proxy-MSS-based framework to handle faults in mobile distributed systems. In the proposed scheme, one MSS lot of proxy MSS works as per the workload. One proxy MSS handles the specific...
This paper presents ChaRM4MPI, a Checkpoint-based Rollback Recovery and Migration System for Message Passing Interface on Linux clusters. Several important fault-tolerance mechanisms are designed and implemented in this system, including a coordinated checkpointing protocol, synchronized rollback recovery, and process migration. Owing to ChaRM4MPI, the node transient faults can be recove...
In large scale parallel systems, storing memory images with checkpointing will involve massive amounts of concentrated I/O from many nodes, resulting in considerable execution overhead. For user-level checkpointing, overhead reduction usually involves both spatial, i.e., reducing the amount of checkpoint data, and temporal, i.e., spreading out I/O by checkpointing data as soon as their values b...
Distributed coordinated checkpointing algorithms are discussed. The first global checkpoint for a checkpoint initiation is a set containing the checkpoint for each process in which any checkpoint before the element is not consistent with the initiation. The last global checkpoint for a checkpoint initiation is a set containing the checkpoint for each process in which any checkpoint after the el...
The well-known coordinated snapshot algorithm of mutable checkpointing [7–9] is studied. We equip it with a concise formal model and analyze its operational behavior via an invariant characterizing the snapshot computation. By this we obtain a clear understanding of the intermediate behavior and a correctness proof of the final snapshot based on a strong notion of consistency (reachability with...