Using Message Semantics for Fast-Output Commit in Checkpointing-and-Rollback Recovery

Authors

  • Luís Moura Silva
  • João Gabriel Silva

Abstract

Checkpointing is a very effective technique for ensuring the continuity of long-running applications in the presence of failures. However, one of the drawbacks of coordinated checkpointing is the high latency of committing output from the application to the external world. Enhancing the checkpointing scheme with a message-logging protocol is a good way to reduce this output latency. The idea is to track the sources of non-determinism so that the application can be replayed in a reproducible way during rollback-recovery. In this paper, we present a new event-logging scheme that logs only those messages that may be delivered non-deterministically to the application. Other schemes keep track of the arrival order of all messages; we save only the delivery order of some of them. Our scheme exploits the semantics of message passing and can considerably reduce the number of receive events that must be logged, compared with other existing schemes. We present performance results that compare the output latency of coordinated checkpointing, pessimistic message logging, optimistic message logging, and our event-logging scheme.
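
The abstract does not spell out the paper's exact criterion for which deliveries are non-deterministic. The following minimal Python sketch therefore makes a hypothetical assumption: a receive that names a specific sender over a FIFO channel is deterministic, and only wildcard receives (`ANY_SOURCE`, a made-up marker in the spirit of MPI's `MPI_ANY_SOURCE`) can be delivered non-deterministically. Under that assumption, only the sender chosen by each wildcard receive is logged, and recovery replays that order. `Receiver`, `Message`, `ANY_SOURCE` and `run` are illustrative names, not the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count
from typing import Deque, Dict, List, Optional

# Hypothetical wildcard marker (an assumption, modeled on MPI_ANY_SOURCE).
ANY_SOURCE = -1


@dataclass
class Message:
    src: int
    payload: str
    seq: int = 0  # local arrival stamp, assigned by the receiver


@dataclass
class Receiver:
    # Assumption: one FIFO channel per sender, so a receive that names its
    # sender yields the same message on re-execution and needs no logging.
    channels: Dict[int, Deque[Message]] = field(default_factory=dict)
    event_log: List[int] = field(default_factory=list)  # senders picked by wildcard receives
    replay: Optional[Deque[int]] = None  # logged senders, set only during recovery
    _clock: count = field(default_factory=count)

    def arrive(self, msg: Message) -> None:
        msg.seq = next(self._clock)
        self.channels.setdefault(msg.src, deque()).append(msg)

    def receive(self, src: int = ANY_SOURCE) -> Message:
        if src != ANY_SOURCE:
            # Deterministic receive: nothing is logged.
            return self.channels[src].popleft()
        if self.replay is not None:
            # Recovery: force the delivery order recorded before the failure.
            chosen = self.replay.popleft()
        else:
            # Failure-free run: deliver the earliest arrival among all senders,
            # an outcome that depends on network timing.
            pending = [q for q in self.channels.values() if q]
            chosen = min(pending, key=lambda q: q[0].seq)[0].src
            self.event_log.append(chosen)  # the only event that is logged
        return self.channels[chosen].popleft()


def run(arrivals, replay=None):
    """Deliver the given (src, payload) arrivals, then do one named and two wildcard receives."""
    r = Receiver(replay=deque(replay) if replay is not None else None)
    for src, payload in arrivals:
        r.arrive(Message(src, payload))
    trace = [r.receive(0).payload, r.receive().payload, r.receive().payload]
    return trace, r.event_log
```

In a failure-free run with arrivals `(0, "a")`, `(2, "y")`, `(1, "x")`, the named receive from sender 0 is not logged. Only the two wildcard deliveries (senders 2 and 1) enter the log. Replaying with that log reproduces the trace `["a", "y", "x"]` even when the messages arrive in a different order after rollback. This is the reduction the abstract describes: fewer logged receive events than a scheme that records every arrival.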


Similar Resources

Manetho: Transparent Rollback-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit

Manetho is a new transparent rollback-recovery protocol for long-running distributed computations. It uses a novel combination of antecedence graph maintenance, uncoordinated checkpointing, and sender-based message logging. Manetho simultaneously achieves the advantages of pessimistic message logging, namely limited rollback and fast output commit, and the advantage of optimistic message logging, nam...

Efficient Transparent Optimistic Rollback Recovery for Distributed Application Programs

Existing rollback-recovery methods using consistent checkpointing may cause high overhead for applications that frequently send output to the “outside world,” since a new consistent checkpoint must be written before the output can be committed, whereas existing methods using optimistic message logging may cause large delays in committing output, since processes may buffer received messages arbi...

Implementation and Performance of Transparent Rollback-recovery in Manetho

We describe the implementation and performance of rollback-recovery in Manetho. During failure-free operation, Manetho maintains an antecedence graph which records the “happened before” relation between certain events in the distributed computation. The antecedence graph is used in combination with checkpointing and volatile sender-based message logging to simultaneously achieve low failure-fre...

Independent Checkpointing and Concurrent Rollback for Recovery in Distributed Systems - An Optimistic Approach

Checkpointing in a distributed system is essential for recovery to a globally consistent state after failure. In this paper, we propose a solution that benefits from the research in concurrency control, commit protocols, and site recovery algorithms. A number of checkpointing processes, a number of rollback processes, and computations on operational processes can proceed concurrently while tole...

Using Message Semantics to Reduce Rollback in Optimistic Message Logging Recovery Schemes

Recovery from failures can be achieved through asynchronous checkpointing and optimistic message logging. These schemes have low overheads during failure-free operations. Central to these protocols is the determination of a maximal consistent global state, which is recoverable. Message semantics is not exploited in most existing recovery protocols to determine the recoverable state. We propose...

Publication date: 1999