Distributed System
Author
Abstract
Fault tolerance can allow processes executing in a computer system to survive failures within the system. This thesis addresses the theory and practice of transparent fault-tolerance methods using message logging and checkpointing in distributed systems. A general model for reasoning about the behavior and correctness of these methods is developed, and the design, implementation, and performance of two new low-overhead methods based on this model are presented. No specialized hardware is required with these new methods.

The model is independent of the protocols used in the system. Each process state is represented by a dependency vector, and each system state is represented by a dependency matrix showing a collection of process states. The set of system states that have occurred during any single execution of a system forms a lattice, with the sets of consistent and recoverable system states as sublattices. There is thus always a unique maximum recoverable system state.

The first method presented uses a new pessimistic message logging protocol called sender-based message logging. Each message is logged in the local volatile memory of the machine from which it was sent, and the order in which the message was received is returned to the sender as a receive sequence number. Message logging overlaps execution of the receiver until the receiver attempts to send a new message. Implemented in the V System, the maximum measured failure-free overhead on distributed application programs was under […] percent, and average overhead measured […] percent or less, depending on problem size and communication intensity.

Optimistic message logging can outperform pessimistic logging, since message logging occurs asynchronously. A new optimistic message logging system is presented that guarantees to find the maximum possible recoverable system state, which is not ensured by previous optimistic methods. All logged messages and checkpoints are utilized, and thus some messages received by a process before it was checkpointed may not need to be logged. Although failure recovery using optimistic message logging is more difficult, failure-free application overhead using this method ranged from only a maximum of under […] percent to much less than […] percent.
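The sender-based logging steps described in the abstract can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not the thesis's V System implementation: the class `Process`, the fields `send_log` and `pending`, and the method `record_rsn` are hypothetical names chosen for illustration, and real message transport, checkpointing, and recovery are left out.

```python
class Process:
    """Toy process for illustrating sender-based message logging (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.next_ssn = 0    # send sequence number for outgoing messages
        self.next_rsn = 0    # receive sequence number for incoming messages
        self.send_log = {}   # (dest name, ssn) -> {"payload", "rsn"}; sender's volatile memory
        self.pending = set() # (sender name, ssn) received but not yet fully logged
        self.delivered = []  # (rsn, sender name, payload) in receive order

    def send(self, dest, payload):
        # A receiver may keep executing after a receive, but it must not send
        # a new message while any received message is only partially logged.
        if self.pending:
            raise RuntimeError("received messages are not fully logged yet")
        ssn = self.next_ssn
        self.next_ssn += 1
        # Log the message locally at the sender; the RSN is not known yet.
        self.send_log[(dest.name, ssn)] = {"payload": payload, "rsn": None}
        rsn = dest._receive(self.name, ssn, payload)
        return ssn, rsn  # the RSN is what the receiver returns to the sender

    def _receive(self, sender_name, ssn, payload):
        rsn = self.next_rsn
        self.next_rsn += 1
        self.delivered.append((rsn, sender_name, payload))
        self.pending.add((sender_name, ssn))
        return rsn

    def record_rsn(self, dest, ssn, rsn):
        # The sender adds the returned RSN to its log entry, then tells the
        # receiver that logging of this message is complete.
        self.send_log[(dest.name, ssn)]["rsn"] = rsn
        dest._logging_complete(self.name, ssn)

    def _logging_complete(self, sender_name, ssn):
        self.pending.discard((sender_name, ssn))


a, b, c = Process("a"), Process("b"), Process("c")
ssn, rsn = a.send(b, "hello")   # b has the message, but a has not logged the RSN
try:
    b.send(c, "too early")      # rejected: b's received message is partially logged
except RuntimeError as err:
    print("blocked:", err)
a.record_rsn(b, ssn, rsn)       # logging completes at the sender
b.send(c, "now allowed")
```

In this sketch the log entry lives only in the sender's memory, so a failed receiver could in principle be replayed from its senders' logs in RSN order; that recovery path is not modeled here.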
Similar resources
Comportamiento Autónomo del Holón Recurso basado en la Agenda de Producción
Manufacturing systems are unpredictable, distributed and highly dynamic, which demands of the control architecture flexibility, autonomous decision-making capability and fast adaptation in the presence of disturbances that may be in the system. The Holonic and Multi-Agent paradigms have shown to be suitable for the design and modeling of control architectures and the...
A New Algorithm to Implement Causal Ordering
This paper presents a new algorithm to implement causal ordering. Causal ordering was first proposed in the ISIS system developed at Cornell University. The interest of causal ordering in a distributed system is that it is cheaper to realize than total ordering. The implementation of causal ordering proposed in this paper uses logical clocks of Mattern-Fidge (which define a partial order bet...
Modeling of Hierarchical Distributed Systems with Fault-Tolerance
Abstract: This paper addresses some fault-tolerance issues pertaining to hierarchically distributed systems. Since each of the levels in a hierarchical system could have various characteristics, different fault-tolerance schemes could be appropriate at different levels. In this paper, we use stochastic Petri nets (SPNs) to investigate various fault-tolerant schemes in this context. The basic ...
Profiling Communication in Distributed Genetic Algorithms
To what extent is distribution beneficial to the search quality and computational resources used by a genetic algorithm execution? Most distributed genetic algorithms rely on communicating genetic information, in the form of individual solutions, between concurrently evolving populations. Another way of effectively using the additional information generated by the parallel executions is th...
An Algorithm for Understanding of Color Vision
depth and shape in low-level visual processing. Based on the computational theory and Parallel Distributed Processing theory, a parallel algorithm for realizing subjective color vision (SCV) is presented in this paper. The paper contains the following sections: the computational theory for low-level color vision is covered in the first section; then the PDP algorithm of color vision is described. Finally...
Automatic Data Decomposition for Message-Passing Machines
1 Introduction. Distributed-memory message-passing computers are becoming more common these days because they offer significant advantages over shared-memory machines in terms of cost and scalability. However, distributed-memory machines are more difficult to program than shared-memory machines because programmers of distributed-memory machines have to manage low-level tasks like dis...
Publication date: 1989