Search results for: fault recovery

Number of results: 262091

2007
William R. Dieter, James E. Lumpp

ion between the user's application and the message passing primitives available on the target systems. Distributed shared memory (DSM) provides users with the abstraction of shared memory on networks of physically distributed machines. This programming model is widely considered to be more intuitive for programmers as compared to message passing languages. Because DSM systems are implemented on...

2007
Deron Liang, S. C. Chou, S. M. Yuan

The Object Management Architecture (OMA) has been recognized as a de facto standard in the development of object services in distributed computing environments. In a distributed system, the provision for failure recovery is always a vital design issue. However, the fault-tolerant service has not been extensively considered in the current OMA framework, despite the fact that an increasing number o...

2004
Liberios Vokorokos

Complex system recovery by process programming redundancy. This paper presents the recovery of a fault-resistant control system. We start from a parallel computer system with distributed memory and communication based on message exchange. This system consists of processor elements, communication lines and switches. At least one application process is running on each of the proces...

2009
Guy Shani, Christopher Meek

An automated recovery system is a key component in a large data center. Such a system typically employs a hand-made controller created by an expert. While such controllers capture many important aspects of the recovery process, they are often not systematically optimized to reduce costs such as server downtime. In this paper we describe a passive policy learning approach for improving existing ...

2008
Xuanhua Li, Donald Yeung

Technology scaling has led to growing concerns about reliability in microprocessors. Currently, fault tolerance techniques rely on explicit redundant execution for fault detection or recovery, which incurs significant performance, power, or hardware overhead. This paper makes the observation that value predictability is a low-cost (albeit imperfect) form of program redundancy that can be exploit...

Journal: IEEE Trans. Computers 2000
Hagbae Kim, Kang G. Shin

The Fault-Tolerance Latency (FTL), defined as the time required by all sequential steps taken to recover from an error, is important to the design and evaluation of fault-tolerant computers used in safety-critical real-time control systems. To meet timing constraints or avoid dynamic failure, the latency of any fault-handling policy, which consists of several stages like error detection, fault loc...

2006
Sergiy A. Vilkomir, David Lorge Parnas, Veena B. Mendiratta, Eamonn Murphy

This paper presents a method of estimating the availability of fault-tolerant computer systems with several recovery procedures. A segregated failures model has been proposed recently for this purpose. This paper provides further analysis and extension of this model. The segregated failures model is compared with a Markov chain model and is extended for the situation when the coverage factor is...

Journal: CoRR 2009
B. Baykant Alagoz

Abstract: Error detectable and error correctable coding in Hamming space was researched to discover possible fault tolerant coding constellations, which can implement Boolean logic with fault tolerant property. Basic logic operators of the Boolean algebra were developed to apply fault tolerant coding in the logic circuits. It was shown that application of three-bit fault tolerant codes have pro...

Journal: IET Software 2009
Wenbing Zhao

In this paper, we describe a novel proactive recovery scheme based on service migration for long-running Byzantine fault tolerant systems. Proactive recovery is an essential method for ensuring long term reliability of fault tolerant systems that are under continuous threats from malicious adversaries. The primary benefit of our proactive recovery scheme is a reduced vulnerability window. This ...

2007
Ahmad Abualsamid, Mohamed Osama

Fault tolerance and fault recovery are integral parts of real-time systems. The literature addresses the issue of fault recovery via two main methods. One is hardware redundancy, and the other is achieved through task replication. Although lots of research has been done in this area, most of the work fell within one of the two streams, or as a combination of both. One point that did not have en...
