Reset-Driven Fault Tolerance

Authors

  • João Carlos Cunha
  • António Correia
  • Jorge Henriques
  • Mário Zenha Rela
  • João Gabriel Silva
Abstract

A common approach to fault tolerance in embedded systems is to reboot the computer whenever a non-permanent error is detected. All system code and data are recreated from scratch, and a previously established checkpoint, hopefully not corrupted, is used to restore the application data. Confidence in the computer's operation is thus restored. The idea explored in this paper is to reset the computer unconditionally in each control frame (the classic read sensors → calculate control action → update actuators cycle). RAM-based stable storage preserves the system's state between consecutive cleanups, and a standard watchdog timer guarantees that a reset is forced whenever an error crashes the system. We evaluated this approach by fault injection in the controller of a standard temperature control system. The experimental observations show that Reset-Driven Fault Tolerance is a very simple yet effective technique for improving reliability at extremely low cost, since it is a conceptually simple, software-only solution with the advantage of being application...
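The reset-per-frame cycle described in the abstract can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the stable-storage layout or the control law, so the two-bank CRC-protected record, the PI controller, and all names (`StableStorage`, `control_frame`, `KP`, `KI`) are hypothetical. A "reset" is modelled simply by the function returning, so that no local state survives between frames. The watchdog timer is not modelled.

```python
import struct
import zlib

KP = 2.0  # proportional gain (assumed value, for illustration only)
KI = 0.5  # integral gain (assumed value, for illustration only)


class StableStorage:
    """Simulated RAM region that survives a reset (hypothetical layout).

    Two banks are written alternately, each protected by a CRC-32, so a
    record corrupted during or after a write leaves the older one usable.
    """

    RECORD_SIZE = 16  # 4-byte frame counter + 8-byte float + 4-byte CRC

    def __init__(self):
        self.banks = [b"", b""]

    def save(self, frame, integral):
        payload = struct.pack("<Id", frame, integral)
        record = payload + struct.pack("<I", zlib.crc32(payload))
        self.banks[frame % 2] = record

    def load(self):
        """Return the newest (frame, integral) whose CRC checks out."""
        best = None
        for record in self.banks:
            if len(record) != self.RECORD_SIZE:
                continue  # bank never written
            payload = record[:12]
            (crc,) = struct.unpack("<I", record[12:])
            if zlib.crc32(payload) != crc:
                continue  # corrupted checkpoint, ignore it
            frame, integral = struct.unpack("<Id", payload)
            if best is None or frame > best[0]:
                best = (frame, integral)
        return best or (0, 0.0)


def control_frame(storage, measured, setpoint):
    """One control frame: restore state, compute, checkpoint, then 'reset'.

    All working variables are locals, rebuilt from scratch on every call;
    only what is written to stable storage carries over to the next frame.
    """
    frame, integral = storage.load()           # restore checkpoint
    error = setpoint - measured                # read sensors
    integral += error
    output = KP * error + KI * integral        # calculate control action
    storage.save(frame + 1, integral)          # checkpoint before reset
    return output                              # update actuators, then reset
```

In this sketch, corrupting the most recent bank makes `load()` fall back to the previous frame's checkpoint, which mirrors the abstract's caveat that the checkpoint is "hopefully not corrupted". A real controller would keep this region in memory that the reset routine does not clear.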


Similar resources

Scalable Fault Tolerance

As communication networks grow, existing fault handling tools become increasingly unaffordable. In many cases the reason is that they involve global measures such as global time-outs or reset procedures, and their cost grows with the size of the network. Rather, for a fault handling mechanism to scale to large networks, it should involve local measures, or, at worst, fault local me...


A CAN-Based Architecture for Highly Reliable Communication Systems

In many application areas of distributed systems based on serial buses like CAN, high safety and reliability are considered major functional requirements. In addition, the communication system has to cope with periodic as well as event-driven messages, which have to be transferred under hard real-time constraints. Especially where a considerable amount of event-driven data occurs, a flexible...


On Potential Fault Detection in Sequential Circuits

During fault simulation, an approximation frequently used in practice is to declare a fault to be detected after it has been potentially detected a predetermined number of times. This approximation may lead to declaring a fault detected when in fact the fault will not be detected during a standard test application process. We propose an alternative measure of fault detection for potentially det...


Real-Time Fault-Tolerant Atomic Broadcast

We present algorithms for Real-Time Fault-Tolerant Uniform Atomic Broadcast developed in the framework of the French project ATR (accord temps réel). We first design a distributed execution model for asynchronous systems with crash failures, which we call the Synchronized Phase System (SPS), then we give an algorithm for Atomic Broadcast in SPS. In an SPS, the processes try to run in synchronized rounds ...


Low-Cost Flexible Software Fault Tolerance for Distributed Computing

In this paper, we revisit the problem of software fault tolerance in distributed systems. In particular, we propose an extension of a message-driven confidence-driven (MDCD) protocol we have developed for error containment and recovery in a particular type of distributed embedded system. More specifically, we augment the original MDCD protocol by introducing the method of “fine-grained confidenc...



Publication date: 2002