Using Virtualization to Validate Fault-Tolerant Distributed Systems

Authors

Israel Hsu

Andrew Gallagher

Michael Le

Yuval Tamir

Abstract

Asynchronous events and complex system state distributed across independent nodes make exposure and diagnosis of flaws in distributed systems a challenge. The difficulties are exacerbated when the goal is to validate fault tolerance mechanisms that are activated only by the occurrence of errors, which are, by nature, rare. Validation of fault tolerance mechanisms is often done by injecting faults that emulate the actual faults and "stress" the functionality of the resilience mechanisms. Validation campaigns lasting days and involving thousands of fault injections are often necessary. We present an infrastructure that combines virtualization and software-implemented fault injection to automate validation campaigns and support the analysis of the behavior of a distributed system under test. Virtualization enables: 1) a flexible fault injector capable of emulating a wide variety of faults, and 2) a mechanism for autonomously recovering faulty nodes so that the campaign can continue running on a target system that is fully functional. As a case study we use this infrastructure to validate a Byzantine-fault-tolerant cluster manager. Over 1280 hours of fault injections exposed 11 unique flaws in the cluster manager.
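The campaign loop described in the abstract (inject a fault, observe the system, then restore faulty nodes so the next injection starts from a fully functional target) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' infrastructure: the names `VirtualNode`, `Campaign`, `FAULT_TYPES`, and `majority_alive` are hypothetical, the fault classes are invented for illustration, and "restoring" a node merely flips a flag where a real setup would presumably roll a virtual machine back to a known-good state.

```python
import random
from dataclasses import dataclass, field

# Hypothetical fault classes; the abstract does not enumerate the real ones.
FAULT_TYPES = ("crash", "hang", "message_drop")


@dataclass
class VirtualNode:
    """Stand-in for a virtualized node of the system under test."""
    name: str
    healthy: bool = True

    def inject(self, fault: str) -> None:
        # A real injector would perturb the VM; here any fault marks it faulty.
        self.healthy = False

    def restore(self) -> None:
        # Assumption: recovery returns the node to a known-good state.
        self.healthy = True


@dataclass
class Campaign:
    nodes: list
    rng: random.Random
    log: list = field(default_factory=list)

    def run_once(self, system_check) -> None:
        target = self.rng.choice(self.nodes)
        fault = self.rng.choice(FAULT_TYPES)
        target.inject(fault)
        # Record whether the system as a whole tolerated the fault.
        survived = system_check(self.nodes)
        self.log.append((target.name, fault, survived))
        # Recover every faulty node so the next run starts fully functional.
        for node in self.nodes:
            if not node.healthy:
                node.restore()

    def run(self, injections: int, system_check) -> list:
        for _ in range(injections):
            self.run_once(system_check)
        return self.log


def majority_alive(nodes) -> bool:
    """Toy health check: a strict majority of nodes is still healthy."""
    return sum(n.healthy for n in nodes) > len(nodes) // 2
```

A run such as `Campaign([VirtualNode(f"n{i}") for i in range(4)], random.Random(0)).run(10, majority_alive)` returns one log entry per injection. The restore step at the end of each iteration is what lets a long campaign proceed without manual intervention.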


Similar resources

Virtualization Technologies for DTN Testbeds

At present, the Internet is based on the availability of a continuous path from the source to the sink node and on limited delays. These assumptions do not hold in "challenged networks", which comprise a wide variety of different environments, from sensor networks to space communications (including satellite systems). These networks are the preferred target of Delay/Disruption Tolerant Networking (...

Critical Success Factors for Data Virtualization: A Literature Review

Data Virtualization (DV) has become an important method to store and handle data cost-efficiently. However, it is unclear what kind of data and when data should be virtualized or not. We applied a design science approach in the first stage to get a state of the art of DV regarding data integration and to present a concept matrix. We extend the knowledge base with a systematic literature review ...

Compositional Programming and Testing of Dynamic Distributed Systems

Distributed systems are notoriously difficult to get right as they must deal with concurrency and failures. This paper proposes techniques for building reliable distributed systems with two central contributions: (1) We propose a module system based on the theory of compositional trace refinement for dynamic systems consisting of asynchronously communicating state machines, where state machines ...

A decentralized fault tolerant control strategy for multi-robot systems

The paper presents a fault-tolerant control strategy for distributed multi-robot systems. The proposed approach is based on a distributed controller-observer architecture that allows each robot to estimate the global system state using local communication. We derive residual dynamics that allow each robot to detect and isolate faults of other robots, even if they are not directly connected. T...

LOT: A Robust Overlay for Distributed Range Query Processing

Large-scale data-centric services are often handled by clusters of computers that include hundreds of thousands of computing nodes. However, traditional distributed query processing techniques fail to handle the large-scale distribution, peer-to-peer communication and frequent disconnection. In this paper, we introduce LOT, a robust, fault-tolerant and highly distributed overlay network for lar...

Publication date: 2010
