Tardigrade: Leveraging Lightweight Virtual Machines to Easily and Efficiently Construct Fault-Tolerant Services

Authors

  • Jacob R. Lorch
  • Andrew Baumann
  • Lisa Glendenning
  • Dutch T. Meyer
  • Andrew Warfield
Abstract

Many services need to survive machine failures, but designing and deploying fault-tolerant services can be difficult and error-prone. In this work, we present Tardigrade, a system that deploys an existing, unmodified binary as a fault-tolerant service. Tardigrade replicates the service on several machines so that it continues running even when some of them fail. Yet, it keeps the service states synchronized so clients see strongly consistent results. To achieve this efficiently, we use lightweight virtual machine replication. A lightweight virtual machine is a process sandboxed so that its external dependencies are completely encapsulated, enabling it to be migrated across machines. To let unmodified binaries run within such a sandbox, the sandbox also contains a library OS providing the expected API. We evaluate Tardigrade’s performance and demonstrate its applicability to a variety of services, showing that it can convert these services into fault-tolerant ones transparently and efficiently.

Similar resources

Progress towards Petascale Virtual Machines

Petascale Virtual Machines (PVM) continues to be a popular software package both for creating personal grids and for building adaptable, fault tolerant applications. We will illustrate this by describing a computational biology environment built on top of PVM that is used by researchers around the world. We will then describe our recent progress in building an even more adaptable distributed vir...

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

The very large infrastructure and the increasing demand of services of cloud computing systems lead to the need of an effective fault tolerant allocation technique. In this paper, we address the problem of allocating user applications to the virtual machines of cloud computing systems so that failures can be avoided in the presence of faults. We employ job replication as an effective mechanism ...

Affinity-aware modeling of CPU usage with communicating virtual machines

Use of virtualization in Infrastructure as a Service (IaaS) environments provides benefits to both users and providers: users can make use of resources following a pay-per-use model and negotiate performance guarantees, whereas providers can provide quick, scalable and hardware-fault tolerant service and also utilize resources efficiently and economically. With increased acceptance of virtualiz...

A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems

Cloud computing is a result of the continuing progress made in the areas of hardware, technologies related to the Internet, distributed computing and automated management. The increasing demand has led to an increase in services resulting in the establishment of large-scale computing and data centers, in addition to high operating costs and huge amounts of electrical power consumption. Insuffic...

A Lightweight Approach Multiplexing Resource Allocation Scheme by Virtualization based on Time Series in SOC

By leveraging virtual machine (VM) technology which provides performance and fault isolation, Cloud resources can be provisioned on demand in a fine-grained, multiplexed manner rather than in monolithic pieces. By integrating volunteer computing into Cloud architectures, we envision a gigantic Self-Organizing Cloud (SOC) being formed to reap the huge potential of untapped commodity computing po...

Publication date: 2015