Cost-Performance of Fault Tolerance in Cloud Computing

Authors

  • Y. M. Teo
  • B. L. Luong
  • Y. Song
Abstract

As more computation moves into the highly dynamic and distributed cloud, applications are becoming more vulnerable to diverse failures. This paper presents a unified analytical model to study the cost-performance tradeoffs of fault tolerance in cloud applications. We compare four main checkpoint and recovery techniques: coordinated checkpointing and three uncoordinated checkpointing techniques, namely pessimistic sender-based message logging, pessimistic receiver-based message logging (PR), and optimistic receiver-based message logging (OR). We focus on how application size, checkpointing frequency, network latency, and mean time between failures influence the cost of fault tolerance, expressed as the percentage increase in application execution time. We further study the cost of fault tolerance in cloud applications with a high probability of failure, network latency, and message communication among executing processes (virtual machines). Our analysis shows that the cost of fault tolerance for both OR and PR is around 5% across the evaluated ranges of application size (32 to 4,096 processes or virtual machines), checkpointing frequency (one checkpoint per minute to one checkpoint per hour), and network latency (Myrinet to Internet). For a cloud with high network latency, the cost of fault tolerance is about 5% for both OR and PR; OR is a suitable choice when the failure probability is low, and PR is a better candidate when failures are more frequent.
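The cost metric in the abstract, the percentage increase in execution time, can be illustrated with a minimal sketch. This is not the paper's unified analytical model, which is not reproduced on this page; it is the classic first-order approximation for periodic checkpointing, and it ignores message logging, application size, and network latency. All function and parameter names are hypothetical.

```python
import math


def overhead_percent(checkpoint_cost_s, interval_s, restart_cost_s, mtbf_s):
    """Approximate percentage increase in execution time.

    Assumption (not from the paper): a first-order model in which
    - each checkpoint costs checkpoint_cost_s and is taken every interval_s,
    - a failure loses half an interval of work on average plus restart_cost_s,
    - failures arrive on average once every mtbf_s seconds.
    """
    if interval_s <= 0 or mtbf_s <= 0:
        raise ValueError("interval_s and mtbf_s must be positive")
    checkpoint_fraction = checkpoint_cost_s / interval_s
    rework_fraction = (interval_s / 2 + restart_cost_s) / mtbf_s
    return 100.0 * (checkpoint_fraction + rework_fraction)


def young_interval(checkpoint_cost_s, mtbf_s):
    """Checkpoint interval that minimizes the first-order overhead above."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical inputs: 10 s checkpoints, 30 s restart, one failure per day.
    for interval in (60, 600, 3600):
        pct = overhead_percent(10, interval, 30, 86400)
        print(f"interval={interval:>5} s  overhead={pct:.2f}%")
    best = young_interval(10, 86400)
    print(f"interval minimizing this model: {best:.0f} s")
```

The sketch shows the tradeoff the abstract refers to: checkpointing more often raises the checkpointing term, while checkpointing less often raises the expected rework after a failure.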


Similar resources

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...


A Genetic Based Resource Management Algorithm Considering Energy Efficiency in Cloud Computing Systems

Cloud computing is a result of the continuing progress made in the areas of hardware, technologies related to the Internet, distributed computing and automated management. The increasing demand has led to an increase in services resulting in the establishment of large-scale computing and data centers, in addition to high operating costs and huge amounts of electrical power consumption. Insuffic...


A Framework for Evaluating Cloud Computing User’s Satisfaction in Information Technology Management

Cloud computing is a new discussion in enterprise IT. It has already become popular in terms of distributed technology in some companies. It enables managers to set up and run the intended businesses by avoiding excessive spending on computers, software and hiring expert staff, which proves to be cost effective. Cloud computing also helps users pay for the IT services without spending massive am...


Task Scheduling and Seedblock Based Fault Tolerance in Cloud

Cloud computing is one of the essential paradigms for sharing data resources and utilizing data computation in data centers. Resource services in cloud computing are widely used these days. A main drawback of cloud computing is fault tolerance. Maintaining fault tolerance is necessary to provide availability and reliability of critical services in the cloud. T...


Efficient Fault-Tolerant Strategy Selection Algorithm in Cloud Computing

Cloud computing is becoming a mainstream feature of information technology. More and more enterprises deploy their software systems in the cloud environment. Cloud applications are usually large scale and contain many distributed cloud components. Building highly reliable cloud applications is a challenging and critical research issue. Information processing systems has in...




Publication date: 2011