A Cost-Effective Strategy for Storing Scientific Datasets with Multiple Service Providers in the Cloud

Authors

  • Dong Yuan
  • Li-zhen Cui
  • Xiao Liu
  • Erjiang Fu
  • Yun Yang
Abstract

Cloud computing provides scientists with a platform on which computation- and data-intensive applications can be deployed without infrastructure investment. With abundant cloud resources and a decision support system, large generated datasets can be flexibly 1) stored locally in the current cloud, 2) deleted and regenerated whenever they are reused, or 3) transferred to a cheaper cloud service for storage. However, under the pay-as-you-go model, the total application cost largely depends on the usage of computation, storage and bandwidth resources; hence, cutting the cost of cloud-based data storage becomes a major concern when deploying scientific applications in the cloud. In this paper, we propose a novel strategy that can cost-effectively store large generated datasets with multiple cloud service providers. The strategy is based on a novel algorithm that finds the trade-off among computation, storage and bandwidth costs in the cloud, which are the three key factors in the cost of data storage. Both general (random) simulations conducted with popular cloud service providers' pricing models and three specific case studies on real-world scientific applications show that the proposed storage strategy is highly cost-effective and practical for runtime utilisation in the cloud.
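The three storage options named in the abstract can be compared with a simple per-dataset cost model. The sketch below is only an illustration of that trade-off, not the algorithm proposed in the paper: the `Dataset` fields, the function names, the monthly cost formulas and all prices are assumptions introduced here, and the model ignores dependencies between datasets.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of one generated dataset."""
    size_gb: float         # dataset size in GB
    regen_cost: float      # computation cost ($) to regenerate it once
    uses_per_month: float  # how often the dataset is reused per month


def monthly_costs(d, local_storage_price, remote_storage_price, transfer_price):
    """Return the assumed monthly cost ($) of each storage option.

    Prices are in $/GB/month for storage and $/GB for transfer.
    """
    return {
        # 1) keep the dataset in the current cloud: pay storage only
        "store_locally": d.size_gb * local_storage_price,
        # 2) delete it and pay computation again on every reuse
        "delete_and_regenerate": d.regen_cost * d.uses_per_month,
        # 3) keep it with a cheaper provider: pay storage there plus
        #    bandwidth to bring it back on every reuse
        "store_with_other_provider": (
            d.size_gb * remote_storage_price
            + d.size_gb * transfer_price * d.uses_per_month
        ),
    }


def cheapest_option(d, local_storage_price, remote_storage_price, transfer_price):
    """Pick the option with the lowest assumed monthly cost."""
    costs = monthly_costs(d, local_storage_price, remote_storage_price, transfer_price)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from any provider's price list.
    rarely_used = Dataset(size_gb=100, regen_cost=5.0, uses_per_month=0.1)
    print(cheapest_option(rarely_used, 0.10, 0.05, 0.10))
```

Under this model, a dataset that is cheap to regenerate and rarely reused favours deletion, a frequently reused one favours local storage, and a large, rarely reused but expensive-to-regenerate one favours the cheaper provider.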

Similar resources

Identification and Prioritization of Factors Contributing in Cloud Service Selection Using Fuzzy Best-worst Method (FBWM)

The introduction of cloud computing techniques revolutionized the current of information processing and storing. Cloud computing as a competitive edge provides easy and automated access to the vast ocean of resources through standard network mechanisms to businesses and organizations. Due to the vast diversity of service providers and their respective variety of available services with differen...

A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints

One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...

A service decomposition and definition model in cloud manufacturing systems using game theory focusing on cost accounting perspectives

Cloud manufacturing is a new paradigm which has been under study since 2010 and a vast body of research has been conducted on this topic. Among them, service composition problems are of utmost importance. However, most studies only focused on private clouds meaning the objective function is defined for just one component of the supply chain. This paper attempts to consider service composition p...

On-demand Minimum Cost Benchmarking for Intermediate Datasets Storage in Scientific Cloud Workflow Systems

Many scientific workflows are data intensive where a large volume of intermediate datasets is generated during their execution. Some valuable intermediate datasets need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science on cloud has become popular nowadays, more intermediate datasets in sc...

On-demand minimum cost benchmarking for intermediate dataset storage in scientific cloud workflow systems

Many scientific workflows are data intensive: large volumes of intermediate datasets are generated during their execution. Some valuable intermediate datasets need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science on clouds has become popular nowadays, more intermediate datasets in scient...

Journal:
  • CoRR

Volume: abs/1601.07028

Publication date: 2016