Data Placement in Widely Distributed Systems

Author

  • Tevfik Kosar
Abstract

The unbounded increase in the computation and data requirements of scientific applications has necessitated the use of widely distributed compute and storage resources to meet the demand. In such an environment, data is no longer locally accessible and thus has to be retrieved and stored remotely. Efficient and reliable access to data sources and archiving destinations in a widely distributed environment brings new challenges. Placing data on temporary local storage devices offers many advantages, but such “data placements” also require careful management of storage resources and data movement, i.e., allocating storage space, staging in input data, staging out generated data, and de-allocating local storage after the data is safely stored at the destination. Existing systems closely couple data placement and computation, and consider data placement a side effect of computation. Data placement is either embedded in the computation, which delays the computation, or performed by simple scripts that do not have the privileges of a job. In this dissertation, we propose a framework that de-couples computation and data placement, allows asynchronous execution of each, and treats data placement as a full-fledged job that can be queued, scheduled, monitored, and check-pointed like computational jobs. We regard data placement as an important part of the end-to-end process, and express this in a workflow language. As data placement jobs have different semantics and characteristics than computational jobs, not all traditional techniques applied to computational jobs apply to data placement jobs. We analyze different scheduling strategies for data placement, and introduce a batch scheduler specialized for data placement. This scheduler implements techniques specific to queuing, scheduling, and optimization of data placement jobs, and provides a level of abstraction between the user applications and the underlying data transfer and storage resources.
We provide a complete data placement subsystem for distributed computing systems, similar to the I/O subsystem in operating systems. This system offers transparent failure handling, reliable and efficient scheduling of data resources, load balancing on the storage servers, and traffic control on network links. It provides policy support, improves fault tolerance, and enables higher-level optimizations, including maximizing application throughput. Through deployment in several real-life applications such as US-CMS, DPOSS Astronomy Pipeline, and WCER Educational Video Pipeline, our approach has proved to be effective, providing a promising new research direction.
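
The idea of treating data placement as a job in its own right can be illustrated with a small sketch. The Python below is not the dissertation's scheduler; the class names `PlacementJob` and `PlacementScheduler`, the retry limit, and the use of `OSError` to stand for a failed transfer are all assumptions made for illustration. It shows the four placement steps named in the abstract (allocate, stage in, stage out, release) queued as separate jobs, with a failed step retried in place rather than aborting the whole sequence.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class PlacementJob:
    """A data placement step modeled as a job (hypothetical, for illustration)."""
    name: str
    action: Callable[[], None]  # performs the allocate / transfer / release work
    max_retries: int = 3        # assumed policy: total attempts allowed
    attempts: int = 0
    status: str = "queued"      # becomes "done" or "failed"


class PlacementScheduler:
    """Toy FIFO scheduler: runs placement jobs in order and retries failures."""

    def __init__(self) -> None:
        self.queue: Deque[PlacementJob] = deque()
        self.log: List[str] = []

    def submit(self, job: PlacementJob) -> None:
        self.queue.append(job)

    def run(self) -> None:
        while self.queue:
            job = self.queue.popleft()
            job.attempts += 1
            try:
                job.action()
            except OSError as exc:  # stands in for a network or storage failure
                if job.attempts < job.max_retries:
                    self.log.append(f"{job.name}: retry after {exc}")
                    self.queue.appendleft(job)  # retry before later steps run
                else:
                    job.status = "failed"
                    self.log.append(f"{job.name}: failed")
            else:
                job.status = "done"
                self.log.append(f"{job.name}: done")


def make_flaky_transfer(failures: int) -> Callable[[], None]:
    """Return an action that raises OSError for the first `failures` calls."""
    remaining = {"n": failures}

    def transfer() -> None:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise OSError("connection reset")

    return transfer


def build_demo() -> tuple:
    """Queue the four placement steps; stage-in fails once, then succeeds."""
    sched = PlacementScheduler()
    jobs = [
        PlacementJob("allocate", lambda: None),
        PlacementJob("stage_in", make_flaky_transfer(1)),
        PlacementJob("stage_out", lambda: None),
        PlacementJob("release", lambda: None),
    ]
    for job in jobs:
        sched.submit(job)
    sched.run()
    return sched, jobs
```

Calling `build_demo()` runs the sequence and returns the scheduler and its jobs, so the per-job `status`, `attempts`, and the scheduler's `log` can be inspected. A real system would run such a queue asynchronously alongside the computational jobs; this sketch runs it synchronously to keep the example short.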

Similar resources

Multi Objective Optimization Placement of DG Problem for Different Load Levels on Distribution Systems with Purpose Reduction Loss, Cost and Improving Voltage Profile Based on DAPSO Algorithm

Along with the economic growth of countries, which leads to increased energy requirements, the problems of power quality and network reliability have received more attention, and in recent decades we have witnessed a noticeable growth in distributed generation (DG) sources in distribution networks. Occurrence of DG in distribution systems, in addition to changing the utilization of these sys...

Reconfiguration and optimal placement of distributed generations in distribution networks in the presence of remote voltage controlled bus using exchange market algorithm

Abstract: Since distribution networks account for a large share of the losses in power systems, reducing losses in these networks is one of the key issues in reducing the costs of global networks, and one that has always received attention. In this thesis, the reconfiguration of the distribution network in the presence of distributed generation sources (DGs) with respect to two types of bus, P ...

Optimal Placement of DGs in Distribution System including Different Load Models for Loss Reduction using Genetic Algorithm

Distributed generation (DG) sources are becoming more prominent in distribution systems due to the increasing demand for electrical energy. The locations and capacities of DG sources have a great impact on the system losses in a distribution network. This paper presents a study aimed at optimally determining the size and location of distributed generation units in distribution systems with differ...

Data Scheduling for Large Scale Distributed Applications

Current large scale distributed applications studied by large research communities result in new challenging problems in widely distributed environments. In particular, scientific experiments using geographically separated and heterogeneous resources necessitate transparent access to distributed data and analysis of huge collections of information. We focus on data-intensive distributed computing ...

Correlated Data Placement in Distributed Systems

In distributed systems, communication cost is one of the major concerns. Much existing research has addressed the placement of data to reduce communication cost and improve performance in widely distributed systems. All of this work focuses on independent data objects. However, data are correlated due to accesses from clients, and the correlation has some impact on data placement. In ...

Publication date: 2005