An adaptive scheduling algorithm for dynamic heterogeneous Hadoop systems
Authors
Abstract
The MapReduce and Hadoop frameworks were designed to support efficient large-scale computations. There has been growing interest in employing Hadoop clusters for diverse applications. A large number of (heterogeneous) clients, using the same Hadoop cluster, can result in tensions between the various performance metrics by which such systems are measured. On the one hand, from the service provider side, the utilization of the Hadoop cluster will increase. On the other hand, from the client perspective, the parallelism in the system may decrease (with a corresponding degradation in metrics such as mean completion time). An efficient scheduling algorithm should strike a balance between utilization and parallelism in the cluster to address performance metrics such as fairness and mean completion time. In this paper, we propose a new Hadoop cluster scheduling algorithm, which uses system information such as estimated job arrival rates and mean job execution times to make scheduling decisions. The objective of our algorithm is to improve mean completion time of submitted jobs. In addition to addressing this concern, our algorithm provides competitive performance under fairness and locality metrics (with respect to other well-known Hadoop scheduling algorithms, Fair Sharing and FIFO). This approach can be efficiently applied in heterogeneous clusters, in contrast to most Hadoop cluster scheduling algorithm work, which assumes homogeneous clusters. Using simulation, we demonstrate that our algorithm is a very promising candidate for deployment in real systems.
Copyright © 2011 Aysan Rasooli and Dr. Douglas G. Down. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made.
Similar resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
An Efficient Genetic Algorithm for Task Scheduling on Heterogeneous Computing Systems Based on TRIZ
Efficient assignment and scheduling of tasks is one of the key elements in the effective utilization of heterogeneous multiprocessor systems. Because the task scheduling problem has been proven to be NP-hard, we used meta-heuristic methods for finding a suboptimal schedule. In this paper we propose a new approach using TRIZ (specifically, the 40 inventive principles). The basic idea of thi...
A Review on Storage and Task Scheduling in Heterogeneous Hadoop Clusters
The task scheduling algorithm for homogeneous Hadoop clusters is incapable of proper utilization of resources in heterogeneous clusters. To overcome this issue, an adaptive task scheduling algorithm has been proposed. With adaptive task scheduling we aim for better resource utilization by dynamically adjusting the workload at runtime. Also we are making the storage of data resource aware so tha...
An adaptive modified firefly algorithm to unit commitment problem for large-scale power systems
The unit commitment (UC) problem tries to schedule the output power of generation units to meet the system demand for the next several hours at minimum cost. UC adds a time dimension to the economic dispatch problem, with the additional choice of turning generators on or off. In this paper, in order to improve both the exploitation and exploration abilities of the firefly algorithm (FA), a new mo...
Publication date: 2011