Scheduling MapReduce Jobs Under Multi-round Precedences

نویسندگان

  • Dimitris Fotakis
  • Ioannis Milis
  • Orestis Papadigenopoulos
  • Vasilis Vassalos
  • Georgios Zois
چکیده

We consider non-preemptive scheduling of MapReduce jobs with multiple tasks in the practical scenario where each job requires several map-reduce rounds. We seek to minimize the average weighted completion time and consider scheduling on identical and unrelated parallel processors. For identical processors, we present LP-based O(1)approximation algorithms. For unrelated processors, the approximation ratio naturally depends on the maximum number of rounds of any job. Since the number of rounds per job in typical MapReduce algorithms is a small constant, our scheduling algorithms achieve a small approximation ratio in practice. For the single-round case, we substantially improve on previously best known approximation guarantees for both identical and unrelated processors. Moreover, we conduct an experimental analysis and compare the performance of our algorithms against a fast heuristic and a lower bound on the optimal solution, thus demonstrating their promising practical performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Throughput Driven Task Scheduler for Batch Jobs in Shared MapReduce Environments

MapReduce is one of the most popular parallel data processing systems, and it has been widely used in many fields. As one of the most important techniques in MapReduce, task scheduling strategy is directly related to the system performance. However, in multi-user shared MapReduce environments, the existing task scheduling algorithms cannot provide high system throughput when processing batch jo...

متن کامل

PISCES: Optimizing Multi-job Application Execution in MapReduce

Nowadays, many MapReduce applications consist of groups of jobs with dependencies among each other, such as iterative machine learning applications and large database queries. Unfortunately, the MapReduce framework is not optimized for these multi-job applications. It does not explore the execution overlapping opportunities among jobs and can only schedule jobs independently. These issues signi...

متن کامل

Scalable Clustering using MapReduce Programming Model

The aim is to implement a clustering algorithm, which will run in a distributed computing environment for which, a multi-node Hadoop cluster providing support for the Hadoop Distributed File System and the MapReduce Programming Model has been set up. In this paper, Exclusive and Complete Clustering (ExCC), a grid based algorithm, is implemented by scheduling consecutive MapReduce Jobs, for mass...

متن کامل

Real-Time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments

Supporting real-time jobs on MapReduce systems is particularly challenging due to the heterogeneity of the environment, the load imbalance caused by skewed data blocks, as well as real-time response demands imposed by the applications. In this paper we describe our approach for scheduling real-time, skewed MapReduce jobs in heterogeneous systems. Our approach comprises the following components:...

متن کامل

Queuing Network Models to Predict the Completion Time of the Map Phase of MapReduce Jobs

Big Data processing is generally defined as a situation when the size of the data itself becomes part of the computational problem. This has made divide-and-conquer type algorithms implemented in clusters of multi-core CPUs in Hadoop/MapReduce environments an important data processing tool for many organizations. Jobs of various kinds, which consists of a number of automatically parallelized ta...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016