On Modeling Dependency between MapReduce Configuration Parameters and Total Execution Time

نویسندگان

  • Nikzad Babaii Rizvandi
  • Albert Y. Zomaya
  • Ali Javadzadeh Boloori
  • Javid Taheri
چکیده

In this paper, we propose an analytical method to model the dependency between configuration parameters and total execution time of Map-Reduce applications. Our approach has three key phases: profiling, modeling, and prediction. In profiling, an application is run several times with different sets of MapReduce configuration parameters to profile the execution time of the application on a given platform. Then in modeling, the relation between these parameters and total execution time is modeled by multivariate linear regression. Among the possible configuration parameters, two main parameters have been used in this study: the number of Mappers, and the number of Reducers. For evaluation, two standard applications (WordCount, and Exim Mainlog parsing) are utilized to evaluate our technique on a 4-node MapReduce platform. Keywords—MapReduce, Configuration parameters, total execution time, multivariate linear regression

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On Modeling CPU Utilization of MapReduce Applications

In this paper, we present an approach to predict the total CPU utilization in terms of CPU clock tick of applications when running on MapReduce framework. Our approach has two key phases: profiling and modeling. In the profiling phase, an application is run several times with different sets of MapReduce configuration parameters to profile total CPU clock tick of the application on a given platf...

متن کامل

Automatic Tuning of MapReduce Jobs using Uncertain Pattern Matching Analysis

In this paper, we study CPU utilization time patterns of several MapReduce applications. After extracting running patterns of several applications, the patterns along with their statistical information are saved in a reference database to be later used to tweak system parameters to efficiently execute future unknown applications. To achieve this goal, CPU utilization patterns of new application...

متن کامل

A Study on Using Uncertain Time Series Matching Algorithms in Map-Reduce Applications

In this paper, we study CPU utilization time patterns of several MapReduce applications. After extracting running patterns of several applications, the patterns along with their statistical information are saved in a reference database to be later used to tweak system parameters to efficiently execute future unknown applications. To achieve this goal, CPU utilization patterns of new application...

متن کامل

Optimization Strategies for A/B Testing on HADOOP

In this work, we present a set of techniques that considerably improve the performance of executing concurrent MapReduce jobs. Our proposed solution relies on proper resource allocation for concurrent Hive jobs based on data dependency, inter-query optimization and modeling of Hadoop cluster load. To the best of our knowledge, this is the first work towards Hive/MapReduce job optimization which...

متن کامل

PISCES: Optimizing Multi-job Application Execution in MapReduce

Nowadays, many MapReduce applications consist of groups of jobs with dependencies among each other, such as iterative machine learning applications and large database queries. Unfortunately, the MapReduce framework is not optimized for these multi-job applications. It does not explore the execution overlapping opportunities among jobs and can only schedule jobs independently. These issues signi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1203.0651  شماره 

صفحات  -

تاریخ انتشار 2012