Tuning Hadoop Map Slot Value Using CPU Metric

Authors

  • Kamal Kc
  • Vincent W. Freeh
Abstract

Hadoop is a widely used open-source MapReduce framework. Its performance is critical because it increases the usefulness of the products and services of the many companies that have adopted Hadoop for their business purposes. One of the configuration parameters that influences the resource allocation, and thus the performance, of a Hadoop application is the map slot value (MSV). MSV determines the number of map tasks that run concurrently on a node. For a given architecture, a Hadoop application has an MSV for which its performance is best. Furthermore, there is no single map slot value that is best for all applications. A Hadoop application's performance suffers when MSV is not the best. Therefore, knowing the best MSV is important for an application. In this work, we find a low-overhead method to predict the best MSV using a new Hadoop counter that measures per-map-task CPU utilization. Our experiments on a wide variety of Hadoop applications show that using a single MSV for all applications results in performance degradation of up to 132% when compared to using the best MSV for each application.
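
For reference, MSV corresponds to the per-node map slot count configured for each TaskTracker in classic (MRv1) Hadoop. The sketch below is a minimal, hypothetical illustration of overriding and reading that setting through the Hadoop Configuration API; the value 4 is an arbitrary example rather than a recommendation from the paper, and production clusters normally set this property cluster-wide in mapred-site.xml.

import org.apache.hadoop.conf.Configuration;

public class MapSlotExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // MRv1 property controlling how many map tasks a TaskTracker
        // runs concurrently, i.e. the map slot value (MSV).
        // The value 4 below is purely illustrative.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 4);
        int msv = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
        System.out.println("map slots per node = " + msv);
    }
}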


Related Articles

Optimization Framework for Map Reduce Clusters on Hadoop’s Configuration

Hadoop represents a Java-based distributed computing framework that is designed to support applications implemented via the MapReduce programming model. Hadoop performance, however, is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming. The existing system uses a Random Forest approa...

MROrchestrator: A Fine-Grained Resource Orchestration Framework for Hadoop MapReduce

Efficient resource management in data centers and clouds running large distributed data processing frameworks like Hadoop is crucial for enhancing the performance of hosted MapReduce applications, and boosting the resource utilization. However, existing resource scheduling schemes in Hadoop allocate resources at the granularity of fixed-size, static portions of the nodes, called slots. A slot r...

An Experimental Evaluation of Data Placement Scheme Using Hadoop Cluster

G. Sasikala and N. Meenakshi, Dept. of Computer Science, Valliammai Engineering College, TamilNadu, India. Abstract – A network bandwidth is steadily increasi...

Improved Input Data Splitting in MapReduce

The performance of MapReduce greatly depends on its data splitting process, which happens before the map phase. This is usually done using naive methods that are far from optimal. In this paper, an Improved Input Splitting technology based on locality is explained, which aims at addressing the input data splitting problems that seriously affect job performance. Improved Input Splitting c...

Improving the Performance of Processing for Small Files in Hadoop: A Case Study of Weather Data Analytics

Hadoop is an open-source Apache project that supports a master-slave architecture involving one master node and thousands of slave nodes. The master node acts as the name node, which stores all the metadata of files, and the slave nodes act as the data nodes, which store all the application data. Hadoop is designed to process large data sets (petabytes). It becomes a bottleneck when handling mas...



Publication year: 2014