Efficient Resource Utilization in Hadoop on Virtual Machine

Authors

Jinto Thomas

Manjunath Mulimani

Abstract

Hadoop is an open-source software framework for processing large amounts of data in a distributed manner across clusters of commodity servers. It is designed to provide high fault tolerance and to scale from a single server to thousands of machines. Hadoop stores data in the Hadoop Distributed File System (HDFS), an open-source implementation of the Google File System (GFS). MapReduce is the main programming model used to process the data stored in HDFS. We consider an environment in which Hadoop is deployed on virtual machines, with Kernel-based Virtual Machine (KVM) as the virtualization infrastructure. Existing services simply give an admitted job all of the available resources. In this situation resources are not used efficiently: the allocation exceeds the minimum resources required to finish the job, which results in poor resource utilization and higher cost. To avoid this, a new cluster is created for each assigned job. Instead of leaving the customer to decide the resources for the job, this model automatically selects the systems needed to finish the job with minimum resource utilization.

Keywords: Hadoop, Cluster, KVM, Virtualization, HDFS

Related resources

Efficient and Parallel Data Processing and Resource Allocation in the Cloud by using Nephele’s Data Processing Framework

Cloud computing is a technology in which Cloud Service Providers (CSPs) provide many virtual servers on which users store their information in the cloud. Faults that occur during the assignment and release of virtual machines, as well as the processing cost of allocating resources, must also be considered. The parallel processing of the information on the virtual machines must be done effe...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. Current Hadoop, and the rack-aware data placement strategy of the existing Hadoop distributed file system in a homogeneous Hadoop cluster, assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...

MROrchestrator: A Fine-Grained Resource Orchestration Framework for Hadoop MapReduce

Efficient resource management in data centers and clouds running large distributed data processing frameworks like Hadoop is crucial for enhancing the performance of hosted MapReduce applications, and boosting the resource utilization. However, existing resource scheduling schemes in Hadoop allocate resources at the granularity of fixed-size, static portions of the nodes, called slots. A slot r...

A Systematic Review of Existing VM Migration Techniques

Virtual machine migration is a core feature of virtualization that plays an important role in cloud computing. Resource utilization is monitored by a local migration agent that launches the migration of a virtual machine from one physical machine to another. Various virtual machine migration techniques are explored for efficient utilization of resources. With rapid increase in data centers, it becom...

Task Scheduling in the Cloud Using Machine Learning Classification

Cloud computing is a distributed computing model that enables developers to automatically deploy their applications onto the cloud. Many applications running on a cloud require parallel processing capabilities. Applications of this nature require an efficient scheduling algorithm to manage heavy traffic. The drawbacks of existing scheduling algorithms are low resource utilizat...

Publication date: 2015
