MRBS: A Comprehensive MapReduce Benchmark Suite

Authors

  • Amit Sangroya
  • Damián Serrano
  • Sara Bouchenak

Abstract

MapReduce is a promising programming model for distributed data processing. Extensive research has been conducted on the scalability of MapReduce, and several systems have been proposed in the literature, ranging from job scheduling to data placement and replication. However, realistic benchmarks for analyzing and comparing the effectiveness of these proposals are still missing. To date, most MapReduce techniques have been evaluated using microbenchmarks in overly simplified settings that may not be representative of real-world applications. This paper presents MRBS, a comprehensive benchmark suite for evaluating the performance of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios, such as data-intensive vs. compute-intensive applications and batch vs. online interactive applications. MRBS makes it possible to characterize an application's workload and dataload, and it produces extensive high-level and low-level performance statistics. We illustrate the use of MRBS with Hadoop clusters running on Amazon EC2.

Keywords: Benchmark; Performance; MapReduce; Hadoop; Cloud Computing
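The abstract says that MRBS produces high-level performance statistics, but it does not say how they are computed. The following is a rough illustration only. It assumes a hypothetical log in which each client request records a submission time, a completion time and a success flag. From that log it derives three common client-side metrics: throughput, mean response time and success rate. The `Request` record and the `summarize` function are hypothetical names and are not part of MRBS.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    """One client request to the benchmarked service.

    Hypothetical record format for illustration; not MRBS's actual log schema.
    """
    start: float  # submission time, in seconds
    end: float    # completion time, in seconds
    ok: bool      # True if the request (job) completed successfully


def summarize(requests):
    """Compute simple high-level statistics over one benchmark run."""
    if not requests:
        raise ValueError("no requests recorded")
    completed = [r for r in requests if r.ok]
    # Run duration spans from the first submission to the last completion.
    duration = max(r.end for r in requests) - min(r.start for r in requests)
    return {
        "requests": len(requests),
        # Successful requests completed per minute of run time.
        "throughput_per_min": 60.0 * len(completed) / duration if duration > 0 else 0.0,
        # Mean response time over successful requests only.
        "mean_response_s": mean(r.end - r.start for r in completed) if completed else float("nan"),
        # Fraction of all requests that succeeded.
        "success_rate": len(completed) / len(requests),
    }


if __name__ == "__main__":
    log = [Request(0.0, 30.0, True), Request(10.0, 50.0, True), Request(20.0, 60.0, False)]
    print(summarize(log))
```

Throughput and response time count only successful requests, so failed jobs do not skew latency. The success rate reports the failures separately.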

Similar sources

LNCS 7640 - Euro-Par 2012: Parallel Processing Workshops

MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce fault-tolerance solutions...

HiBench: A Representative and Comprehensive Hadoop Benchmark Suite

MapReduce and its popular open-source implementation, Hadoop, are becoming ubiquitous for Big Data storage and processing. Therefore, it is essential to quantitatively evaluate and characterize Hadoop deployments through extensive benchmarking. In this paper, we present HiBench [1], a representative and comprehensive benchmark suite for Hadoop, which consists of a set of Hadoop programs...

Benchmarking and Performance studies of MapReduce / Hadoop Framework on Blue Waters Supercomputer

MapReduce is an emerging and widely used programming model for large-scale data-parallel applications that must process large amounts of raw data. There are several implementations of the MapReduce framework, among which Apache Hadoop is the most commonly used open-source implementation. These frameworks are rarely deployed on supercomputers as massive as Blue Waters. We want to evaluate ho...

BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads

Long-running service workloads (e.g. web search engine) and short-term data analysis workloads (e.g. Hadoop MapReduce jobs) colocate in today’s data centers. Developing realistic benchmarks to reflect such practical scenario of mixed workload is a key problem to produce trustworthy results when evaluating and comparing data center systems. This requires using actual workloads as well as guarant...

A MR Simulator in Facilitating Cloud Computing

MapReduce is an enabling technology in support of Cloud Computing. Hadoop, a MapReduce implementation, has been widely used in developing MapReduce applications. This paper presents the Hadoop simulator HaSim, a MapReduce simulator that builds on top of Hadoop. HaSim models a large number of parameters that can affect the behavior of MapReduce nodes, and thus it can be used to tune the performa...

Publication date: 2012