MRBS: A Comprehensive MapReduce Benchmark Suite
Authors
Abstract
MapReduce is a promising programming model for distributed data processing. Extensive research has been conducted on the scalability of MapReduce, and several systems have been proposed in the literature, ranging from job scheduling to data placement and replication. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce techniques have been evaluated using microbenchmarks in an overly simplified setting, which may not be representative of real-world applications. This paper presents MRBS, a comprehensive benchmark suite for evaluating the performance of MapReduce systems. MRBS includes five benchmarks covering several application domains and a wide range of execution scenarios, such as data-intensive vs. compute-intensive applications, or batch applications vs. online interactive applications. MRBS makes it possible to characterize application workload and dataload, and produces extensive high-level and low-level performance statistics. We illustrate the use of MRBS with Hadoop clusters running on Amazon EC2.
Keywords: Benchmark; Performance; MapReduce; Hadoop; Cloud Computing
Similar resources
LNCS 7640 - Euro-Par 2012: Parallel Processing Workshops
MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce fault-tolerance solutions...
HiBench: A Representative and Comprehensive Hadoop Benchmark Suite
MapReduce and its popular open source implementation, Hadoop, are becoming ubiquitous for Big Data storage and processing. Therefore, it is essential to quantitatively evaluate and characterize Hadoop deployments through extensive benchmarking. In this paper, we present HiBench [1], a representative and comprehensive benchmark suite for Hadoop, which consists of a set of Hadoop programs...
Benchmarking and Performance studies of MapReduce / Hadoop Framework on Blue Waters Supercomputer
MapReduce is an emerging and widely used programming model for large-scale data-parallel applications that must process large amounts of raw data. There are several implementations of the MapReduce framework, among which Apache Hadoop is the most commonly used open source implementation. These frameworks are rarely deployed on supercomputers as massive as Blue Waters. We want to evaluate ho...
BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads
Long-running service workloads (e.g. web search engines) and short-term data analysis workloads (e.g. Hadoop MapReduce jobs) are colocated in today’s data centers. Developing realistic benchmarks that reflect such practical mixed-workload scenarios is key to producing trustworthy results when evaluating and comparing data center systems. This requires using actual workloads as well as guarant...
A MR Simulator in Facilitating Cloud Computing
MapReduce is an enabling technology in support of Cloud Computing. Hadoop, a MapReduce implementation, has been widely used in developing MapReduce applications. This paper presents the Hadoop simulator HaSim, a MapReduce simulator that builds on top of Hadoop. HaSim models a large number of parameters that can affect the behavior of MapReduce nodes, and thus it can be used to tune the performa...
Publication date: 2012