mapreduce

Cogset: a high performance MapReduce engine

Journal: Concurrency and Computation: Practice and Experience 2013

Steffen Viken Valvåg Dag Johansen Åge Kvalnes

MapReduce has become a widely employed programming model for large-scale data-intensive computations. Traditional MapReduce engines employ dynamic routing of data as a core mechanism for fault tolerance and load balancing. An alternative mechanism is static routing, which reduces the need to store temporary copies of intermediate data, but requires a tighter coupling between the components for ...

Traffic Analysis in MapReduce

2016

Anjana Sharma

MapReduce is a programming model that processes large data sets and produces output. A MapReduce job completes its work with two functions, the Map function and the Reduce function. The Map function is assigned fragmented data as input, emits intermediate data as key-value pairs, and sends these intermediate pairs to the Reducer, where the Reducer will get the inpu...
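The Map/Reduce split described above can be illustrated with the classic word-count job. This is a minimal single-process sketch, not code from the paper: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the grouping loop stands in for the intermediate step a real engine performs between the two phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(fragment: str) -> Iterator[tuple[str, int]]:
    # Map: emit an intermediate (key, value) pair for every word in the fragment.
    for word in fragment.lower().split():
        yield word, 1


def reduce_fn(key: str, values: Iterable[int]) -> tuple[str, int]:
    # Reduce: combine all values that share a key into one output record.
    return key, sum(values)


def run_mapreduce(fragments: list[str]) -> dict[str, int]:
    # Group intermediate pairs by key, then hand each group to the reducer.
    groups: dict[str, list[int]] = defaultdict(list)
    for fragment in fragments:
        for key, value in map_fn(fragment):
            groups[key].append(value)
    return dict(reduce_fn(key, values) for key, values in groups.items())
```

For example, `run_mapreduce(["the cat", "the dog"])` yields a count of 2 for `"the"` and 1 each for `"cat"` and `"dog"`.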

Master-worker model for MapReduce paradigm on the TILE64 many-core platform

Journal: Future Generation Comp. Syst. 2014

Xuan-Yi Lin Yeh-Ching Chung

MapReduce is a popular programming paradigm for processing big data. It uses the master–worker model, which is widely used on distributed and loosely coupled systems such as clusters, to solve large problems with task parallelism. With the ubiquity of many-core architectures in recent years and the foreseeable future, the many-core platform will be one of the main computing platforms to execute MapRedu...
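As a rough illustration of the master–worker pattern mentioned above, the sketch below has a master split the input into tasks, submit them to a pool of workers, and merge partial results as they finish. It assumes Python threads as the workers purely to show the coordination structure; it does not model TILE64 cores or the authors' design, and because of CPython's global interpreter lock it would not deliver real CPU parallelism. `map_task` and `master` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed


def map_task(chunk: list[str]) -> Counter:
    # Worker: count words in one chunk of input lines.
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def master(lines: list[str], num_workers: int = 4, chunk_size: int = 2) -> Counter:
    # Master: partition the input into tasks, dispatch them to the worker pool,
    # and merge each partial result as soon as its task completes.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(map_task, chunk) for chunk in chunks]
        for future in as_completed(futures):
            total.update(future.result())
    return total
```

Merging with `as_completed` lets the master consume results in whatever order workers finish, which is the usual way a master avoids waiting on a slow task before processing faster ones.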

Toward Scheduling I/O Request of Mapreduce Tasks Based on Markov Model

2015

Sonia Ikken Éric Renault M. Tahar Kechadi AbdelKamel Tari

In cloud storage with multiple CPU cores, many MapReduce applications may run in parallel on each compute node, collocated with local disk storage. This disk storage is shared by multiple applications that use the full CPU power of the node. Each application tends to issue contiguous I/O requests in parallel to the same disk; however, if a large number of MapReduce tasks enters the I/O phase at th...
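The abstract above is truncated before the Markov model is described, so the following is only a generic illustration of why a Markov chain can help estimate how many tasks enter the I/O phase at once. It assumes, hypothetically, that each task alternates independently between a CPU state and an I/O state with per-step transition probabilities `p` (CPU to I/O) and `q` (I/O to CPU). The chain's stationary probability of being in I/O is p / (p + q), and the simulation checks that value empirically. None of this reproduces the authors' model.

```python
import random


def stationary_io_probability(p: float, q: float) -> float:
    # Two-state chain (CPU <-> I/O). Balance equation pi_cpu * p = pi_io * q
    # gives pi_io = p / (p + q).
    if p + q <= 0:
        raise ValueError("p + q must be positive")
    return p / (p + q)


def expected_tasks_in_io(num_tasks: int, p: float, q: float) -> float:
    # Assuming independent tasks, expectation is linear in the number of tasks.
    return num_tasks * stationary_io_probability(p, q)


def simulate_io_fraction(p: float, q: float, steps: int, seed: int = 0) -> float:
    # Monte Carlo check: fraction of steps a single task spends in the I/O state.
    rng = random.Random(seed)
    in_io = False
    io_steps = 0
    for _ in range(steps):
        if in_io:
            if rng.random() < q:
                in_io = False
        elif rng.random() < p:
            in_io = True
        io_steps += in_io
    return io_steps / steps
```

With `p = 0.2` and `q = 0.3`, a task spends about 40% of its time in I/O, so ten such tasks would keep roughly four disk request streams active on average; a scheduler could use an estimate like this to decide when to stagger I/O phases.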

MapReduce with communication overlap (MaRCO)

Journal: J. Parallel Distrib. Comput. 2013

Faraz Ahmad Seyong Lee Mithuna Thottethodi T. N. Vijaykumar

MapReduce is a programming model from Google for cluster-based computing in domains such as search engines, machine learning, and data mining. MapReduce provides automatic data management and fault tolerance to improve programmability of clusters. MapReduce’s execution model includes an all-map-to-all-reduce communication, called the shuffle, across the network bisection. Some MapReductions mov...
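The all-map-to-all-reduce shuffle described above can be sketched with a hash partitioner: every mapper's output is split by key across all reducers, so each mapper may send data to every reducer. This is a minimal in-memory sketch under stated assumptions: `partition` and `shuffle` are hypothetical names, and `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is salted per process. It performs the shuffle after all maps finish and does not show MaRCO's overlap of communication with computation.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    # Deterministic hash partitioner: the same key always goes to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(
    map_outputs: list[list[tuple[str, int]]], num_reducers: int
) -> list[dict[str, list[int]]]:
    # Route every (key, value) from every mapper to the reducer that owns the key.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for mapper_output in map_outputs:
        for key, value in mapper_output:
            reducer_inputs[partition(key, num_reducers)][key].append(value)
    return [dict(inputs) for inputs in reducer_inputs]
```

Because the partitioner depends only on the key, all values for one key end up at a single reducer, which is what makes per-key reduction correct.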

A Survey on Partitioning Skew Diminishing Techniques in Hadoop MapReduce Environment

2017

Y.Sravani Devi

The era of Big Data has produced large volumes of structured and unstructured data. MapReduce is an effective tool for parallel data processing. One significant issue in practical MapReduce applications is data skew: the imbalance in the amount of data assigned to each task. This causes some tasks to take much longer to finish than others and can significantly impact performance. Parallel data p...
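To make partitioning skew concrete, the sketch below counts how many records each reducer receives under a hash partitioner and reports the ratio of the heaviest load to the mean load (1.0 means perfectly balanced). It only measures skew and does not implement any of the mitigation techniques the survey covers; `crc32_partitioner`, `reducer_loads`, and `skew_ratio` are hypothetical names.

```python
import zlib
from typing import Callable


def crc32_partitioner(key: str, num_reducers: int) -> int:
    # Deterministic hash partitioning of keys to reducers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(
    keys: list[str],
    num_reducers: int,
    partitioner: Callable[[str, int], int] = crc32_partitioner,
) -> list[int]:
    # Number of intermediate records each reducer would receive.
    loads = [0] * num_reducers
    for key in keys:
        loads[partitioner(key, num_reducers)] += 1
    return loads


def skew_ratio(loads: list[int]) -> float:
    # Heaviest load divided by mean load; larger values mean one task lags behind.
    if not loads or sum(loads) == 0:
        return 1.0
    mean = sum(loads) / len(loads)
    return max(loads) / mean
```

With 90 records sharing one "hot" key and 10 distinct other keys spread over 4 reducers, whichever reducer owns the hot key receives at least 90 of 100 records, giving a ratio of at least 3.6. Hashing alone cannot fix this, since every record with the same key must go to the same reducer.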

Parallelizing XML Processing Pipelines via MapReduce

2009

Daniel Zinn Sven Köhler Shawn Bowers Bertram Ludäscher

We present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist of sequences of processing steps that consume XML-structured data and produce, often through calls to “black-box” functions, modified (i.e., updated) XML structures. Our main contributions are a set of strategies for...

PigSPARQL: A SPARQL Query Processing Baseline for Big Data

2013

Alexander Schätzle Martin Przyjaciel-Zablocki Thomas Hornung Georg Lausen

In this paper we discuss PigSPARQL, a competitive yet easy-to-use SPARQL query processing system on MapReduce that allows ad hoc SPARQL query processing on large RDF graphs out of the box. Instead of a direct mapping, PigSPARQL uses the query language of Pig, a data analysis platform on top of Hadoop MapReduce, as an intermediate layer between SPARQL and MapReduce. This additional level of abstr...

LNCS 7640 - Euro-Par 2012: Parallel Processing Workshops

2012

Ioannis Caragiannis Gerhard Goos Juris Hartmanis Jan van Leeuwen David Hutchison Josef Kittler Jon M. Kleinberg Gerhard Weikum Michael Alexander Rosa Maria Badia Mario Cannataro Alexandru Costan Marco Danelutto Frédéric Desprez Bettina Krammer Julio Sahuquillo Stephen L. Scott Josef Weidendorfer

MapReduce is a popular programming model for distributed data processing. Extensive research has been conducted on the reliability of MapReduce, ranging from adaptive and on-demand fault-tolerance to new fault-tolerance models. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapReduce fault-tolerance solutions...

Proving Equivalence Between Imperative and MapReduce Implementations Using Program Transformations

2018

Bernhard Beckert Timo Bingmann Moritz Kiefer Peter Sanders Mattias Ulbrich Alexander Weigl

Distributed programs are often formulated in popular functional frameworks like MapReduce, Spark and Thrill, but writing efficient algorithms for such frameworks is usually a non-trivial task. As the costs of running faulty algorithms at scale can be severe, it is highly desirable to verify their correctness. We propose to employ existing imperative reference implementations as specifications f...
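The idea of using an imperative reference implementation as the specification for a MapReduce version can be illustrated on a tiny example. The sketch below is not the paper's approach: the authors prove equivalence via program transformations, whereas this only compares outputs on random inputs, which can find bugs but cannot establish correctness. The sum-of-squares functions and `agree_on_random_inputs` are hypothetical.

```python
import random
from functools import reduce


def sum_of_squares_imperative(xs: list[int]) -> int:
    # Imperative reference implementation, used here as the specification.
    total = 0
    for x in xs:
        total += x * x
    return total


def sum_of_squares_mapreduce(xs: list[int]) -> int:
    # Map squares each element; reduce folds with +, which is associative and
    # has identity 0, so grouping across workers would not change the result.
    return reduce(lambda a, b: a + b, map(lambda x: x * x, xs), 0)


def agree_on_random_inputs(trials: int = 200, seed: int = 0) -> bool:
    # Differential testing: compare both versions on randomly generated lists.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 50))]
        if sum_of_squares_imperative(xs) != sum_of_squares_mapreduce(xs):
            return False
    return True
```

The associativity and identity comments point at the properties a real proof would need to establish for the reduce operator before distributing it.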