Search results for: hadoop

Number of results: 2553

2013
Shouvik Bardhan Daniel A. Menascé

Hadoop is a leading open source tool that supports the realization of the Big Data revolution and is based on Google’s MapReduce pioneering work in the field of ultra large amount of data storage and processing. Instead of relying on expensive proprietary hardware, Hadoop clusters typically consist of hundreds or thousands of multi-core commodity machines. Instead of moving data to the processi...

2012
Yandong Wang Yizheng Jiao Cong Xu Xiaobing Li Teng Wang Xinyu Que Cristian Cira Bin Wang Zhuo Liu Bliss Bailey Weikuan Yu

Hadoop is a successful open-source implementation of the MapReduce programming model. It has been widely adopted by many leading industry companies for big data analytics. However, its intermediate data shuffling is a time-consuming operation that impacts the total execution time of MapReduce programs. Recently, a growing number of organizations are interested in addressing this issue by leveraging ...

2010
Michael J. Fischer Xueyuan Su Yitong Yin

In recent years Google’s MapReduce has emerged as a leading large-scale data processing architecture. Adopted by companies such as Amazon, Facebook, Google, IBM and Yahoo! in daily use, and more recently put in use by several universities, it allows parallel processing of huge volumes of data over cluster of machines. Hadoop is a free Java implementation of MapReduce. In Hadoop, files are split...

Journal: Computers 2014
Saeed Shahrivari

Today, big data is generated from many sources and there is a huge demand for storing, managing, processing, and querying on big data. The MapReduce model and its counterpart open source implementation Hadoop, has proven itself as the de facto solution to big data processing. Hadoop is inherently designed for batch and high throughput processing jobs. Although Hadoop is very suitable for batch ...

2016
Trupti Mali Deepti Varshney

Hadoop represents a Java-based distributed computing framework that is designed to support applications that are implemented via the MapReduce programming model. Hadoop performance however is significantly affected by the settings of the Hadoop configuration parameters. Unfortunately, manually tuning these parameters is very time-consuming. Existing system uses Random forest approa...

2014
Priya Deshpande Darshan Bora

Due to brisk growth of data volume in many organizations, large-scale data processing became a demanding topic for industry as well as for academic fields. Hadoop is widely adopted in Cloud Computing environment for unstructured data. Hadoop is an open source, a java based distributed computing framework, and supports large-scale distributed data processing. In the recent years, Hadoop Distribu...

2015
Jianbin Cui Hongying Meng

The rapid development of Internet and cloud computing technologies has led to explosive generation and processing of huge amounts of data. The ever increasing data volumes bring great values to societies, but in the meantime bring forward a number of challenges. Data mining techniques have been widely used in decision analysis in financial, medical, management, business and many other fields. H...

2016
Xianjin Luo Chenggang Zhen

The existing Hadoop clusters are mostly composed of heterogeneous nodes, which have different computing and storage capacities, with the speed of maps to reduce tasks performed on the nodes being quite different. However, the finish time of the entire job is determined by the slowest task, so looking for the “drag tasks” strategy has a dominant position in the whole job scheduling process. The ...

2016
Fabian Fier Eva Höfer Johann-Christoph Freytag

MapReduce and Hadoop are often used synonymously. For optimal runtime performance, Hadoop users have to consider various implementation details and configuration parameters. When conducting performance experiments with Hadoop on different algorithms, it is hard to choose a set of such implementation optimizations and configuration options which is fair to all algorithms. By fair we mean default...

2017
Maanak Gupta Farhan Patwa Ravi S. Sandhu

Hadoop ecosystem provides a highly scalable, fault-tolerant and cost-effective platform for storing and analyzing variety of data formats. Apache Ranger and Apache Sentry are two predominant frameworks used to provide authorization capabilities in Hadoop ecosystem. In this paper we present a formal multi-layer access control model (called HeAC) for Hadoop ecosystem, as an academic-style abstrac...
