Search results for: hadoop

Number of results: 2553

2014
Wei-Chun Chung Chien-Chih Chen Jan-Ming Ho Chung-Yen Lin Wen-Lian Hsu Yu-Chun Wang D. T. Lee Feipei Lai Chih-Wei Huang Yu-Jung Chang

BACKGROUND Explosive growth of next-generation sequencing data has resulted in ultra-large-scale data sets and ensuing computational problems. Cloud computing provides an on-demand and scalable environment for large-scale data analysis. Using a MapReduce framework, data and workload can be distributed via a network to computers in the cloud to substantially reduce computational latency. Hadoop/...

2014
K. Ashwin Kumar Jonathan Gluck Amol Deshpande Jimmy J. Lin

The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine. Today, this assumption needs to be re-evaluated. Although petabyte-scale datastores are increasingly common, it is unclear whether “typical” analytics tasks require more than a single high-end server. Additionally, we are ...

2016
Vivek Badhe Shweta Verma

Computing technology has changed the way we work, study, and live. Distributed data processing technology is one of the popular topics in the IT field. It provides a simple and centralized computing platform by reducing the cost of the hardware. The characteristics of distributed data processing technology have changed the whole industry. Hadoop, as the open so...

2015
Jiaqi Tan Xinghao Pan Soila Kavulya Rajeev Gandhi Priya Narasimhan

Mochi, a new visual, log-analysis-based debugging tool, correlates Hadoop’s behavior in space, time, and volume, and extracts a causal, unified control- and data-flow model of Hadoop across the nodes of a cluster. Mochi’s analysis produces visualizations of Hadoop’s behavior with which users can reason about and debug performance issues. We provide examples of Mochi’s value in revealing a Hadoop j...

2014
Amogh Pramod Kulkarni Mahesh Khandewal

Big Data, the analysis of large quantities of data to gain new insight, has become a ubiquitous phrase in recent years. Day by day, data is growing at a staggering rate. One of the efficient technologies that deals with Big Data is Hadoop, which will be discussed in this paper. For processing large-data-volume jobs, Hadoop uses the MapReduce programming model. Hadoop makes use of different sch...
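The MapReduce programming model mentioned in the abstract above can be sketched as a toy in-memory simulation, in plain Python rather than Hadoop's actual Java API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, not Hadoop identifiers:

```python
from collections import defaultdict

# Toy word count in the MapReduce style: map emits (key, value) pairs,
# the shuffle phase groups values by key, and reduce aggregates each group.
# In real Hadoop, map tasks run on many nodes and the framework shuffles
# pairs over the network; here everything runs in one process.

def map_phase(line):
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))

print(word_count(["hadoop uses mapreduce", "mapreduce scales"]))
# → {'hadoop': 1, 'uses': 1, 'mapreduce': 2, 'scales': 1}
```

The key property the model relies on is that reduce only ever sees all values for one key together, which is what lets Hadoop parallelize the map and reduce phases independently.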

2010

Intel is a major contributor to open source initiatives, such as Linux*, Apache*, and Xen*, and has also devoted resources to Hadoop analysis, testing, and performance characterizations, both internally and with fellow travelers such as HP and Cloudera. Through these technical efforts, Intel has observed many practical trade-offs in hardware, software, and system settings that have real-world i...

2011
Jinquan Dai Jie Huang Shengsheng Huang Bo Huang Yan Liu

Although Big Data Cloud systems (e.g., MapReduce, Hadoop, and Dryad) make it easy to develop and run highly scalable applications, efficient provisioning and fine-tuning of these massively distributed systems remain a major challenge. In this paper, we describe a general approach to help address this challenge, based on distributed instrumentations and dataflow-driven performance analysis. Based on this...

2010
T Auntin Jose

Hadoop, an open source Java framework, deals with big data. It has HDFS (Hadoop Distributed File System) and MapReduce. HDFS is designed to handle large files across clusters and suffers a performance penalty when dealing with a large number of small files. These large numbers of small files place a heavy burden on the NameNode of HDFS and increase execution time for MapReduce. Secondly, ...
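The small-file burden described above comes down to NameNode memory: HDFS keeps an in-memory metadata object per file and per block, so many tiny files cost far more metadata than the same bytes packed into one file. A back-of-envelope sketch (the ~150-bytes-per-object figure is the commonly cited HDFS rule of thumb, and `namenode_objects` is a hypothetical helper, not a Hadoop API):

```python
BYTES_PER_OBJECT = 150  # rough rule of thumb for NameNode memory per file/block object

def namenode_objects(num_files, file_size, block_size=128 * 1024 * 1024):
    # Each file costs one file object plus one object per HDFS block it occupies.
    blocks_per_file = max(1, -(-file_size // block_size))  # ceiling division
    return num_files * (1 + blocks_per_file)

# ~1 GB stored as 10,000 small files of 100 KB each...
small = namenode_objects(10_000, 100 * 1024)
# ...versus the same bytes packed into a single large file.
packed = namenode_objects(1, 10_000 * 100 * 1024)

print(small * BYTES_PER_OBJECT, packed * BYTES_PER_OBJECT)
# → 3000000 1350
```

The roughly 2000x difference in estimated NameNode memory is why techniques such as Hadoop archives and sequence files pack small files into larger containers.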

Journal: OJBD 2015
Ujjal Marjit Kumar Sharma Puspendu Mandal

Hadoop is an open source framework for processing large amounts of data in a distributed computing environment. It plays an important role in processing and analyzing Big Data. This framework is used for storing data on large clusters of commodity hardware. Data input and output to and from Hadoop is an indispensable action for any data processing job. At present, many tools have evolved...

2017
Sumukhi Chandrashekar Lihao Xu

The Hadoop platform is widely used for managing, analyzing, and transforming large data sets in various systems. Two basic components of Hadoop are: 1) a distributed file system (HDFS), and 2) a computation framework (MapReduce). HDFS stores data on simple commodity machines that run DataNode processes (DataNodes). A commodity machine running the NameNode process (NameNode) maintains metadata informat...
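The split of responsibilities described above, where the NameNode holds only metadata while DataNodes hold the bytes, can be illustrated with a toy in-memory model (class and node names are hypothetical; real HDFS placement is rack-aware, not round-robin):

```python
import itertools

class ToyNameNode:
    """Keeps metadata only: which blocks make up a file, and where each block lives."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = datanodes
        self.replication = replication
        self.file_blocks = {}      # filename -> [block_id, ...]
        self.block_locations = {}  # block_id -> [datanode, ...]
        self._ids = itertools.count()

    def add_block(self, filename):
        block_id = next(self._ids)
        self.file_blocks.setdefault(filename, []).append(block_id)
        # Simple round-robin placement stands in for HDFS's rack-aware policy.
        start = block_id % len(self.datanodes)
        replicas = [self.datanodes[(start + i) % len(self.datanodes)]
                    for i in range(self.replication)]
        self.block_locations[block_id] = replicas
        return block_id

nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
nn.add_block("logs.txt")
nn.add_block("logs.txt")
print(nn.file_blocks["logs.txt"], nn.block_locations[0])
# → [0, 1] ['dn1', 'dn2', 'dn3']
```

Because the NameNode holds only these small maps, clients ask it where blocks live and then stream data directly from the DataNodes, keeping the metadata server off the data path.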
