نتایج جستجو برای: hadoop

تعداد نتایج: 2553  

2015
Arpit Gupta Rajiv Pandey Komal Verma

This term paper focuses on how the big data is analysed in a distributed environment through Hadoop Map Reduce. Big Data is same as “small data” but bigger in size. Thus, it is approached in different ways. Storage of Big Data requires analysing the characteristics of data. It can be processed by the employment of Hadoop Map Reduce. Map Reduce is a programming model working parallel for large c...

Journal: :PVLDB 2013
Stefan Richter Jens Dittrich Stefan Schuh Tobias Frey

Mosquito is a lightweight and adaptive physical design framework for Hadoop. Mosquito connects to existing data pipelines in Hadoop MapReduce and/or HDFS, observes the data, and creates better physical designs, i.e. indexes, as a byproduct. Our approach is minimally invasive, yet it allows users and developers to easily improve the runtime of Hadoop. We present three important use cases: first,...

2013
Brian Cho Muntasir Rahman Tej Chajed Indranil Gupta Cristina Abad Nathan Roberts Philbert Lin

This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for Mapreduce clusters that are resource-constrained. Our contributions include: i) smart eviction policies for jobs and for tasks, based on resource usage, task runtime, and job deadlines; and ii) a work-conserving task preemption mechanism. We incorporated Natjam in...

Journal: :PVLDB 2013
Andrii Cherniak Huma Zaidi Vladimir Zadorozhny

In this work, we present a set of techniques that considerably improve the performance of executing concurrent MapReduce jobs. Our proposed solution relies on proper resource allocation for concurrent Hive jobs based on data dependency, inter-query optimization and modeling of Hadoop cluster load. To the best of our knowledge, this is the first work towards Hive/MapReduce job optimization which...

2012
Kai Ren YongChul Kwon Magdalena Balazinska Bill Howe

We analyze Hadoop workloads from three different research clusters from an application-level perspective, with two goals: (1) explore new issues in application patterns and user behavior and (2) understand key performance challenges related to IO and load balance. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions, and tools as we...

Journal: :PVLDB 2013
Kai Ren YongChul Kwon Magdalena Balazinska Bill Howe

We analyze Hadoop workloads from three di↵erent research clusters from a user-centric perspective. The goal is to better understand data scientists’ use of the system and how well the use of the system matches its design. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions, and tools. We see significant diversity in resource usage ...

2016
P. Srinivasa Rao

− Due to the latest developments in the area of science and Technology resulted in the developments of efficient data transfer, capability of handling huge data and the retrieval of data efficiently. Since the data that is stored is increasing voluminously, methods to retrieve relative information and security related concerns are to be addressed efficiently to secure this bulk data. Also with ...

Journal: :CoRR 2013
Akshay MS Suhas Mohan Vincent Kuri Dinkar Sitaram H. L. Phalachandra

Due to its advantages over traditional data centers, there has been a rapid growth in the usage of cloud infrastructures. These include public clouds (e.g., Amazon EC2), or private clouds, such as clouds deployed using Open-stack. A common factor in many of the well-known infrastructures, for example Openstack and Cloudstack, is that networked storage is used for storage of persistent data. How...

2013
Kai Ren YongChul Kwon Magdalena Balazinska Bill Howe

We analyze Hadoop workloads from three di↵erent research clusters from a user-centric perspective. The goal is to better understand data scientists’ use of the system and how well the use of the system matches its design. Our analysis suggests that Hadoop usage is still in its adolescence. We see underuse of Hadoop features, extensions, and tools. We see significant diversity in resource usage ...

2015
S. Suresh

The data generated and processed by modern computing systems burgeon rapidly. MapReduce is an important programming model for large scale data intensive applications. Hadoop is a popular open source implementation of MapReduce and Google File System (GFS). The scalability and fault-tolerance feature of Hadoop makes it as a standard for BigData processing. Hadoop uses Hadoop Distributed File Sys...

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید