نتایج جستجو برای: hadoop

تعداد نتایج: 2553  

Journal: :PVLDB 2011
Mohamed Y. Eltabakh Yuanyuan Tian Fatma Özcan Rainer Gemulla Aljoscha Krettek John McPherson

Hadoop has become an attractive platform for large-scale data analytics. In this paper, we identify a major performance bottleneck of Hadoop: its lack of ability to colocate related data on the same set of nodes. To overcome this bottleneck, we introduce CoHadoop, a lightweight extension of Hadoop that allows applications to control where data are stored. In contrast to previous approaches, CoH...

2013
Michael Ovsiannikov Silvius Rus Damian Reeves Paul Sutter Sriram Rao Jim Kelly

The Quantcast File System (QFS) is an efficient alternative to the Hadoop Distributed File System (HDFS). QFS is written in C++, is plugin compatible with Hadoop MapReduce, and offers several efficiency improvements relative to HDFS: 50% disk space savings through erasure coding instead of replication, a resulting doubling of write throughput, a faster name node, support for faster sorting and ...

2016
N. Anila Sundar Vijay Krishna Menon P. N. Kumar

Cluster computing is an approach for storing and processing huge amount of data that is being generated. Hadoop and Spark are the two cluster computing platforms which are prominent today. Hadoop incorporates the MapReduce concept and is scalable as well as fault-tolerant. But the limitations of Hadoop paved way for another cluster computing framework named Spark. It is faster and can also mana...

2013
Jisha S Manjaly

Hadoop is a framework for BigData processing in distributed applications. Hadoop cluster is built for running data intensive distributed applications. Hadoop distributed file system is the primary storage area for BigData. MapReduce is a model to aggregate tasks of a job. Task assignment is possible by schedulers. Schedulers guarantee the fair allocation of resources among users. When a user su...

Journal: :JDIM 2014
Weihua Ma Hong Zhang Qianmu Li Bin Xia

Development of big data computing has brought many changes to society and social life is constantly digitized. ‘How to handle vast amounts of data’ has become a more and more fashionable topic. Hadoop is a distributed computing software framework, which includes HDFS and MapReduce distributed computing method, make distributed processing huge amounts of data possible. Then job scheduler determi...

2014
Kamal Kc Vincent W. Freeh

Hadoop is a widely used open source mapreduce framework. Its performance is critical because it increases the usefulness of products and services for a large number of companies who have adopted Hadoop for their business purposes. One of the configuration parameters that influences the resource allocation and thus the performance of a Hadoop application is map slot value (MSV). MSV determines t...

Journal: :PVLDB 2013
K. Ashwin Kumar Jonathan Gluck Amol Deshpande Jimmy J. Lin

The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine. Today, this assumption needs to be re-evaluated. Although petabyte-scale datastores are increasingly common, it is unclear whether “typical” analytics tasks require more than a single high-end server. Additionally, we are ...

2012
Mohamed H. Almeer

Image processing algorithms related to remote sensing have been tested and utilized on the Hadoop MapReduce parallel platform by using an experimental 112-core high-performance cloud computing system that is situated in the Environmental Studies Center at the University of Qatar. Although there has been considerable research utilizing the Hadoop platform for image processing rather than for its...

2011
Alexander Schätzle Martin Przyjaciel-Zablocki Thomas Hornung Georg Lausen

Dieser Beitrag untersucht die effiziente Auswertung von SPARQLAnfragen auf großen RDF-Datensätzen. Zum Einsatz kommt hierfür das Apache Hadoop Framework, eine bekannte Open-Source Implementierung von Google's MapReduce, das massiv parallelisierte Berechnungen auf einem verteilten System ermöglicht. Zur Auswertung von SPARQL-Anfragen mit Hadoop wird in diesem Beitrag PigSPARQL, eine Übersetzung ...

2010
Lai Yang Zhongzhi Shi

Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project using the MapReduce framework in Java, has become of significant interest in distributed data mining. To resolve problems of globalization, random-write and duration in Hadoop, a data indexing approach on Hadoop using the Java Persistence API (JPA) is elaborated in...

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید