Search results for: hadoop
Number of results: 2553
Hadoop has become an attractive platform for large-scale data analytics. In this paper, we identify a major performance bottleneck of Hadoop: its lack of ability to colocate related data on the same set of nodes. To overcome this bottleneck, we introduce CoHadoop, a lightweight extension of Hadoop that allows applications to control where data are stored. In contrast to previous approaches, CoH...
The Quantcast File System (QFS) is an efficient alternative to the Hadoop Distributed File System (HDFS). QFS is written in C++, is plugin compatible with Hadoop MapReduce, and offers several efficiency improvements relative to HDFS: 50% disk space savings through erasure coding instead of replication, a resulting doubling of write throughput, a faster name node, support for faster sorting and ...
Cluster computing is an approach for storing and processing the huge amounts of data being generated. Hadoop and Spark are the two prominent cluster computing platforms today. Hadoop incorporates the MapReduce concept and is scalable as well as fault-tolerant. But the limitations of Hadoop paved the way for another cluster computing framework named Spark. It is faster and can also mana...
Hadoop is a framework for big data processing in distributed applications. A Hadoop cluster is built for running data-intensive distributed applications. The Hadoop Distributed File System is the primary storage area for big data. MapReduce is a model to aggregate tasks of a job. Tasks are assigned by schedulers. Schedulers guarantee the fair allocation of resources among users. When a user su...
The development of big data computing has brought many changes to society, and social life is constantly being digitized. ‘How to handle vast amounts of data’ has become an increasingly fashionable topic. Hadoop is a distributed computing software framework that includes HDFS and the MapReduce distributed computing method, making distributed processing of huge amounts of data possible. Then job scheduler determi...
Hadoop is a widely used open-source MapReduce framework. Its performance is critical because it increases the usefulness of products and services for a large number of companies that have adopted Hadoop for their business purposes. One of the configuration parameters that influences the resource allocation and thus the performance of a Hadoop application is map slot value (MSV). MSV determines t...
The underlying assumption behind Hadoop and, more generally, the need for distributed processing is that the data to be analyzed cannot be held in memory on a single machine. Today, this assumption needs to be re-evaluated. Although petabyte-scale datastores are increasingly common, it is unclear whether “typical” analytics tasks require more than a single high-end server. Additionally, we are ...
Image processing algorithms related to remote sensing have been tested and utilized on the Hadoop MapReduce parallel platform by using an experimental 112-core high-performance cloud computing system that is situated in the Environmental Studies Center at the University of Qatar. Although there has been considerable research utilizing the Hadoop platform for image processing rather than for its...
This paper investigates the efficient evaluation of SPARQL queries on large RDF datasets. For this purpose it uses the Apache Hadoop framework, a well-known open-source implementation of Google's MapReduce that enables massively parallel computations on a distributed system. To evaluate SPARQL queries with Hadoop, this paper presents PigSPARQL, a translation ...
Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project using the MapReduce framework in Java, has become of significant interest in distributed data mining. To resolve problems of globalization, random-write and duration in Hadoop, a data indexing approach on Hadoop using the Java Persistence API (JPA) is elaborated in...