Search results for: hadoop
Number of results: 2553
The performance of three Hadoop applications is reported for several virtual configurations on VMware vSphere 5 and compared to native configurations. A well-balanced seven-node AMAX ClusterMax system was used to show that the average performance difference between native and the simplest virtualized configurations is only 4%. Further, the flexibility enabled by virtualization to create multipl...
This paper describes an initial study where the open-source Hadoop parallel and distributed run-time environment is used to speed up the construction phase of a large high-dimensional index. This paper first discusses the typical practical problems developers may run into when porting their code to Hadoop. It then presents early experimental results showing that the performance gains are substan...
Today, application schedulers are decoupled from routing-level schedulers, leading to sub-optimal throughput for cloud computing platforms. In this thesis, we propose a cross-layer scheduling framework that bridges the application-level scheduler with the routing-level scheduler (SDN). We realize our framework in a batch-processing framework (Hadoop [1]) and a stream-processing framework (Storm ...
MapReduce is a software framework for easily writing applications that process vast amounts of data on large clusters of commodity hardware. In order to get better allocation of tasks and load balancing, the MapReduce work mode and task scheduling algorithm of the Hadoop platform are analyzed in this paper. According to this situation that the number of tasks of the smaller weight job is mo...
Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore developing a new end-to-end approach for graph data management and analysi...
In this paper we discuss PigSPARQL, a competitive yet easy-to-use SPARQL query processing system on MapReduce that allows ad hoc SPARQL query processing on large RDF graphs out of the box. Instead of a direct mapping, PigSPARQL uses the query language of Pig, a data analysis platform on top of Hadoop MapReduce, as an intermediate layer between SPARQL and MapReduce. This additional level of abstr...
With the rapid growth of technology, scientists have realized the challenge of efficiently analyzing large data sets since the beginning of the 21st century. Increases in data volume and data complexity shift scientists’ focus to parallel, distributed algorithms running on clusters. In 2004, Jeffrey Dean and Sanjay Ghemawat from Google introduced a new programming model to store and process large dat...
This article presents benchmarking results of two benchmarking sets (run on small clusters of 6 and 9 nodes) applied to Hive and Pig running on Hadoop 0.14.1. The first set of results was obtained by replicating the Apache Pig benchmark published by the Apache Foundation on 11/07/07 (which served as a baseline to compare major Pig Latin releases). The second set of results was obtained by applying ...
In this paper, we propose a Hadoop-based Distributed Video Transcoding System in a cloud computing environment that transcodes various video codec formats into the MPEG-4 video format. This system provides various types of video content to heterogeneous devices such as smart phones, personal computers, television, and pads. We design and implement the system using the MapReduce framework, which...