hadoop

Search results for: hadoop

Number of results: 2553

SQL-on-Hadoop: Full Circle Back to Shared-Nothing Database Architectures

Journal: PVLDB 2014

Avrilia Floratou Umar Farooq Minhas Fatma Özcan

SQL query processing for analytics over Hadoop data has recently gained significant traction. Among many systems providing some SQL support over Hadoop, Hive is the first native Hadoop system that uses an underlying framework such as MapReduce or Tez to process SQL-like statements. Impala, on the other hand, represents the new emerging class of SQL-on-Hadoop systems that exploit a shared-nothin...

Column-Oriented Storage Techniques for MapReduce

Journal: PVLDB 2011

Avrilia Floratou Jignesh M. Patel Eugene J. Shekita Sandeep Tata

Users of MapReduce often run into performance problems when they scale up their workloads. Many of the problems they encounter can be overcome by applying techniques learned from over three decades of research on parallel DBMSs. However, translating these techniques to a MapReduce implementation such as Hadoop presents unique challenges that can lead to new design choices. This paper describes ...

Analyse-Lifecycle heterogener Informationen auf Basis von Hadoop und Visual Analytics

2016

Petra Zimmer Frank Reussner

Gaining insight into a company’s mass of data has been a common goal in the last few years. But information is growing exponentially, and companies yearn for a data management system that can work with heterogeneous data from different sources. A possible answer is the Hadoop Data Platform. With its diverse components, it offers several approaches to data management as a foundation for the analysis....

Of Ivory and Smurfs: Loxodontan MapReduce Experiments for Web Search

2009

Jimmy J. Lin Tamer Elsayed Lidan Wang Donald Metzler

This paper describes Ivory, an attempt to build a distributed retrieval system around the open-source Hadoop implementation of MapReduce. We focus on three noteworthy aspects of our work: a retrieval architecture built directly on the Hadoop Distributed File System (HDFS), a scalable MapReduce algorithm for inverted indexing, and webpage classification to enhance retrieval effectiveness.

Hyrax: Demonstrating a New Foundation for Data-Parallel Computation

2009

Vinayak Borkar Michael Carey

We demonstrate Hyrax, a new runtime platform for data-parallel computation under development at UC Irvine under the ASTERIX project. We show the versatility of Hyrax by using it to run XQuery queries from ASTERIX, Hadoop MapReduce jobs using a Hadoop emulation layer, and SQL queries originating from Hive.

Sentiment Analysis of Social Networking Data Using Categorized Dictionary

Journal: Information Technology Management (مدیریت فناوری اطلاعات) 2020

Aastha Sharma, Akansha Singh, Anuradha Dhull, Krishna Singh

Sentiment analysis is the process of analyzing a person’s perception or belief about a particular subject matter. However, finding the correct opinion or interest in multi-faceted sentiment data is a tedious task. In this paper, a method is proposed to improve sentiment accuracy by using a categorized dictionary for sentiment classification and analysis. A categorized dictiona...

Research on Scheduling Scheme for Hadoop Clusters

2013

Jiong Xie FanJun Meng Hailong Wang HongFang Pan JinHong Cheng Xiao Qin

In this paper, we introduce a prefetching mechanism into the MapReduce model while retaining compatibility with native Hadoop. Given a data-intensive application running on a Hadoop cluster, our approach estimates the execution time of each task and adaptively preloads an amount of data into memory before a new task is assigned to the computing node.

Achieving Load Balancing of HDFS Clusters Using Markov Model

2012

Jin Kyu Kim

The combination of Hadoop and HDFS is becoming a de facto standard for handling big data. HDFS is a distributed file system designed for big data. In HDFS, a file consists of multiple large blocks. The central management of HDFS tries to scatter these blocks across different nodes to maximize I/O throughput. Hadoop is a framework that supports data-intensive parallel a...

Applying Compression Algorithms on Hadoop Cluster Implementing through Apache Tez and Hadoop MapReduce

Journal: International Journal of Engineering & Technology 2018

Dynamic Resource Management in Vectorwise on Hadoop

2014

Cristian Mihai Bârcă
