hadoop

Image Data Classification using Hadoop Based on Semi Supervise Algorithm

2017

Pratik Gite Aditya Acharya Udit Gupta

In this paper, a technique is presented for storing and processing large satellite images using the Hadoop MapReduce framework and HDFS (Hadoop Distributed File System) by incorporating remote sensing image processing tools into MapReduce. The huge volume of visual data in recent years and the need for efficient processing motivate the use of distributed image processing ...

Full text

Budget based dynamic slot allocation for MapReduce clusters

2016

S. Janani

MapReduce is one of the programming models for processing large amounts of data in the cloud, where resource allocation is an active research area because it is responsible for improving the performance of Hadoop. However, resource allocation can be further improved by focusing on a set of mechanisms that includes the budget-based HFS algorithm, where the fast worker node is identified first based...

Full text

Intelligent Block Placement Strategy in Heterogeneous Hadoop Clusters

2013

Lili Sun Yang Yang Zenggang Xiong Xiaoyong Zhao

MapReduce is an important distributed processing model for large-scale data-intensive applications. As an open-source implementation of MapReduce, Hadoop provides enterprises with a cost-efficient solution for their analytics needs. However, the default HDFS block placement policy assumes that computing nodes in a cluster are homogeneous, and tries to balance load by placing blocks randomly, wh...

Full text

Processing Wikipedia Dumps - A Case-study Comparing the XGrid and MapReduce Approaches

2011

Dominique Thiébaut Yang Li Diana Jaunzeikare Alexandra Cheng Ellysha Raelen Recto Gillian Riggs Xia Ting Zhao Tonje Stolpestad Cam Le T. Nguyen

We present a simple comparison of performance, measured as the total execution time taken to parse a 27-GByte XML dump of the English Wikipedia, on three different cluster platforms: Apple’s XGrid, and Hadoop, the open-source version of Google’s MapReduce. We use a local Hadoop cluster of Linux workstations, as well as an Elastic MapReduce cluster rented from Amazon. We show that for selected b...

Full text

A Study on Digital Forensics in Hadoop

2017

Sachin Arun Thanekar

Nowadays we are all surrounded by big data. The term ‘Big Data’ itself indicates huge volume, high velocity, variety, and veracity (i.e., uncertainty of data), which give rise to new difficulties and challenges. Hadoop is a framework that can be used for large-scale data storage and faster processing. It is freely available and easy to use and implement. Big data forensics is one of the challenges of bi...

Full text

Research on Deep Web Query Interface Clustering Based on Hadoop

Journal: JSW 2014

Baohua Qiang Rui Zhang Yufeng Wang Qian He Wei Li Sai Wang

How to cluster different query interfaces effectively is one of the core issues when generating an integrated query interface in the Deep Web integration domain. However, with the rapid development of Internet technology, the number of Deep Web query interfaces shows an explosive growth trend. For this reason, the traditional stand-alone Deep Web query interface clustering approaches encounter bot...

Full text

Processing Big Data with Apache Hadoop in the Current Challenging Era of COVID-19

Journal: Big Data and Cognitive Computing 2021

Big data have become a global strategic issue, as increasingly large amounts of unstructured data challenge the IT infrastructure of organizations and threaten their capacity for forecasting. As experienced in former massive information issues, big data technologies, such as Hadoop, should efficiently tackle incoming data and provide organizations with relevant processed information that was formerly neither visible nor manageable. After having...

Full text

Data Duplication Tactics with Hadoop

Journal: International Journal of Computer Applications 2019

Full text

Big Data is no longer equivalent to Hadoop in the industry

2017

Andreas Tönne

For a long time, industry projects solved big data problems with Hadoop. The massive scalability of MapReduce algorithms and the HBase database brought solutions to an unanticipated level of computing. But this obscures the need for change. Business goals that emerge from Industry 4.0 or IoT have long been addressed with a suboptimal architecture. New business goals require a rethi...

Full text

Using Hadoop File System and MapReduce in a small/medium Grid site

2012

H Riahi G Donvito L Fanò M Fasi G Marzulli D Spiga A Valentini

Data storage and data access are key to CPU-intensive and data-intensive high-performance Grid computing. Hadoop is an open-source data processing framework that includes a fault-tolerant and scalable distributed data processing model and execution environment, named MapReduce, and a distributed file system, named Hadoop Distributed File System (HDFS). HDFS was deployed and tested within ...

Full text