hadoop

Search results for: hadoop

Number of results: 2553

Big Data Management Using Hadoop

Journal: Journal of Physics: Conference Series, 2021

Hadoop Scheduler with Deadline Constraint

Journal: International Journal on Cloud Computing: Services and Architecture, 2014

On the performance of SQL scalable systems on Kubernetes: a comparative study

Journal: Cluster Computing, 2022

The popularization of Hadoop as the de facto standard platform for data analytics in the context of Big Data applications has led to an upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines allowing the use of SQL queries on data stored in HDFS. In this context, Kubernetes appears as the leading choice to simplify the deployment and scaling of containerized applications; however, there is a lack of studie...

BIG DATA PROCESSING TECHNOLOGY

2022

The description of big data, big data processing technologies, and Hadoop storage systems are considered.

Performance Evaluation of Query Plan Recommendation with Apache Hadoop and Apache Spark

Journal: Mathematics, 2022

Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the optimizer divides the query space into clusters. However, traditional clustering algorithms take a significant amount of time for such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quant...

DNN: A Distributed NameNode Filesystem for Hadoop

2016

Ziling Huang

The Hadoop Distributed File System (HDFS) is the distributed storage infrastructure for the Hadoop big-data analytics ecosystem. A single node, called the NameNode, stores the metadata of the entire file system and coordinates the file content placement and retrieval actions of the data storage subsystems, called DataNodes. However, the single NameNode architecture has long been viewed a...
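
The division of labor described in this abstract (one node holding all metadata, many nodes holding the data blocks) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyNameNode`, its round-robin placement, and the tiny block size are hypothetical and are not the HDFS API or the DNN design from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Toy stand-in for a single metadata node; NOT the real HDFS API."""
    datanodes: list[str]
    block_size: int = 4   # bytes per block; tiny on purpose for the example
    replication: int = 2  # replicas per block (assumed value)
    # file path -> list of (block_id, [DataNodes holding a replica])
    metadata: dict[str, list[tuple[int, list[str]]]] = field(default_factory=dict)

    def add_file(self, path: str, size: int) -> None:
        """Record where each block of a new file should be placed."""
        n_blocks = -(-size // self.block_size)  # ceiling division
        blocks = []
        for block_id in range(n_blocks):
            # Round-robin placement is an assumption made for this sketch;
            # real HDFS uses a rack-aware placement policy.
            replicas = [
                self.datanodes[(block_id + r) % len(self.datanodes)]
                for r in range(self.replication)
            ]
            blocks.append((block_id, replicas))
        self.metadata[path] = blocks

    def locate(self, path: str) -> list[tuple[int, list[str]]]:
        """Answer a client's question: which DataNodes hold each block?"""
        return self.metadata[path]


nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3"])
nn.add_file("/logs/a.txt", size=10)
placement = nn.locate("/logs/a.txt")
```

Because every lookup goes through the single `metadata` dictionary, the sketch also hints at why a lone metadata node can become a bottleneck or a single point of failure.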

Reliable and Locality Driven Scheduling in Hadoop

2014

Tran Anh Phuong Manuel Antunes Veiga Eduardo Teixeira Rodrigues David Manuel Martins de Matos

The increasing use of computing resources in our daily lives leads to data being generated at an unprecedented rate. The computing industry is being repeatedly questioned for its ability to accommodate the unpredictable growth rate of data, and its ability to process them. This has encouraged the development of cluster-based data-intensive applications. Hadoop is a popular open source framework k...

SALSA: Analyzing Logs as StAte Machines

2008

Jiaqi Tan Xinghao Pan Soila Kavulya Rajeev Gandhi Priya Narasimhan

SALSA examines system logs to derive state-machine views of the system’s execution, along with control-flow and data-flow models and related statistics. Exploiting SALSA’s derived views and statistics, we can effectively construct higher-level useful analyses. We demonstrate SALSA’s approach by analyzing system logs generated in a Hadoop cluster, and then illustrate SALSA’s value by developing visua...
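
The general idea of turning log events into per-task states and statistics can be sketched in a few lines. This is a minimal illustration, not SALSA's implementation: the log format (`<seconds> <task_id> <event>`), the `LOG` sample, and the `state_durations` helper are all hypothetical, and real Hadoop logs look different.

```python
# Hypothetical log lines: "<seconds> <task_id> <event>".
# Real Hadoop logs use a different, richer format.
LOG = """\
0 map_1 START
2 map_2 START
5 map_1 END
9 map_2 END
9 reduce_1 START
14 reduce_1 END
"""


def state_durations(log_text):
    """Treat each task as a tiny state machine (idle -> running -> done)
    and return how long each task spent in the running state."""
    started = {}    # task_id -> timestamp at which it entered "running"
    durations = {}  # task_id -> seconds spent in "running"
    for line in log_text.splitlines():
        ts, task, event = line.split()
        if event == "START":
            started[task] = int(ts)
        elif event == "END" and task in started:
            durations[task] = int(ts) - started.pop(task)
    return durations


durations = state_durations(LOG)
```

Statistics such as these per-task durations are the kind of derived data on which higher-level analyses could then be built.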

Training Phrase-Based Machine Translation Models on the Cloud: Open Source Machine Translation Toolkit Chaski

Journal: Prague Bull. Math. Linguistics, 2010

Qin Gao Stephan Vogel

In this paper we present an open-source machine translation toolkit, Chaski, which is capable of training phrase-based machine translation models on Hadoop clusters. The toolkit provides a full training pipeline including distributed word alignment, word clustering and phrase extraction. The toolkit also provides an extended error-tolerance mechanism over the standard Hadoop error-tolerance framework. ...

Dedoop: Efficient Deduplication with Hadoop

Journal: PVLDB, 2012

Lars Kolb Andreas Thor Erhard Rahm

We demonstrate a powerful and easy-to-use tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a browser-based specification of complex ER workflows including blocking and matching steps as well as the optional use of machine learning for the automatic generation of match classifiers. Specified workflows are automatically t...
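
The blocking and matching steps mentioned in this abstract can be sketched in a map/reduce style in plain Python. This is an illustrative sketch only, not Dedoop's code: the sample `records`, the three-character `blocking_key`, the `difflib`-based similarity, and the 0.95 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical input: (record_id, product name).
records = [
    (1, "Apple iPhone 12"),
    (2, "apple iphone 12 "),
    (3, "Samsung Galaxy S21"),
    (4, "Samsung Galaxy S20"),
    (5, "Apple Watch"),
]


def normalize(name):
    return name.strip().lower()


def blocking_key(name):
    # Assumed blocking scheme: first three characters of the normalized name.
    return normalize(name)[:3]


def map_phase(recs):
    """Map step: emit (blocking key, record) pairs."""
    for rec_id, name in recs:
        yield blocking_key(name), (rec_id, name)


def reduce_phase(block, threshold=0.95):
    """Reduce step: compare only records that share a blocking key."""
    for (id_a, a), (id_b, b) in combinations(block, 2):
        similarity = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if similarity >= threshold:
            yield id_a, id_b


# Shuffle step: group the mapped records by blocking key.
blocks = defaultdict(list)
for key, rec in map_phase(records):
    blocks[key].append(rec)

matches = sorted(pair for block in blocks.values() for pair in reduce_phase(block))
```

Blocking keeps the number of pairwise comparisons small: here only records within the same block are compared (four pairs) instead of all ten possible pairs, and only records 1 and 2 pass the assumed threshold.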
