Search results for: hadoop

Number of results: 2553

Journal: Journal of Physics: Conference Series 2021

Journal: International Journal on Cloud Computing: Services and Architecture 2014

Journal: Cluster Computing 2022

Abstract: The popularization of Hadoop as the de facto standard platform for data analytics in the context of Big Data applications has led to an upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines that allow SQL queries to be run on data stored in HDFS. In this context, Kubernetes appears to be the leading choice for simplifying the deployment and scaling of containerized applications; however, there is a lack of studie...

Journal: 2022

The paper describes big data and big data processing technologies, and considers Hadoop storage systems.

Journal: Mathematics 2022

Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the optimizer divides the space into clusters. However, traditional clustering algorithms take a significant amount of time on such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quant...
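As an illustration only (the abstract does not say which clustering algorithm the paper parallelizes), one k-means iteration maps naturally onto MapReduce: the map phase assigns each plan's feature vector to its nearest centroid, and the reduce phase averages the vectors grouped under each centroid. The 2-D feature vectors and the function names `map_assign`, `reduce_centroids` and `kmeans_iteration` are hypothetical; in Hadoop the grouping by key would be done by the framework's shuffle.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(point, centroids):
    """Map phase: emit (index of the nearest centroid, point)."""
    best = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return best, point


def reduce_centroids(pairs, dims):
    """Reduce phase: average all points that share a centroid index."""
    groups = defaultdict(list)
    for key, point in pairs:  # stands in for Hadoop's shuffle / group-by-key
        groups[key].append(point)
    return {
        key: tuple(sum(p[d] for p in pts) / len(pts) for d in range(dims))
        for key, pts in groups.items()
    }


def kmeans_iteration(points, centroids):
    """One full map + reduce round of k-means."""
    pairs = [map_assign(p, centroids) for p in points]
    new = reduce_centroids(pairs, len(centroids[0]))
    # Keep the old centroid if no point was assigned to it.
    return [new.get(i, c) for i, c in enumerate(centroids)]


if __name__ == "__main__":
    # Hypothetical feature vectors standing in for query execution plans.
    plans = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans_iteration(plans, [(0.0, 0.0), (10.0, 10.0)]))
```

Repeating `kmeans_iteration` until the centroids stop moving gives the full algorithm; each round would be one MapReduce job.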

2016
Ziling Huang

The Hadoop Distributed File System (HDFS) is the distributed storage infrastructure for the Hadoop big-data analytics ecosystem. A single node, called the NameNode, stores the metadata of the entire HDFS file system and coordinates the file content placement and retrieval actions of the data storage subsystems, called DataNodes. However, the single NameNode architecture has long been viewed a...
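To make the NameNode/DataNode split concrete, here is a toy model, assuming a simplified namespace: the NameNode holds only metadata (file path to ordered block IDs, block ID to the DataNodes holding replicas), while block contents would live on DataNodes. `ToyNameNode`, `add_file` and `get_block_locations` are hypothetical names, not the HDFS Java API, and replica placement here is plain round-robin rather than HDFS's rack-aware default policy.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyNameNode:
    datanodes: list
    replication: int = 3
    files: dict = field(default_factory=dict)      # path -> [block_id, ...]
    locations: dict = field(default_factory=dict)  # block_id -> [datanode, ...]
    _ids: count = field(default_factory=count)     # block ID generator
    _next_dn: int = 0                              # round-robin cursor

    def add_file(self, path, size, block_size):
        """Record metadata for a file of `size` bytes split into fixed-size blocks."""
        n_blocks = max(1, -(-size // block_size))  # ceiling division
        r = min(self.replication, len(self.datanodes))
        blocks = []
        for _ in range(n_blocks):
            bid = next(self._ids)
            # Round-robin placement: a simplification, not HDFS's rack-aware policy.
            self.locations[bid] = [
                self.datanodes[(self._next_dn + k) % len(self.datanodes)]
                for k in range(r)
            ]
            self._next_dn = (self._next_dn + 1) % len(self.datanodes)
            blocks.append(bid)
        self.files[path] = blocks
        return blocks

    def get_block_locations(self, path):
        """What a client would ask for before reading blocks directly from DataNodes."""
        return [(bid, self.locations[bid]) for bid in self.files[path]]
```

For example, `ToyNameNode(["dn1", "dn2", "dn3", "dn4"]).add_file("/logs/a.txt", 300, 128)` records three blocks, each replicated on three DataNodes. Because every lookup goes through this one object, the sketch also shows why a single NameNode can become a bottleneck.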

2014
Tran Anh Phuong, Manuel Antunes Veiga, Eduardo Teixeira Rodrigues, David Manuel Martins de Matos

The increasing use of computing resources in our daily lives leads to data being generated at an unprecedented rate. The computing industry is repeatedly being questioned about its ability to accommodate the unpredictable growth rate of data and its ability to process it. This has encouraged the development of cluster-based data-intensive applications. Hadoop is a popular open source framework k...

2008
Jiaqi Tan, Xinghao Pan, Soila Kavulya, Rajeev Gandhi, Priya Narasimhan

SALSA examines system logs to derive state-machine views of the system’s execution, along with control-flow and data-flow models and related statistics. Exploiting SALSA’s derived views and statistics, we can effectively construct useful higher-level analyses. We demonstrate SALSA’s approach by analyzing system logs generated in a Hadoop cluster, and then illustrate SALSA’s value by developing visua...
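A much simplified illustration of deriving a state-machine view from logs, not SALSA's actual algorithm: assuming log lines have already been parsed into time-ordered `(task_id, state)` events, counting the observed transitions per task yields the edges of a state machine together with their frequencies. The event format, the `START` pseudo-state and the state names are hypothetical.

```python
from collections import Counter


def derive_transitions(events):
    """Count (previous_state, next_state) transitions from time-ordered events.

    Each event is a (task_id, state) pair; a task's first event is treated
    as a transition out of the hypothetical "START" pseudo-state.
    """
    last_state = {}
    transitions = Counter()
    for task_id, state in events:
        prev = last_state.get(task_id, "START")
        transitions[(prev, state)] += 1
        last_state[task_id] = state
    return transitions


if __name__ == "__main__":
    # Hypothetical parsed Hadoop task events, in log order.
    events = [
        ("t1", "MAP_RUNNING"), ("t2", "MAP_RUNNING"),
        ("t1", "MAP_DONE"), ("t2", "MAP_FAILED"),
        ("t2", "MAP_RUNNING"), ("t2", "MAP_DONE"),
    ]
    for (src, dst), n in sorted(derive_transitions(events).items()):
        print(f"{src} -> {dst}: {n}")
```

Rare edges such as `MAP_RUNNING -> MAP_FAILED` are the kind of statistic a higher-level analysis could then flag.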

Journal: Prague Bull. Math. Linguistics 2010
Qin Gao, Stephan Vogel

In this paper we present an open-source machine translation toolkit, Chaski, which is capable of training phrase-based machine translation models on Hadoop clusters. The toolkit provides a full training pipeline including distributed word alignment, word clustering and phrase extraction. The toolkit also provides an extended error-tolerance mechanism over the standard Hadoop error-tolerance framework. ...

Journal: PVLDB 2012
Lars Kolb, Andreas Thor, Erhard Rahm

We demonstrate a powerful and easy-to-use tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a browser-based specification of complex ER workflows including blocking and matching steps as well as the optional use of machine learning for the automatic generation of match classifiers. Specified workflows are automatically t...
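To make the blocking and matching steps concrete, here is a minimal map/reduce-style sketch, assuming records are dicts with `id` and `name` fields, a blocking key of the first three lowercase characters of the name, and token-based Jaccard similarity with an illustrative threshold of 0.5. All of these choices are hypothetical: Dedoop lets users configure their own blocking and matching strategies, and its machine-learned match classifiers are not shown.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Map-side blocking key (hypothetical): first three lowercase characters of the name."""
    return record["name"].strip().lower()[:3]


def jaccard(a, b):
    """Jaccard similarity of the lowercase word-token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def resolve(records, threshold=0.5):
    # Map: emit (blocking key, record); a MapReduce framework would group by key.
    blocks = defaultdict(list)
    for r in records:
        blocks[blocking_key(r)].append(r)
    # Reduce: compare only pairs inside the same block, not all n^2 pairs.
    matches = []
    for block in blocks.values():
        for r1, r2 in combinations(block, 2):
            if jaccard(r1["name"], r2["name"]) >= threshold:
                matches.append((r1["id"], r2["id"]))
    return matches
```

Blocking trades recall for speed: records that disagree on the blocking key are never compared, so the key must be chosen so that true duplicates almost always share it.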
