Search results for: hadoop
Number of results: 2553
Abstract: The popularization of Hadoop as the de facto standard platform for data analytics in the context of Big Data applications has led to an upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines allowing the use of SQL queries on data stored in HDFS. In this context, Kubernetes appears as the leading choice to simplify the deployment and scaling of containerized applications; however, there is a lack of studie...
Big data, big data processing technologies, and Hadoop storage systems are described.
Access plan recommendation is a query optimization approach that executes new queries using previously created query execution plans (QEPs). In this method, the optimizer divides the space into clusters. However, traditional clustering algorithms take a significant amount of time for such large datasets. The MapReduce distributed computing model provides efficient solutions for storing and processing vast quant...
The Hadoop Distributed File System (HDFS) is the distributed storage infrastructure for the Hadoop big-data analytics ecosystem. A single node, called the NameNode, stores the metadata of the entire file system and coordinates the file content placement and retrieval actions of the data storage subsystems, called DataNodes. However, the single-NameNode architecture has long been viewed a...
The increasing use of computing resources in our daily lives leads to data being generated at an unprecedented rate. The computing industry is repeatedly questioned about its ability to accommodate the unpredictable growth rate of data and to process it. This has encouraged the development of cluster-based data-intensive applications. Hadoop is a popular open-source framework k...
SALSA examines system logs to derive state-machine views of the system's execution, along with control-flow and data-flow models and related statistics. Exploiting SALSA's derived views and statistics, we can effectively construct useful higher-level analyses. We demonstrate SALSA's approach by analyzing system logs generated in a Hadoop cluster, and then illustrate SALSA's value by developing visua...
In this paper we present Chaski, an open-source machine translation toolkit capable of training phrase-based machine translation models on Hadoop clusters. The toolkit provides a full training pipeline including distributed word alignment, word clustering and phrase extraction. The toolkit also provides an extended error-tolerance mechanism over the standard Hadoop error-tolerance framework. ...
We demonstrate a powerful and easy-to-use tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a browser-based specification of complex ER workflows including blocking and matching steps as well as the optional use of machine learning for the automatic generation of match classifiers. Specified workflows are automatically t...