Search results for: evaluation metrics
Number of results: 878773
We investigate evaluation metrics for end-to-end dialogue systems where supervised labels, such as task completion, are not available. Recent works in end-to-end dialogue systems have adopted metrics from machine translation and text summarization to compare a model’s generated response to a single target response. We show that these metrics correlate very weakly or not at all with human judgeme...
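Whether an automatic metric "correlates with human judgments" is usually checked by scoring the same responses both ways and computing a correlation coefficient. The sketch below computes Pearson's r with the standard library only. The function name `pearson` and the score lists are hypothetical illustration values, not data from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-response scores: an automatic metric vs. averaged human ratings.
metric_scores = [0.12, 0.30, 0.05, 0.44, 0.21]
human_ratings = [3.0, 2.0, 4.0, 3.5, 1.5]
print(round(pearson(metric_scores, human_ratings), 3))
```

A value near 0 would indicate the weak correlation the abstract describes; rank-based coefficients such as Spearman's are a common alternative when scores are not on comparable scales.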
We propose three new features for MT evaluation: source-sentence constrained n-gram precision, source-sentence reordering metrics, and discriminative unigram precision, as well as a method of learning linear feature weights to directly maximize correlation with human judgments. Our source-sentence constrained n-gram precision achieves, among all the testing metrics including BLEU, NIST, ROUGE, ...
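For reference, the plain n-gram precision that the source-sentence constrained variant builds on can be sketched as the clipped ("modified") precision used in BLEU: each candidate n-gram counts as matched at most as many times as it occurs in the reference. This sketch does not reproduce the paper's source-sentence constraint; the function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: List[str], reference: List[str], n: int) -> float:
    """Fraction of candidate n-grams matched in the reference, with counts clipped."""
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    ref = ngrams(reference, n)
    # Clip each n-gram's count to how often it appears in the reference.
    matched = sum(min(count, ref[g]) for g, count in cand.items())
    return matched / sum(cand.values())


print(clipped_precision("the the the the".split(), "the cat".split(), 1))
print(clipped_precision("the cat sat on the mat".split(),
                        "the cat is on the mat".split(), 2))
```

Clipping is what stops a degenerate candidate such as "the the the the" from scoring perfectly on unigrams.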
The problem of evaluating general architectures is a difficult one (Newell, 1990). Comparative evaluations that focus on performance alone are especially problematic. It is usually feasible to develop a specialized solution for any particular problem that will outperform a general solution, such as one developed within a cognitive architecture. Thus, an evaluation of the architectural approach ...
Precisely evaluating the quality of a translation against human references is a challenging task due to the flexible word ordering of a sentence and the existence of a large number of synonyms for words. This paper proposes to evaluate translations with distributed representations of words and sentences. We study several metrics based on word and sentence representations and their combination. ...
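One simple way to compare a translation and a reference through distributed representations is to average the word vectors of each sentence and take the cosine similarity, so that synonyms with nearby vectors still earn credit. The sketch below assumes this averaging scheme and uses tiny made-up 3-dimensional embeddings (`EMBEDDINGS` is hypothetical); it is not the specific set of metrics studied in the paper.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings; a real metric would load pretrained word vectors.
EMBEDDINGS: Dict[str, Vector] = {
    "cat": [0.9, 0.1, 0.0],
    "feline": [0.85, 0.15, 0.05],
    "sat": [0.1, 0.8, 0.1],
    "rested": [0.15, 0.75, 0.1],
    "dog": [0.7, 0.2, 0.3],
}


def sentence_vector(tokens: List[str]) -> Vector:
    """Average of the known word vectors in a sentence (unknown words are skipped)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        raise ValueError("no known tokens")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def embedding_score(hypothesis: List[str], reference: List[str]) -> float:
    """Cosine similarity between averaged hypothesis and reference vectors."""
    return cosine(sentence_vector(hypothesis), sentence_vector(reference))


print(round(embedding_score(["feline", "rested"], ["cat", "sat"]), 3))
```

Here "feline rested" shares no words with "cat sat" yet scores close to 1, which is the kind of synonym tolerance that exact n-gram matching lacks. Averaging also discards word order, one of the difficulties the abstract mentions.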
NASA’s Earth Science Information Partnership Federation is an experiment funded to assess the ability of a group of widely heterogeneous earth science data or service providers to self-organize and provide improved and cheaper access to an expanding earth science user community. As it is organizing itself, the federation is mandated to set in place an evaluation methodology and collect metrics ...
Recent work on interpretability has focused on concept-based explanations, where deep learning models are explained in terms of high-level units of information, referred to as concepts. Concept models, however, have been shown to be prone to encoding impurities in their representations, failing to fully capture meaningful features of their inputs. While concept learning lacks metrics to measure such phenomena, the field of disentanglem...
This paper reports results from an experiment that was aimed at comparing evaluation metrics for machine translation. Implemented as a workshop at a major conference in 2002, the experiment defined an evaluation task, description of the metrics, as well as test data consisting of human and machine translations of two texts. Several metrics, either applicable by human judges or automated, were u...
Software metrics are widely accepted tools to control and assure software quality. A large number of software metrics with a variety of content can be found in the literature. In this paper, different software complexity ...