GLEU: Automatic Evaluation of Sentence-Level Fluency

نویسندگان

  • Andrew Mutton
  • Mark Dras
  • Stephen Wan
  • Robert Dale
چکیده

In evaluating the output of language technology applications—MT, natural language generation, summarisation—automatic evaluation techniques generally conflate measurement of faithfulness to source content with fluency of the resulting text. In this paper we develop an automatic evaluation metric to estimate fluency alone, by examining the use of parser outputs as metrics, and show that they correlate with human judgements of generated text fluency. We then develop a machine learner based on these, and show that this performs better than the individual parser metrics, approaching a lower bound on human performance. We finally look at different language models for generating sentences, and show that while individual parser metrics can be ‘fooled’ depending on generation method, the machine learner provides a consistent estimator of fluency.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Regression for Sentence-Level MT Evaluation with Pseudo References

Many automatic evaluation metrics for machine translation (MT) rely on making comparisons to human translations, a resource that may not always be available. We present a method for developing sentence-level MT evaluation metrics that do not directly rely on human reference translations. Our metrics are developed using regression learning and are based on a set of weaker indicators of fluency a...

متن کامل

Structural Features for Predicting the Linguistic Quality of Text - Applications to Machine Translation, Automatic Summarization and Human-Authored Text

Sentence structure is considered to be an important component of the overall linguistic quality of text. Yet few empirical studies have sought to characterize how and to what extent structural features determine fluency and linguistic quality. We report the results of experiments on the predictive power of syntactic phrasing statistics and other structural features for these aspects of text. Ma...

متن کامل

Stochastic Iterative Alignment for Machine Translation Evaluation

A number of metrics for automatic evaluation of machine translation have been proposed in recent years, with some metrics focusing on measuring the adequacy of MT output, and other metrics focusing on fluency. Adequacy-oriented metrics such as BLEU measure n-gram overlap of MT outputs and their references, but do not represent sentence-level information. In contrast, fluency-oriented metrics su...

متن کامل

Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics

In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on longest common subsequence between a candidate translation and a set of reference translations. Longest common subsequence takes into account sentence level structure similarity naturally and identifies longest co-occurring insequence n-grams automatically. The second m...

متن کامل

Improve the Evaluation of Translation Fluency by Using Entropy of Matched Sub-segments

The widely-used automatic evaluation metrics cannot adequately reflect the fluency of the translations. The n-grambased metrics, like BLEU, limit the maximum length of matched fragments to n and cannot catch the matched fragments longer than n, so they can only reflect the fluency indirectly. METEOR, which is not limited by n-gram, uses the number of matched chunks but it does not consider the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007