A Task-Oriented Evaluation Metric for Machine Translation
Authors
Abstract
Evaluation remains an open and fundamental issue for machine translation (MT). The inherent subjectivity of any judgment about the quality of a translation, whether human or machine, and the diversity of end uses and users of translated material, contribute to the difficulty of establishing relevant and efficient evaluation methods. The US Federal Intelligent Document Understanding Laboratory (FIDUL) is developing a new, task-oriented evaluation metric and methodology to measure MT systems in light of the tasks for which their output may be used. This paper describes the development of this methodology for Japanese-to-English MT. It includes a sample inventory of the tasks for which translated material is used (e.g., filtering, detection, extraction) and describes exercises in which users perform each task with MT output. The methodology correlates the recorded subjective judgments of the raters in the DARPA MT Evaluation with users' performance on the task-based exercises. Analysis of the errors in scored texts determines whether the presence of certain error types in MT output affects specific tasks and not others. Source-language patterns that produced errors become a test set that can be easily and efficiently scored to evaluate the performance of any new Japanese-to-English MT system in terms of the task inventory.
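The core quantitative step of the methodology is correlating recorded subjective quality judgments with measured task performance. A minimal sketch of that correlation, using the standard Pearson formula; the function name and the example score lists are illustrative assumptions, not data from the paper:

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation between two paired score lists, e.g. subjective
    rater judgments and per-system accuracy on a task-based exercise."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores: mean rater judgments vs. task-exercise accuracy
# for four MT systems (values invented for illustration only).
judgments = [2.1, 3.4, 4.0, 1.5]
task_accuracy = [0.40, 0.65, 0.80, 0.30]
r = pearson(judgments, task_accuracy)
```

A high correlation for one task but not another would indicate, as the abstract suggests, that certain MT error types matter for some tasks and not others.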
Similar Resources
CobaltF: A Fluent Metric for MT Evaluation
The vast majority of Machine Translation (MT) evaluation approaches are based on the idea that the closer the MT output is to a human reference translation, the higher its quality. While translation quality has two important aspects, adequacy and fluency, the existing reference-based metrics are largely focused on the former. In this work we combine our metric UPF-Cobalt, originally presented at...
Automatic Evaluation Measures for Statistical Machine Translation System Optimization
Evaluation of machine translation (MT) output is a challenging task. In most cases, there is no single correct translation. In the extreme case, two translations of the same input can have completely different words and sentence structure while still both being perfectly valid. Large projects and competitions for MT research raised the need for reliable and efficient evaluation of MT systems. F...
Stochastic Iterative Alignment for Machine Translation Evaluation
A number of metrics for automatic evaluation of machine translation have been proposed in recent years, with some metrics focusing on measuring the adequacy of MT output, and other metrics focusing on fluency. Adequacy-oriented metrics such as BLEU measure n-gram overlap of MT outputs and their references, but do not represent sentence-level information. In contrast, fluency-oriented metrics su...
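The n-gram overlap that adequacy-oriented metrics such as BLEU compute can be sketched as a clipped n-gram precision. This is an illustrative fragment under simplified assumptions (single reference, no brevity penalty), not a reference implementation of BLEU:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision: the fraction of candidate n-grams that
    also appear in the reference, with each n-gram's count clipped to
    its count in the reference (so repeats cannot inflate the score)."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0
```

As the abstract notes, such corpus-level overlap counts carry no sentence-level structural information, which is what fluency-oriented metrics try to capture instead.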
MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles
We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail ...
Task-based Evaluation of Multiword Expressions: a Pilot Study in Statistical Machine Translation
We conduct a pilot study for task-oriented evaluation of Multiword Expressions (MWE) in Statistical Machine Translation (SMT). We propose two different integration strategies for MWE in SMT, which take advantage of different degrees of MWE semantic compositionality and yield complementary improvements in SMT quality on a large-scale translation task.
Publication date: 2010