task evaluation

Collecting Reliable Human Judgements on Machine-Generated Language: The Case of the QG-STEC Data

2016

Keith Godwin, Paul Piwek

Question generation (QG) is the problem of automatically generating questions from inputs such as declarative sentences. The Shared Evaluation Task Challenge (QG-STEC) Task B that took place in 2010 evaluated several state-of-the-art QG systems. However, analysis of the evaluation results was affected by low inter-rater reliability. We adapted Nonaka & Takeuchi’s knowledge creation cycle to the...


Establishing Performance Baselines for Text Understanding Systems

1989

Beth Sundheim

A task-oriented evaluation of text understanding systems was prepared and conducted. Nine different NLP systems participated in the evaluation. NOSC collected 150 texts to be used as development (i.e. training) and test data and prepared explanatory documentation on them. The performance task--a simulated database update task--and the expected outputs for each text were defined. A scoring syste...


SemEval-2010 Task 14: Word Sense Induction & Disambiguation

2010

Suresh Manandhar, Ioannis P. Klapaftis, Dmitriy Dligach, Sameer Pradhan

This paper presents the description and evaluation framework of SemEval-2010 Word Sense Induction & Disambiguation task, as well as the evaluation results of 26 participating systems. In this task, participants were required to induce the senses of 100 target words using a training set, and then disambiguate unseen instances of the same words using the induced senses. Systems’ answers were eval...


Comparison of Graduate Medical Education in Iran with WFME International Guidelines: Quality Improvement in Postgraduate Medical Education

Journal: Iranian Journal of Medical Education 2002

Azim Mirzazadeh, Masoud Naseripoor, Saman Tavakoli

In 2001, following the development of International Standards in basic medical education, WFME appointed an international Task Force to develop International Guidelines for Postgraduate Specialist Training. The reports of this Task Force were published in September 2001. These Guidelines are structured in 9 areas and 37 sub-areas. The areas of these guidelines are mission & outcomes, t...


A Study of Methods for Evaluating Verb Tense Inflection and Determining the Best Method in 3- and 4-Year-Old Children in the City of Rasht in 1393 (Solar Hijri)

Journal: Pajouhan Scientific Journal 2015

بخشی, عنایت الله, خوشحال, زینب, شیرازی, طاهره سیما, محمودی بختیاری, بهروز,

Introduction: Inflection is a domain of morphology that adds syntactic information to words. This domain is impaired in individuals with language disorders, so evaluating inflection in these individuals is important. In this study, methods of evaluating verb tense inflection were compared and the best method was determined. Methods: This study was descriptive-analytical. The participa...


An Application-Level Scheduling with Task Bundling Approach for Many-Task Computing in Heterogeneous Environments

2012

Jian Xiao, Yu Zhang, Shuwei Chen, Huashan Yu

Many-Task Computing (MTC) is a widely used computing paradigm for large-scale task-parallel processing. One of the key issues in MTC is to schedule a large number of independent tasks onto heterogeneous resources. Traditional task-level scheduling heuristics, like Min-Min, Sufferage and MaxStd, cannot readily be applied in this scenario. As most of MTC tasks are usually fine-grained, the resour...


[The effect of reading tasks on learning from multiple texts].

Journal: Shinrigaku kenkyu : The Japanese journal of psychology 2014

Keiichi Kobayashi

This study examined the effect of reading tasks on the integration of content and source information from multiple texts. Undergraduate students (N = 102) read five newspaper articles about a fictitious incident in either a summarization task condition or an evaluation task condition. Then, they performed an integration test and a source choice test, which assessed their understanding of a situ...


Evaluation by Association: A Systematic Study of Quantitative Word Association Evaluation

2017

Anna Korhonen, Ivan Vulic, Douwe Kiela

Recent work on evaluating representation learning architectures in NLP has established a need for evaluation protocols based on subconscious cognitive measures rather than manually tailored intrinsic similarity and relatedness tasks. In this work, we propose a novel evaluation framework that enables large-scale evaluation of such architectures in the free word association (WA) task, which is fi...


EVENTI: EValuation of Events and Temporal INformation at Evalita

2014

Tommaso Caselli, Manuela Speranza, Rachele Sprugnoli, Monica Monachini

English. This report describes the EVENTI (EValuation of Events aNd Temporal Information) task organized within the EVALITA 2014 evaluation campaign. The EVENTI task aims at evaluating the performance of Temporal Information Processing systems on a corpus of Italian news articles. Motivations for the task, datasets, evaluation metrics, and results obtained by participating systems are presented...


A machine translation system combining rule-based machine translation and statistical post-editing

2014

Terumasa Ehara

The system architecture, experimental settings and evaluation results of EIWA in the WAT2014 Japanese to English (ja-en) and Chinese to Japanese (zh-ja) tasks are described. Our system combines rule-based machine translation (RBMT) and statistical post-editing (SPE). Evaluation results for the ja-en task show a BLEU score of 19.86, a RIBES score of 0.7067, and a human evaluation score of 22.50. Evaluation resu...
