Identifying off-topic student essays without topic-specific training data

Authors

  • Derrick Higgins
  • Jill Burstein
  • Y. Attali

Abstract

Educational assessment applications, as well as other natural-language interfaces, need some mechanism for validating user responses. If the input provided to the system is infelicitous or uncooperative, the proper response may be to simply reject it, to route it to a bin for special processing, or to ask the user to modify the input. If problematic user input is instead handled as if it were the system’s normal input, this may degrade users’ confidence in the software, or suggest ways in which they might try to “game” the system. Our specific task in this domain is the identification of student essays that are “off-topic”, that is, not written to the test question topic. Identification of off-topic essays is of great importance for the commercial essay evaluation system Criterion. The previous methods used for this task required 200–300 human-scored essays for training purposes. However, there are situations in which no essays are available for training, such as when users (teachers) wish to spontaneously write a new topic for their students. For these kinds of cases, we need a system that works reliably without training data. This paper describes an algorithm that detects when a student’s essay is off-topic without requiring a set of topic-specific essays for training. This new system is comparable in performance to previous models that require topic-specific essays for training, and it provides more detailed information about the way in which an essay diverges from the requested essay topic.
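
The abstract does not describe how the new detector works, so the following is only an illustration. One simple training-free idea, in the spirit of prior approaches that compare essay vocabulary with prompt words, is to measure how similar an essay is to its assigned prompt and to a pool of other prompts, and to flag the essay when its own prompt is not among the closest. The Python sketch below assumes bag-of-words cosine similarity; the stopword list, the function names (`bag_of_words`, `cosine`, `is_off_topic`), and the `max_rank` and `min_similarity` parameters are hypothetical choices, not the method or settings reported in this paper.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {
    "a", "an", "and", "are", "as", "be", "for", "in", "is", "it", "my",
    "of", "on", "or", "that", "the", "this", "to", "with", "you", "your",
}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep alphabetic tokens, drop stopwords, count the rest."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_off_topic(essay: str, target_prompt: str, other_prompts: list[str],
                 max_rank: int = 1, min_similarity: float = 0.0) -> bool:
    """Hypothetical training-free check: flag the essay if it shares (almost) no
    vocabulary with its own prompt, or if its own prompt is not among the
    `max_rank` prompts it most resembles."""
    essay_vec = bag_of_words(essay)
    target_score = cosine(essay_vec, bag_of_words(target_prompt))
    if target_score <= min_similarity:
        return True
    # Number of other prompts the essay resembles strictly more than its own.
    closer = sum(
        1 for p in other_prompts
        if cosine(essay_vec, bag_of_words(p)) > target_score
    )
    # The target prompt's rank is closer + 1; flag the essay if that exceeds max_rank.
    return closer + 1 > max_rank


# Toy usage with made-up prompts and essays.
school = "Should schools require students to wear uniforms? Explain your position."
others = [
    "Describe how technology has changed the way people communicate.",
    "Should college athletes be paid for playing sports?",
]
on_topic = "Uniforms help students focus on learning, and schools that require them see less teasing."
off_topic = "Video calls and text messages changed how people communicate with family."

print(is_off_topic(on_topic, school, others))   # False
print(is_off_topic(off_topic, school, others))  # True
```

Ranking the assigned prompt against other prompts, rather than relying only on an absolute similarity cutoff, may reduce the need to calibrate a threshold for each new topic, which is the situation the abstract describes when a teacher writes a prompt for which no scored essays exist.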

Similar resources

Advanced Capabilities for Evaluating Student Writing: Detecting Off-Topic Essays Without Topic-Specific Training

We have developed a method to identify when a student essay is off-topic, i.e., the essay does not respond to the test question topic. This task is motivated by a real-world problem: detecting when students using a commercial essay evaluation system, Criterion, enter off-topic essays. Sometimes this is done in bad faith to trick the system; other times it is inadvertent, and the student has cut-...

Off-topic essay detection using short prompt texts

Our work addresses the problem of predicting whether an essay is off-topic to a given prompt or question without any previously seen essays as training data. Prior work has used similarity between essay vocabulary and prompt words to estimate the degree of on-topic content. In our corpus of opinion essays, prompts are very short, and using similarity with such prompts to detect off-topic essays y...

A Machine Learning Approach for Identification Thesis and Conclusion Statements in Student Essays

This study describes and evaluates two essay-based discourse analysis systems that identify thesis and conclusion statements from student essays written on six different essay topics. Essays used to train and evaluate the systems were annotated by two human judges, according to a discourse annotation protocol. Using a machine learning approach, a number of discourse-related features were automa...

Finding the WRITE Stuff: Automatic Identification of Discourse Structure in Student Essays

automated feedback that helps them revise their work and ultimately improve their writing skills. These applications also address educational researchers’ interest in individualized instruction. Specifically, feedback that refers explicitly to students’ own writing is more effective than general feedback.3 Our discourse analysis software, which is embedded in Criterion (www.etstechnologies.com),...

Language Based Mapping of Science Assessment Items to Skills

Knowledge of the association between assessment questions and the skills required to solve them is necessary for analysis of student learning. This association, often represented as a Q-matrix, is either hand-labeled by domain experts or learned as latent variables given a large student response data set. As a means of automating the match to formal standards, this paper uses neural text classif...

Journal title:
  • Natural Language Engineering

Volume 12

Publication date: 2006