learner corpus

SubCo: A Learner Translation Corpus of Human and Machine Subtitles

2016

José Manuel Martínez Martínez, Mihaela Vela

In this paper, we present a freely available corpus of human and automatic translations of subtitles. The corpus comprises the original English subtitles (SRC), both human (HT) and machine translations (MT) into German, as well as post-editions (PE) of the MT output. HT and MT are annotated with errors. Moreover, human evaluation is included in HT, MT, and PE. Such a corpus is a valuable resour...

Evaluating and automating the annotation of a learner corpus

Journal: Language Resources and Evaluation 2014

Alexandr Rosen, Jirka Hana, Barbora Stindlová, Anna Feldman

The paper describes a corpus of texts produced by non-native speakers of Czech. We discuss its annotation scheme, consisting of three interlinked tiers, designed to handle a wide range of error types present in the input. Each tier corrects different types of errors; links between the tiers allow capturing errors in word order and complex discontinuous expressions. Errors are not only corrected...

A Corpus-Based Comparison of Syntactic Complexity in Spoken and Written Learner Language

Journal: Canadian Journal of Applied Linguistics 2022

Despite writing and speaking being related activities, their end-products are entirely different. However, previous studies have not shown consistency in terms of grammar use in these two modes. Accordingly, in the present study, I aim to define the syntactic characteristics of the two modes with large-scale data and organized research designs. This study examined 14 indices of complexity and specific factors in 224 monologues 1...

The Microgenetic Changes in EFL Learners’ Vocabulary Development: A Learner-Corpus-Based Study

Journal: Advances in social science, education and humanities research 2023

A Corpus-based Comparative Study of “get-passive” Semantic Prosody

Journal: Journal of contemporary educational research 2021

Research on semantic prosody education is playing a vital role in the process of learning English. This research is based on the LOB corpus (Lancaster-Oslo/Bergen corpus) and the IWriteBaby corpus, the core library of the IWrite Chinese English Learner Corpus. AntConc 3.5.7 is used to compare the preferred collocates of "get" in the two corpora in instances that meet the get-passive structure. Aiming at the differences between learners and native speakers ...

The ALeSKo learner corpus: Design – annotation – quantitative analyses

2011

Heike Zinsmeister, Margit Breckle

The ALeSKo learner corpus is a small-scale comparable corpus consisting of two subcorpora: annotated essays by advanced Chinese learners of German and comparable essays by German native speakers. The motivation for its compilation was the investigation of discourse-related phenomena such as local coherence in second-language acquisition of German. After introducing how the texts were compiled a...

Corpus-driven methods for assessing accuracy in learner production

2009

John Benjamins Stefanie Wulff

Developing Learner Corpus Annotation for Korean Particle Errors

2012

Sun-Hee Lee, Markus Dickinson, Ross Israel

We aim to sufficiently define annotation for post-positional particle errors in L2 Korean writing, so that future work on automatic particle error detection can make progress. To achieve this goal, we outline the linguistic properties of Korean particles in learner data. Given the agglutinative nature of Korean and the range of functions of particles, this annotation effort involves issues such...

A Method for Detecting Determiner Errors Designed for the Writing of Non-native Speakers of English

Journal: IEICE Transactions 2012

Ryo Nagata, Atsuo Kawai

This paper proposes a method for detecting determiner errors, which are highly frequent in learner English. To augment conventional methods, the proposed method exploits a strong tendency displayed by learners in determiner usage, i.e., mistakenly omitting determiners most of the time. Its basic idea is simple and applicable to almost any conventional method. This paper also proposes combining ...

A spoken language understanding approach using successive learners

2006

Wei-Lin Wu, Ruzhan Lu, Hui Liu, Feng Gao

In this paper, we describe a novel spoken language understanding approach using two successive learners. The first learner is used to identify the topic of an input utterance. With the restriction of the recognized target topic, the second learner is trained to extract the corresponding slot-value pairs. The advantage of the proposed approach is that it is mainly data-driven and requires only mi...