learner corpora

Design and construction of the Tracking Written Learner Language (TRAWL) Corpus: A longitudinal and multilingual young learner corpus

Journal: Nordic Journal of Language Teaching and Learning (formerly NJMLM) 2023

This article describes the design and construction of the Tracking Written Learner Language (TRAWL) Corpus. The corpus combines several features that are all rare for learner corpora: it is longitudinal, following individual pupils over years; it has data from young learners in school years 5 to 13 (ages 10–18); and it is multilingual, containing learners’ texts in L3s (French, German, Spanish), L2 English, and L1 Norwe...

Review of Nacey (2013): Metaphor in Learner English. Corpora and Language Learners

Journal: International Journal of Learner Corpus Research 2015

Automating Second Language Acquisition Research: Integrating Information Visualisation and Machine Learning

2012

Helen Yannakoudakis Ted Briscoe Theodora Alexopoulou

We demonstrate how data-driven approaches to learner corpora can support Second Language Acquisition research when integrated with visualisation tools. We present a visual user interface supporting the investigation of a set of linguistic features discriminating between pass and fail ‘English as a Second or Other Language’ exam scripts. The system displays directed graphs to model interactions ...

A Learner Corpus-based Approach to Verb Suggestion for ESL

2013

Yu Sawai Mamoru Komachi Yuji Matsumoto

We propose a verb suggestion method which uses candidate sets and domain adaptation to incorporate error patterns produced by ESL learners. The candidate sets are constructed from a large-scale learner corpus to cover various error patterns made by learners. Furthermore, the model is trained using both a native corpus and the learner corpus via a domain adaptation technique. Experiments on two ...

Building a Korean Web Corpus for Analyzing Learner Language

2010

Markus Dickinson Ross Israel Sun-Hee Lee

Post-positional particles are a significant source of errors for learners of Korean. Following methodology that has proven effective in handling English preposition errors, we are beginning the process of building a machine learner for particle error detection in L2 Korean writing. As a first step, however, we must acquire data, and thus we present a methodology for constructing large-scale cor...

A Comparative Analysis of Lexical Bundles in Journalistic Writing in English and Persian: A Contrastive Linguistic Perspective

Journal: International Journal of Foreign Language Teaching and Research 2012

Marzieh Rafiee Mahbube Keihaniyan

This paper investigates the use of ‘lexical bundles’ in two broad corpora of journalistic writing. The aim of this study is to compare the use of lexical bundles in the two domains, one consisting of newspaper articles written in English and published in England and the other comprising newspaper articles written in Persian from Iranian publications. For this purpose, the frequency of occu...

Inter-annotator Agreement for Dependency Annotation of Learner Language

2013

Marwa Ragheb Markus Dickinson

This paper reports on a study of inter-annotator agreement (IAA) for a dependency annotation scheme designed for learner English. Reliably annotated learner corpora are a necessary step for the development of POS tagging and parsing of learner language. In our study, three annotators marked several layers of annotation over different levels of learner texts, and they were able to obtain generall...

Corpus-Based Error Analysis of Chinese Learners’ Use of High-Frequency Verb Take

Journal: English Language Teaching 2022

This study investigated the erroneous use of the high-frequency verb TAKE by Chinese college learners of English as a foreign language (EFL), aiming to identify similarities and differences between EFL learners and to find more effective ways of teaching and researching verbs. Corpus-based Contrastive Interlanguage Analysis and Error Analysis were carried out in the present study, with subcorpora ST4 and ST6 of CLEC (Chin...

Creation and Analysis of a Reading Comprehension Exercise Corpus: Towards Evaluating Meaning in Context

2012

Niels Ott Ramon Ziai

We discuss the collection and analysis of a cross-sectional and longitudinal learner corpus consisting of answers to reading comprehension questions written by adult second language learners of German. We motivate the need for such task-based learner corpora and identify the properties which make reading comprehension exercises a particularly interesting task. In terms of the creation of the co...

Error Annotation of the Arabic Learner Corpus - A New Error Tagset

2013

Abdullah Alfaifi Eric Atwell Ghazi Abuhakema

This paper introduces a new two-level error tagset, AALETA (Alfaifi Atwell Leeds Error Tagset for Arabic), to be used for annotating the Arabic Learner Corpus (ALC). The new tagset includes six broad classes, subdivided into 37 more specific error types or subcategories. It is easily understood by Arabic corpus error annotators. AALETA is based on an existing error tagset for Arabic corpora, ...
