Search results for: native language

Number of results: 523429

2013
Kristopher Kyle Scott A. Crossley Jianmin Dai Danielle S. McNamara

This study explores the efficacy of an approach to native language identification that utilizes grammatical, rhetorical, semantic, syntactic, and cohesive function categories comprised of key n-grams. The study found that a model based on these categories of key n-grams was able to successfully predict the L1 of essays written in English by L2 learners from 11 different L1 backgrounds with an a...

2014
Shervin Malmasi Mark Dras

Language transfer, the characteristic second language usage patterns caused by native language interference, is investigated by Second Language Acquisition (SLA) researchers seeking to find overused and underused linguistic features. In this paper we develop and present a methodology for deriving ranked lists of such features. Using very large learner data, we show our method’s ability to find ...

2015
Shervin Malmasi Joel R. Tetreault Mark Dras

We examine different ensemble methods, including an oracle, to estimate the upper-limit of classification accuracy for Native Language Identification (NLI). The oracle outperforms state-of-the-art systems by over 10% and results indicate that for many misclassified texts the correct class label receives a significant portion of the ensemble votes, often being the runner-up. We also present a pi...

2014
Xiao Jiang Yufan Guo Jeroen Geertzen Dora Alexopoulou Lin Sun Anna Korhonen

Native Language Identification (NLI) is a task aimed at determining the native language (L1) of learners of second language (L2) on the basis of their written texts. To date, research on NLI has focused on relatively small corpora. We apply NLI to the recently released EFCamDat corpus which is not only multiple times larger than previous L2 corpora but also provides longitudinal data at several...

Francesca Frontini Francesca Mazzariello

This paper aims at investigating the acquisition of Italian complex predicates by native speakers of Persian. Complex predication is not as pervasive a phenomenon in Italian as it is in Persian. Yet Italian native speakers use complex predicates productively; spontaneous data show that Persian learners of Italian seem to be perfectly aware of Italian complex predicates and use this familiar fea...

2017
Andrea Cimino Felice Dell'Orletta

In this paper, we describe the approach of the ItaliaNLP Lab team to native language identification and discuss the results we submitted as participants to the essay track of NLI Shared Task 2017. We introduce for the first time a 2-stacked sentence-document architecture for native language identification that is able to exploit both local sentence information and a wide set of general-purpose f...

2013
Amjad Abu-Jbara Rahul Jha Eric Morley Dragomir R. Radev

We present a system for automatically identifying the native language of a writer. We experiment with a large set of features and train them on a corpus of 9,900 essays written in English by speakers of 11 different languages. Our system achieved an accuracy of 43% on the test data, improved to 63% with improved feature normalization. In this paper, we present the features used in our system, d...

2013
Baoli Li

Native Language Identification (NLI), which tries to identify the native language (L1) of a second language learner based on their writings, is helpful for advancing second language learning and authorship profiling in forensic linguistics. With the availability of relevant data resources, much work has been done to explore the native language of a foreign language learner. In this report, we p...

2012
Julian Brooke Graeme Hirst

Previous approaches to the task of native language identification (Koppel et al., 2005) have been limited to small, within-corpus evaluations. Because these are restrictive and unreliable, we apply cross-corpus evaluation to the task. We demonstrate the efficacy of lexical features, which had previously been avoided due to the within-corpus topic confounds, and provide a detailed evaluation of ...

2014
Adriane Boyd Jirka Hana Lionel Nicolas Walt Detmar Meurers Katrin Wisniewski Andrea Abel Karin Schöne Barbora Stindlová Chiara Vettori

The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains 2,290 learner texts produced in standardized language certifications covering CEFR levels A1–C1. The MERLIN annotation scheme includes a wide range of language characteri...
