Discriminating Similar Languages: Evaluations and Explorations
Abstract
We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties. We carried out a number of experiments using the results of the two editions of the Discriminating between Similar Languages (DSL) shared task. We investigate the progress made between the two tasks, estimate an upper bound on possible performance using ensemble and oracle combination, and provide learning curves to help us understand which languages are more challenging. A number of difficult sentences are identified and investigated further with human annotation.
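As a rough illustration of how ensemble and oracle combination relate, the sketch below combines the outputs of several systems by plurality vote and computes the oracle score, where a sentence counts as correct if at least one system labelled it correctly. This is a minimal sketch and not the authors' code: the function names, the three toy systems, and the labels are all assumptions made for the example.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def plurality_vote(system_outputs):
    """Combine per-system label lists by picking the most frequent label per item.

    Ties go to the label seen first, i.e. the earliest system in the list
    (Counter.most_common keeps first-encountered order for equal counts).
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*system_outputs)]


def oracle_accuracy(system_outputs, gold):
    """Upper bound for label-selection ensembles: an item is counted as
    correct if any single system predicted its gold label."""
    hits = sum(g in labels for labels, g in zip(zip(*system_outputs), gold))
    return hits / len(gold)


# Hypothetical toy data: five sentences, three systems (illustrative only).
gold = ["bs", "hr", "sr", "pt-BR", "pt-PT"]
sys_a = ["bs", "hr", "hr", "pt-PT", "pt-BR"]
sys_b = ["hr", "hr", "sr", "pt-PT", "pt-BR"]
sys_c = ["bs", "sr", "sr", "pt-BR", "pt-BR"]
systems = [sys_a, sys_b, sys_c]

voted = plurality_vote(systems)
print("vote accuracy:  ", accuracy(voted, gold))
print("oracle accuracy:", oracle_accuracy(systems, gold))
```

On this toy data the oracle score is higher than the vote score, because the fourth sentence is labelled correctly by only one system and is outvoted. The fifth sentence is missed by every system, so no combination that selects among the systems' labels can recover it.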
Related papers
A Simple Baseline for Discriminating Similar Languages
This paper describes an approach to discriminating similar languages using word- and character-based features, submitted as the Queen Mary University of London entry to the Discriminating Similar Languages shared task. Our motivation was to investigate how well a simple, data-driven, linguistically naive method could perform, in order to provide a baseline by which more linguistically complex or kn...
Discriminating similar languages with token-based backoff
In this paper we describe the language identification system built within the Finno-Ugric Languages and the Internet project for the Discriminating between Similar Languages (DSL) shared task at the LT4VarDial workshop at RANLP-2015. The system reached fourth place in normal closed submissions (94.7% accuracy) and second place in closed submissions with the named entities blinded (93.0% accuracy).
Distributed Representations of Words and Documents for Discriminating Similar Languages
Discriminating between similar languages or language varieties aims to detect lexical and semantic variations in order to classify these varieties of languages. In this work we describe the system built by the Pattern Recognition and Human Language Technology (PRHLT) research center at the Universitat Politècnica de València and Autoritas Consulting for the Discriminating between similar languages (DS...
A two-level classifier for discriminating similar languages
The BRUniBP team's submission is presented for the Discriminating between Similar Languages Shared Task 2015. Our method is a two-phase classifier that utilizes both character- and word-level features. The evaluation shows 100% accuracy on language group identification and 93.66% accuracy on language identification. The main contribution of the paper is a memory-efficient correlation-based featu...
Discriminating between Similar Languages Using PPM
The paper presents the results of the Bobicev team's participation in the DSL (Discriminating Similar Languages) shared task 2015. It describes the use of PPM (Prediction by Partial Matching) for language discrimination. The system achieved 94.14% accuracy on the first set and 92.22% on the second set. The results ranked 4th for the first task and 5th for the second ...
Journal: CoRR
Volume: abs/1610.00031
Publication date: 2016