Word-Level vs Sentence-Level Language Identification: Application to Algerian and Arabic Dialects
نویسندگان
چکیده
منابع مشابه
Sentence Level Dialect Identification in Arabic
This paper introduces a supervised approach for performing sentence level dialect identification between Modern Standard Arabic and Egyptian Dialectal Arabic. We use token level labels to derive sentence-level features. These features are then used with other core and meta features to train a generative classifier that predicts the correct label for each sentence in the given input text. The sy...
متن کاملSentence-level dialects identification in the greater China region
Identifying the different varieties of the same language is more challenging than unrelated languages identification. In this paper, we propose an approach to discriminate language varieties or dialects of Mandarin Chinese for the Mainland China, Hong Kong, Taiwan, Macao, Malaysia and Singapore, a.k.a., the Greater China Region (GCR). When applied to the dialects identification of the GCR, we f...
متن کاملBuilding resources for Algerian Arabic dialects
The Algerian Arabic dialects are under-resourced languages, which lack both corpora and Natural Language Processing (NLP) tools, although they are increasingly used in written form, especially on social media and forums. We aim through this paper, and for the first time, to build parallel corpora for Algerian dialects, because our ultimate purpose is to achieve a Machine Translation (MT) for Mo...
متن کاملIntegrating sentence- and word-level error identification for disfluency correction
While speaking spontaneously, speakers often make errors such as self-correction or false starts which interfere with the successful application of natural language processing techniques like summarization and machine translation to this data. There is active work on reconstructing this errorful data into a clean and fluent transcript by identifying and removing these simple errors. Previous re...
متن کاملImproved Sentence-Level Arabic Dialect Classification
The paper presents work on improved sentence-level dialect classification of Egyptian Arabic (ARZ) vs. Modern Standard Arabic (MSA). Our approach is based on binary feature functions that can be implemented with a minimal amount of task-specific knowledge. We train a featurerich linear classifier based on a linear support-vector machine (linear SVM) approach. Our best system achieves an accurac...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Procedia Computer Science
سال: 2018
ISSN: 1877-0509
DOI: 10.1016/j.procs.2018.10.484