Language Code Switching in Web Corpora

Author

  • Vladimír Benko
Abstract

One of the challenges in building and using web corpora is their rather high content of “noise”, most notably foreign-language text fragments within otherwise monolingual text. This paper presents an approach that tries to cope with this problem by means of “exhaustive” stop-word lists provided by morphosyntactic taggers. As a side effect of the procedure, the problem of tagging text with missing diacritics is also addressed.
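
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea: count how many tokens of a sentence appear in per-language word lists and flag sentences whose best-matching language differs from the corpus language. The tiny `WORD_LISTS`, the language codes, and the function names are all hypothetical; in the paper the lists are said to come from morphosyntactic taggers and to be far larger. Matching against diacritic-stripped forms is likewise an assumption about how text with missing diacritics could be handled.

```python
import unicodedata

# Hypothetical, hand-made word lists for illustration only.
# The paper describes "exhaustive" lists derived from taggers.
WORD_LISTS = {
    "sk": {"a", "je", "na", "sa", "že", "v", "to", "som", "ako", "nie"},
    "en": {"the", "of", "and", "is", "to", "in", "that", "it", "was", "for"},
}


def strip_diacritics(word):
    """Remove combining marks, e.g. 'že' -> 'ze'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def guess_language(tokens, word_lists=WORD_LISTS):
    """Return the language whose list matches the most tokens, or None."""
    counts = {}
    for lang, words in word_lists.items():
        # Also accept list words written without diacritics (assumption).
        plain = {strip_diacritics(w) for w in words}
        counts[lang] = sum(
            1 for t in tokens if t.lower() in words or t.lower() in plain
        )
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def flag_foreign(sentences, main_lang):
    """Return sentences whose guessed language differs from main_lang."""
    flagged = []
    for sentence in sentences:
        lang = guess_language(sentence.split())
        if lang is not None and lang != main_lang:
            flagged.append(sentence)
    return flagged
```

Whitespace tokenization and simple hit counting keep the sketch short; a real pipeline would likely use the tagger's own tokenization and a threshold on the proportion of matched tokens.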

Similar resources

The Perceptions of Language Learners across Various Proficiency Levels of Teachers’ Code-switching

Code-switching (CS), an alternation between two or more languages or language varieties, has long been researched in language education. A great number of studies by applied linguists have explored the reasons for, and the potential usages of code-switching in foreign language education over the past years. This study explores the perceptions of English language learners across various proficie...

Mixed Language and Code-Switching in the Canadian Hansard

While there has been lots of interest in code-switching in informal text such as tweets and online content, we ask whether code-switching occurs in the proceedings of multilingual institutions. We focus on the Canadian Hansard, and automatically detect mixed language segments based on simple corpus-based rules and an existing word-level language tagger. Manual evaluation shows that the performa...

Motivational Determinants of Code-Switching in Iranian EFL Classrooms

“Code-Switching”, an important issue in the field of both language classroom and sociolinguistics, has been under consideration in investigations related to bilingual and multilingual societies. First proposed by Haugen (1956) and later developed by Grosjean (1982), the term code-switching refers to language alternation during communication. Although code-switching is unavoidable in bilingual and...

Functions of Code-Switching Strategies among Iranian EFL Learners and Their Speaking Ability Improvement through Code-Switching

This study investigated the impact of code-switching on the speaking ability of Iranian low-proficiency EFL learners. Moreover, it attempted to show what functions lay behind the code-switching strategies used by the EFL learners. To this end, 60 male and female Iranian EFL learners aged between 20 and 30 participated in the study. The data collection instruments used were the Int...

The Effect of Code-Switching on the Acquisition of Object Relative Clauses by Iranian EFL Learners

This study attempted to investigate the impact of teacher’s code-switching on the acquisition of a problematic grammatical structure, namely, object relative clauses, by intermediate EFL learners. Moreover, a secondary objective of the study was to determine the EFL learners’ attitudes and opinions regarding the effectiveness of teacher’s code-switching in their learning of a specific aspect of...

Publication date: 2017