ReaderBench: Multilevel analysis of Russian text characteristics
Abstract
This paper introduces an adaptation of the open source ReaderBench framework that now supports multilevel analyses of Russian text characteristics, while integrating both textual complexity indices and state-of-the-art language models, namely Bidirectional Encoder Representations from Transformers (BERT). The proposed processing pipeline was evaluated on a dataset containing texts from two language levels for foreign learners (A: Basic user, and B: Independent user). Our experiments showed that the textual complexity indices are statistically significant in differentiating between the two language-level classes, from both: a) a statistical perspective, where a Kruskal-Wallis analysis was performed and features such as the "nmod" dependency tag or the number of nouns at the sentence level proved to be the most predictive; and b) a neural network perspective, where our model combining textual complexity indices and contextualized embeddings obtained an accuracy of 92.36% in a leave-one-out cross-validation, outperforming the BERT baseline. ReaderBench can be employed by designers and developers of educational materials to evaluate and rank materials based on their difficulty. It can also serve a larger audience assessing texts from different domains, including law, science, and politics.
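The Kruskal-Wallis step mentioned in the abstract ranks features by how strongly their values separate the two proficiency levels. The sketch below shows that idea with no dependencies, under stated assumptions. It is not ReaderBench's implementation. The function names, the feature names (`nouns_per_sentence`, `commas_per_sentence`) and all values are hypothetical illustrations. For each feature, the sketch computes the tie-corrected H statistic. For k groups, H is compared against a chi-squared distribution with k - 1 degrees of freedom. Features with a larger H discriminate the levels better.

```python
from collections import defaultdict


def average_ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = defaultdict(int)
    for x in pooled:
        counts[x] += 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1 - ties / (n ** 3 - n)
    # If every value is identical the statistic is undefined; report 0.0 here.
    return h / correction if correction > 0 else 0.0


def rank_features(level_a, level_b):
    """Hypothetical helper: sort feature names by descending H between two levels."""
    scores = {name: kruskal_wallis_h(level_a[name], level_b[name]) for name in level_a}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical per-text feature values for level A and level B texts.
    level_a = {"nouns_per_sentence": [3.1, 2.8, 3.5, 2.9],
               "commas_per_sentence": [1.0, 2.0, 3.0, 4.0]}
    level_b = {"nouns_per_sentence": [5.2, 4.9, 6.1, 5.5],
               "commas_per_sentence": [1.5, 2.5, 3.5, 4.5]}
    for name, h in rank_features(level_a, level_b):
        print(f"{name}: H = {h:.3f}")
```

With these made-up values, `nouns_per_sentence` separates the two levels completely, so it ranks first (H of about 5.333). The interleaved `commas_per_sentence` scores about 0.333. Ranks are used instead of raw values, so the test does not assume normally distributed features.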
Similar resources
ReaderBench, an Environment for Analyzing Text Complexity and Reading Strategies
ReaderBench is a multi-purpose, multi-lingual and flexible environment that enables the assessment of a wide range of learners’ productions and their manipulation by the teacher. ReaderBench allows the assessment of three main textual features: cohesion-based assessment, reading strategies identification and textual complexity evaluation, which have been subject to empirical validations. Reader...
Manual for postediting Russian text
The present study is a practical guide to editors who refine partially machine-translated text as a basis for linguistic analysis. The posteditors' tasks are: to code preferred English equivalents, to code English structural symbols, to resolve grammatic properties, and to code syntactic connections (dependencies). A general introduction to the field of machine translation is contained in The R...
Bell Laboratories Russian text-to-speech system
This paper describes the Bell Labs Russian text-to-speech system, a concatenative system with extensive text-analysis capabilities. The construction of Russian-specific modules will be discussed, including the text-analysis module, the acoustic inventory, the duration module, and the intonation module.
Acoustic characteristics of surprise in Russian questions
This paper reports the results of an experimental phonetic study investigating the production of neutral and surprised interrogatives in Russian. The paper describes F0 and duration parameters of 22 one-word three-syllable ‘yes/no’ questions with the lexical stress on the penultimate syllable pronounced by five speakers of Russian. The speaker-independent differences between the parameters of ne...
Journal
Journal title: Russian Journal of Linguistics
Year: 2022
ISSN: 2312-9182, 2312-9212
DOI: https://doi.org/10.22363/2687-0088-30145