Systematic Interrelations Between Grapheme Frequencies and Word Length: Empirical Evidence from Slovene
Abstract
This paper asks whether grapheme frequencies stand in a direct relationship to word length, that is, whether the frequency of graphemes is systematically related to the length of linguistic units. Based on different Slovene text types, it is shown that the Altmann-Menzerath law provides an adequate theoretical explanation for the supposed interrelation between grapheme frequencies and word length. Furthermore, a linguistic interpretation of the parameters of grapheme frequency models is offered.
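The Menzerath-Altmann law is commonly written as y = a * x^b * e^(-c*x), and is often reduced to the simpler form y = a * x^b. The following is a minimal sketch of how the simplified form could be fitted by least squares on a log-log scale. It is not the paper's procedure: the function name `fit_power_law` is hypothetical, the numbers are synthetic and not taken from the paper, and the abstract does not say which measured quantity plays the role of y.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b on a log-log scale; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # slope in log-log space
    a = math.exp(mean_y - b * mean_x)    # intercept, transformed back
    return a, b

# Synthetic, illustrative values only (assumption, not data from the paper):
# x stands for word length, y for some length-dependent model parameter.
word_lengths = [1, 2, 3, 4, 5, 6]
values = [2.0 * x ** -0.5 for x in word_lengths]

a, b = fit_power_law(word_lengths, values)
print(f"a = {a:.3f}, b = {b:.3f}")
```

A negative estimate of b would correspond to the decreasing trend that the law describes: the longer the construct, the smaller the measured quantity. Fitting the full form with the exponential term would need a nonlinear optimizer instead of this log-linear shortcut.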
Similar resources
Towards a General Model of Grapheme Frequencies for Slavic Languages
The present study discusses a possible theoretical model for grapheme frequencies of Slavic alphabets. Based on previous research on Slovene, Russian, and Slovak grapheme frequencies, the negative hypergeometric distribution is presented as a model, adequate for various Slavic languages. Additionally, arguments are provided in favor of the assumption that the parameters of this model can be int...
Towards automatic speech recognition without pronunciation dictionary, transcribed speech and text resources in the target language using cross-lingual word-to-phoneme alignment
In this paper we tackle the task of bootstrapping an Automatic Speech Recognition system without an a priori given language model, a pronunciation dictionary, or transcribed speech data for the target language Slovene – only untranscribed speech and translations to other resource-rich source languages of what was said are available. Therefore, our approach is highly relevant for under-resourced...
The grapho-phonological system of written French: Statistical analysis and empirical validation
The processes through which readers evoke mental representations of phonological forms from print constitute a hotly debated and controversial issue in current psycholinguistics. In this paper we present a computational analysis of the grapho-phonological system of written French, and an empirical validation of some of the obtained descriptive statistics. The results provide direct evidence dem...
Parameter interpretation of the Menzerath law: evidence from Serbian
The law-like relation between word and syllable length as part of the Menzerath law has been corroborated empirically in many different languages. As to South Slavic languages, we have the studies by Gajić (1950) and Grzybek (1999) on Croatian, and by Grzybek (2000) on Slovene. The aim of the present paper is first of all to provide empirical evidence of the Menzerath law for another South Slavic...
Evaluating the Noisy Channel Model for the Normalization of Historical Texts: Basque, Spanish and Slovene
This paper presents a method for the normalization of historical texts using a combination of weighted finite-state transducers and language models. We have extended our previous work on the normalization of dialectal texts and tested the method against a 17th century literary work in Basque. This preprocessed corpus is made available in the LREC repository. The performance of this (semi-)super...
Journal title:
- Journal of Quantitative Linguistics
Volume 19
Publication date: 2012