Search results for: figures or written texts
Number of results: 3634289
The use of methods borrowed from statistics and physics has allowed for the discovery of unprecedented patterns of human behavior and cognition by establishing links between model features and language structure. While current models have been useful to identify patterns via analysis of syntactical and semantical networks, only a few works have probed the relevance of investigating the structure...
Algorithmic decipherment is a prime example of a truly unsupervised problem. The first step in the decipherment process is the identification of the encrypted language. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. We then present an approach to decoding ana...
Language Identification is the task of automatically identifying the language(s) in which the content of a document (web page, text document) is written. Due to the widespread use of the internet, identification of languages has become an important preprocessing step for a number of applications such as machine translation, Part-of-Speech tagging, linguistic corpus creation, supporting low-density ...
This paper explores differences between male and female writing in a large subset of the British National Corpus covering a range of genres. Several classes of simple lexical and syntactic features that differ substantially according to author gender are identified, both in fiction and in non-fiction documents. In particular, we find significant differences between male- and female-authored docum...
We study the frequency distributions and correlations of the word lengths of ten European languages. Our findings indicate that a) the word-length distribution of short words quantified by the mean value and the entropy distinguishes the Uralic (Finnish) corpus from the others, b) the tails at long words, manifested in the high-order moments of the distributions, differentiate the Germanic lang...
About 120 new Mesopotamian mathematical cuneiform texts, all from the Norwegian Schøyen Collection, are published in the author’s book A Remarkable Collection of Babylonian Mathematical Texts, Springer (2007). Most of the texts are Old Babylonian (1900–1600 BC), but some are older (Sumerian), or younger (Kassite). In addition to the presentation and discussion of these new texts, the book cont...
We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks after about two decades, while it might be consistent for the whole range of ranks for the inflected word forms. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant re...
For language modeling of spontaneous speech recognition, we propose a style transformation approach, which transforms written texts to a spoken-style language model. Since these two styles are largely different and thus direct transformation is difficult, we cascade two transformation methods: rule-based transformation to rewrite written-style texts to intermediate “verbatim” texts, and statist...
In this case study we show how an unambiguous semantic representation can be constructed dynamically in left-to-right order while a text is written in PENG, a controlled natural language designed for knowledge representation. PENG can be used in contexts where precise texts (e.g. software specifications, axioms for formal ontologies, legal documents) need to be composed. Texts written in PENG l...
A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through the Looking-Glass. The translation of the first text into Esperanto is also considered as a test for the role of punctuation in defining a style, and for contrasting natural and artificial, but written, languages. Several log-log plots of the sentence length-rank ...