Statistical mechanics of letters in words.

Authors

  • Greg J Stephens
  • William Bialek
Abstract

We consider words as a network of interacting letters, and approximate the probability distribution of states taken on by this network. Despite the intuition that the rules of English spelling are highly combinatorial and arbitrary, we find that maximum entropy models consistent with pairwise correlations among letters provide a surprisingly good approximation to the full statistics of words, capturing ∼92% of the multi-information in four-letter words and even "discovering" words that were not represented in the data. These maximum entropy models incorporate letter interactions through a set of pairwise potentials and thus define an energy landscape on the space of possible words. Guided by the large letter redundancy, we seek a lower-dimensional encoding of the letter distribution and show that distinctions between local minima in the landscape account for ∼68% of the four-letter entropy. We suggest that these states provide an effective vocabulary that is matched to the frequency of word use and is much smaller than the full lexicon.
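The two quantities the abstract relies on, multi-information and local minima of an energy landscape, can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy word counts, and `toy_energy` are hypothetical, and the sketch assumes that two words are neighbours in the landscape when they differ by a single letter. Multi-information is taken here as the sum of the single-letter entropies minus the joint entropy of the whole word.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as {outcome: count}."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def multi_information(word_counts):
    """Sum of per-position letter entropies minus the joint word entropy (bits)."""
    length = len(next(iter(word_counts)))
    marginal_sum = 0.0
    for pos in range(length):
        letter_counts = Counter()
        for word, c in word_counts.items():
            letter_counts[word[pos]] += c
        marginal_sum += entropy(letter_counts)
    return marginal_sum - entropy(word_counts)


def descend(word, energy, alphabet):
    """Greedy descent by single-letter substitutions (an assumed neighbourhood)
    until no substitution lowers the energy; returns the local minimum reached."""
    while True:
        neighbours = [word[:i] + a + word[i + 1:]
                      for i in range(len(word))
                      for a in alphabet if a != word[i]]
        best = min(neighbours, key=energy)
        if energy(best) >= energy(word):
            return word
        word = best


def toy_energy(word):
    """Hypothetical pairwise energy: each pair of equal letters lowers it by 1."""
    n = len(word)
    return -sum(1 for i in range(n) for j in range(i + 1, n)
                if word[i] == word[j])


# Toy data (not from the paper): two perfectly correlated two-letter "words".
correlated = {"aa": 1, "bb": 1}
independent = {"aa": 1, "ab": 1, "ba": 1, "bb": 1}
print(multi_information(correlated))
print(multi_information(independent))
print(descend("ab", toy_energy, "ab"))
```

In a real analysis the energy would come from the fitted pairwise potentials of the maximum entropy model rather than from a hand-written function, and the word counts would come from a corpus.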

Similar resources

Toward a statistical mechanics of four letter words

We consider words as a network of interacting letters, and approximate the probability distribution of states taken on by this network. Despite the intuition that the rules of English spelling are highly combinatorial (and arbitrary), we find that maximum entropy models consistent with pairwise correlations among letters provide a surprisingly good approximation to the full statistics of four l...

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...

Impact of Genre-Based Instruction on Development of Students’ Letter Writing Skills: The Case of Students of Textile Engineering

The current study investigated the effectiveness of genre-based instruction on the development of EFL learners’ writing skills. Participants were 34 undergraduate students majoring in textile engineering at an Iranian state university, and they had enrolled in the English for specific academic purposes course. Participants were taught how to write 4 types of business letters, highlighting the p...

Statistical mechanics of a discrete nonlinear system

Statistical mechanics of the discrete nonlinear Schrödinger equation is studied by means of analytical and numerical techniques. The lower bound of the Hamiltonian permits the construction of standard Gibbsian equilibrium measures for positive temperatures. Beyond the line of T = infinity, we identify a phase transition through a discontinuity in the partition function. The phase transition is ...

Uncertainty, entropy, and the statistical mechanics of microscopic systems.

Noting that quantum measurements are in general incomplete, we develop, starting from a recent entropic formulation of uncertainty, a maximum uncertainty principle to define the statistical mechanics of microscopic systems. The resulting ensemble entropy coincides with the expression of von Neumann, thus providing a unified, quantum basis for statistical physics of all systems. Examples involvi...

Journal:
  • Physical review. E, Statistical, nonlinear, and soft matter physics

Volume 81, Issue 6 Pt 2

Publication date: 2010