Guessing Probability Distributions from Small Samples
Abstract
We propose a new method for calculating statistical properties, such as the entropy, of unknown generators of symbolic sequences. The probability distribution p(k) of the elements k of a population can be approximated by the frequencies f(k) of a sample, provided the sample is long enough that each element k occurs many times. Our method yields an approximation even when this precondition does not hold. For a given f(k), we recalculate the Zipf-ordered probability distribution by optimizing the parameters of a guessed distribution. We demonstrate that our method yields reliable results.
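To make the abstract's starting point concrete, here is a minimal Python sketch of the naive baseline it refers to: taking the sample frequencies f(k), ordering them by rank (Zipf ordering), and plugging them into the entropy formula. This is not the proposed method, which is not specified in the abstract; the function names `zipf_ordered_frequencies` and `plugin_entropy_bits` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def zipf_ordered_frequencies(sample):
    """Relative frequencies f(k), sorted from most to least frequent."""
    counts = Counter(sample)
    n = len(sample)
    return sorted((c / n for c in counts.values()), reverse=True)


def plugin_entropy_bits(sample):
    """Entropy of the observed frequencies in bits (naive plug-in estimate)."""
    return -sum(f * math.log2(f) for f in zipf_ordered_frequencies(sample))


# Assumed toy case: a source with 8 equally likely symbols has an entropy
# of log2(8) = 3 bits, but a sample of only 4 symbols can never show more
# than log2(4) = 2 bits, so the plug-in estimate is too low.
short_sample = "abcd"
print(zipf_ordered_frequencies(short_sample))  # [0.25, 0.25, 0.25, 0.25]
print(plugin_entropy_bits(short_sample))       # 2.0
```

Because the plug-in estimate can never exceed the logarithm of the sample length, it tends to underestimate the entropy whenever the sample is short relative to the number of distinct elements, which is the regime the abstract addresses.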
Related resources
Asymptotic Coupling and Its Applications in Information Theory
A coupling of two distributions PX and PY is a joint distribution PXY with marginal distributions equal to PX and PY. Given marginals PX and PY and a real-valued function f(PXY) of the joint distribution PXY, what is its minimum over all couplings PXY of PX and PY? We study the asymptotics of such coupling problems with different f's. These include the maximal coupling, minimum distance co...
Introducing of Dirichlet process prior in the Nonparametric Bayesian models frame work
Statistical models are used to learn about the mechanism that generates the data. It is often assumed that the random variables y_i, i=1,…,n, are samples from a probability distribution F that belongs to a parametric class of distributions. However, in practice, a parametric model may be inappropriate to describe the data. In this setting, the parametric assumption could be r...
Exact Probability Distribution versus Entropy
The problem addressed concerns the determination of the average number of successive attempts of guessing a word of a certain length consisting of letters with given probabilities of occurrence. Both first- and second-order approximations to a natural language are considered. The guessing strategy used is guessing words in decreasing order of probability. When word and alphabet sizes are large, a...
Improved Models for Password Guessing
One approach to measuring password strength is to assess the probability it will be cracked in a fixed set of guesses. The current state of the art in password guessing employs a first-order Markov model that makes several assumptions about the distribution of passwords. We present two novel approaches to modeling password distributions that remove some of these assumptions. First, a layered Ma...
The Impact of Correction for Guessing Formula on MC and Yes/No Vocabulary Tests' Scores
A standard correction for random guessing (cfg) formula on multiple-choice and Yes/No examinations was examined retrospectively in the scores of intermediate female EFL learners in an English language school. The correction was a weighting formula for points awarded for correct answers, incorrect answers, and unanswered questions so that the expected value of the increase in test score due to g...
Publication date: 1995