Divergence and Shannon information in genomes.

Authors

  • Hong-Da Chen
  • Chang-Heng Chang
  • Li-Ching Hsieh
  • Hoong-Chien Lee
Abstract

Shannon information (SI) and its special case, divergence, are defined for a DNA sequence in terms of the probabilities of chemical words in the sequence and are computed for a set of complete genomes highly diverse in length and composition. We find the following: SI (but not divergence) is inversely proportional to sequence length for a random sequence but is length independent for genomes; the genomic SI is always greater, and for shorter words and longer sequences hundreds to thousands of times greater, than the SI of a random sequence whose length and composition match those of the genome; genomic SIs appear to have word-length-dependent universal values. The universality is inferred to be an evolutionary footprint of a universal mode of genome growth.
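As a concrete reading of these quantities, the sketch below computes the Shannon entropy of a sequence's k-mer (chemical-word) spectrum and its entropy deficit relative to a length- and composition-matched shuffled counterpart. This is a minimal Python illustration; the function names and the deficit-based formulation are assumptions for orientation, not the paper's exact definitions of SI and divergence.

```python
import math
import random
from collections import Counter

def kmer_probs(seq, k):
    """Frequencies of the overlapping k-letter chemical words in a sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def shannon_entropy(probs):
    """Shannon entropy (bits) of a word-frequency distribution."""
    return -sum(p * math.log2(p) for p in probs.values())

def entropy_deficit(seq, k):
    """Entropy deficit of the k-mer spectrum relative to a length- and
    composition-matched shuffle (an illustrative stand-in, not the
    paper's exact SI/divergence definitions)."""
    letters = list(seq)
    random.shuffle(letters)
    return shannon_entropy(kmer_probs("".join(letters), k)) \
        - shannon_entropy(kmer_probs(seq, k))
```

For a real genome the k-mer spectrum is far less uniform than that of its shuffle, so the deficit is positive, in the spirit of the abstract's claim that genomic SI far exceeds that of a matched random sequence.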


Related articles

A Goodness of Fit Test For Exponentiality Based on Lin-Wong Information

In this paper, we introduce a goodness-of-fit test for exponentiality based on the Lin-Wong divergence measure. In order to estimate the divergence, we use a method similar to Vasicek's method for estimating the Shannon entropy. The critical values and the powers of the test are computed by Monte Carlo simulation. It is shown that the proposed test is competitive with other tests of exponentia...
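For orientation, the snippet below implements Vasicek's spacing-based entropy estimator, the idea the abstract says the divergence estimation mimics. It is a sketch only; the Lin-Wong test statistic itself and its Monte Carlo critical values are not reproduced here.

```python
import numpy as np

def vasicek_entropy(sample, m):
    """Vasicek's spacing-based Shannon entropy estimator:
    H = mean_i log( n/(2m) * (X_(i+m) - X_(i-m)) ),
    with order statistics clamped at the sample extremes (m < n/2;
    continuous data assumed, so spacings are nonzero)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    i = np.arange(n)
    upper = x[np.minimum(i + m, n - 1)]   # X_(i+m), clamped to X_(n)
    lower = x[np.maximum(i - m, 0)]       # X_(i-m), clamped to X_(1)
    return float(np.mean(np.log(n / (2 * m) * (upper - lower))))

# Example: for Exp(1) the true entropy is 1 (nats).
print(vasicek_entropy(np.random.exponential(size=2000), m=30))
```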


Jensen divergence based on Fisher's information

The measure of Jensen-Fisher divergence between probability distributions is introduced and its theoretical grounds are set up. This quantity, in contrast to the remaining Jensen divergences, is very sensitive to fluctuations of the probability distributions because it is controlled by the (local) Fisher information, which is a gradient functional of the distribution. So it is appropriate and ...
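A plausible numerical reading, assuming the Jensen construction is applied to the Fisher information functional I(p) = ∫ p'(x)² / p(x) dx on a uniform grid; the finite-difference approximation and the exact form of the divergence are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def fisher_information(p, dx):
    """Grid approximation of I(p) = integral of p'(x)^2 / p(x) dx,
    using finite-difference gradients (illustrative only)."""
    grad = np.gradient(p, dx)
    mask = p > 1e-12                      # skip near-zero tails
    return float(np.sum(grad[mask] ** 2 / p[mask]) * dx)

def jensen_fisher(p, q, dx):
    """Assumed Jensen-type form: average Fisher information minus the
    Fisher information of the midpoint mixture (nonnegative, since I is
    a convex functional of the density)."""
    return 0.5 * (fisher_information(p, dx) + fisher_information(q, dx)) \
        - fisher_information(0.5 * (p + q), dx)

x = np.linspace(-8.0, 8.0, 2001)
dx = x[1] - x[0]
p = np.exp(-x ** 2 / 2); p /= p.sum() * dx              # N(0, 1)
q = np.exp(-(x - 1.0) ** 2 / 2); q /= q.sum() * dx      # N(1, 1)
print(jensen_fisher(p, q, dx))   # strictly positive for p != q
```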


A family of statistical symmetric divergences based on Jensen's inequality

We introduce a novel parametric family of symmetric information-theoretic distances based on Jensen’s inequality for a convex functional generator. In particular, this family unifies the celebrated Jeffreys divergence with the Jensen-Shannon divergence when the Shannon entropy generator is chosen. We then design a generic algorithm to compute the unique centroid defined as the minimum average d...
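The construction can be sketched generically: for a convex generator F, the Jensen divergence is the gap in Jensen's inequality at the midpoint mixture. The sketch below, with hypothetical function names, recovers the Jensen-Shannon divergence when F is the negative Shannon entropy; the parametric (skew) family and the centroid algorithm of the paper are not reproduced here.

```python
import numpy as np

def neg_shannon_entropy(p):
    """Convex generator F: negative Shannon entropy (nats)."""
    p = p[p > 0]
    return float(np.sum(p * np.log(p)))

def jensen_divergence(p, q, generator=neg_shannon_entropy):
    """Jensen divergence of a convex generator F:
    J_F(p, q) = (F(p) + F(q)) / 2 - F((p + q) / 2) >= 0 by Jensen's inequality.
    With F = negative Shannon entropy this is the Jensen-Shannon divergence."""
    return 0.5 * (generator(p) + generator(q)) - generator(0.5 * (p + q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.1, 0.2, 0.7])
print(jensen_divergence(p, q))   # Jensen-Shannon divergence in nats
```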


Bounds on Non-Symmetric Divergence Measures in Terms of Symmetric Divergence Measures

Many information and divergence measures exist in the literature on information theory and statistics. The most famous among them are the Kullback-Leibler [13] relative information and the Jeffreys [12] J-divergence. The Jensen-Shannon divergence of Sibson [17] has also found applications in the literature. The author [20] studied new divergence measures based on arithmetic and geometric means....
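To fix notation, the snippet below computes the non-symmetric Kullback-Leibler measure and the symmetric Jeffreys and Jensen-Shannon measures, and checks one classical relation between them, JS(p, q) ≤ J(p, q)/4, which follows from convexity of KL in its second argument; the specific bounds derived in the paper are not reproduced here.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler relative information KL(p || q) in nats
    (assumes q > 0 wherever p > 0)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def jeffreys(p, q):
    """Jeffreys J-divergence: the symmetrized KL measure."""
    return kl(p, q) + kl(q, p)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence via the midpoint mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
# Classical symmetric/non-symmetric relation: JS(p, q) <= J(p, q) / 4.
assert jensen_shannon(p, q) <= jeffreys(p, q) / 4
```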


Universal Lengths in Microbial Genomes and Implication for Early Genome Growth

We report the discovery of a set of universal lengths that characterize all microbial complete genomes. The Shannon information [Shannon 1948] of 108 complete microbial genomes relative to those of their respective randomized counterparts is computed, and the results are summarized in a two-parameter exponential relation: Lr(k) = (42 ± 21) × 2.64^k, 2 ≤ k ≤ 10, where Lr is a "root-sequence length" ...
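Reading the relation literally (the exponent k and the direction of the inequality are reconstructed from the "two-parameter exponential" form quoted above, and only central values are used), the root-sequence lengths can be tabulated directly:

```python
def root_sequence_length(k, prefactor=42.0, base=2.64):
    """Central value of the quoted relation L_r(k) = (42 +/- 21) * 2.64**k."""
    return prefactor * base ** k

for k in range(2, 11):                  # the quoted range 2 <= k <= 10
    print(f"k = {k}: L_r ~ {root_sequence_length(k):,.0f}")
```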



Journal:
  • Physical Review Letters

Volume 94, Issue 17

Pages: -

Published: 2005