This formula can be informally read as follows: the i-th message m_i brings us log(1/p_i) "bits of information" (whatever this means), and appears with frequency p_i, so H is the expected amount of information provided by one random message (one sample of the random variable). Moreover, we construct an optimal uniquely decodable code that requires about H bits per message on average (at most H + 1, to be exact); it encodes approximate...
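The claim about code lengths can be illustrated with a small sketch (a Huffman code here stands in for an optimal uniquely decodable code; the function names and the example distribution are mine). It computes H = sum of p_i * log(1/p_i) and the average codeword length of a Huffman code, which always lies between H and H + 1:

```python
import heapq
from math import log2

def entropy(probs):
    # H = sum over i of p_i * log2(1 / p_i), skipping zero-probability messages
    return sum(p * log2(1 / p) for p in probs if p > 0)

def huffman_lengths(probs):
    # Build a Huffman code and return the codeword length of each message.
    # Heap entries are (probability, tie-breaker, list of message indices);
    # merging two entries adds one bit to every codeword inside them.
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for idx in group1 + group2:
            lengths[idx] += 1
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return lengths

# Example: a dyadic distribution, where the bound H <= average <= H + 1
# is achieved exactly at the lower end.
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy(probs)                                   # 1.75 bits
avg = sum(p * l for p, l in zip(probs, huffman_lengths(probs)))
assert H <= avg <= H + 1
```

For non-dyadic probabilities the average length is strictly between H and H + 1, which matches the "at most H + 1" bound in the text.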