In the previous lecture, we showed that Shannon constructed a code, a one-to-one mapping, that takes a stream of data X = (X1, ..., Xn) generated iid from a distribution P(X) over a finite alphabet A = (a1, ..., a|A|) of size |A|, and compresses it using ≈ nH(X) bits in total, or ≈ H(X) bits per symbol, on average (for sufficiently large n). The code was based on considering a special subset ...
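
To make the ≈ nH(X) figure concrete, here is a minimal sketch that computes the entropy of a distribution and compares the resulting bit count with a naive fixed-length encoding. The distribution `P`, the stream length `n`, and the helper name `entropy_bits` are assumptions chosen for illustration; they do not come from the lecture, and the sketch only evaluates the bit counts rather than building Shannon's code.

```python
import math


def entropy_bits(p):
    """Shannon entropy H(X) in bits for a list of probabilities."""
    # Terms with zero probability contribute nothing (0 * log 0 is taken as 0).
    return -sum(q * math.log2(q) for q in p if q > 0)


# Hypothetical distribution P(X) over a 4-symbol alphabet (illustrative only).
P = [0.5, 0.25, 0.125, 0.125]
n = 1000  # hypothetical stream length

H = entropy_bits(P)                              # bits per symbol
shannon_total = n * H                            # approximately nH(X) bits in total
fixed_total = n * math.ceil(math.log2(len(P)))   # fixed-length code: ceil(log2 |A|) bits per symbol

print(f"H(X) = {H} bits/symbol")
print(f"nH(X) = {shannon_total} bits vs fixed-length = {fixed_total} bits")
```

For this skewed distribution the entropy is below log2 |A| = 2 bits, so the entropy-based count is smaller than the fixed-length one; for a uniform distribution over the same alphabet the two would coincide.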