Data deduplication saves storage space by identifying and removing repeats in the data stream. Compared with traditional compression methods, deduplication schemes are more time-efficient and are thus widely used in large-scale systems. In this paper, we provide an information-theoretic analysis of the performance of deduplication algorithms on data streams in which repeats are not exact. We introduce a source model in which probabilistic substitutions are considered. Mor...