Estimating Entropy and Entropy Norm on Data Streams
Authors
Abstract
We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the “entropy norm” of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm under certain conditions on the input stream. We also prove a lower bound that rules out such an algorithm if these conditions do not hold. Our second group of results concerns estimating the empirical entropy of an input stream. We first present a sublinear space one-pass algorithm for this problem. For a stream of m items and a given real parameter α, our algorithm uses space Õ(m^{2α}) and provides an approximation of 1/α in the worst case and (1 + ε) in “most” cases. We then present a two-pass polylogarithmic space (1 + ε)-approximation algorithm. All our algorithms are quite simple.
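To make the two quantities in the abstract concrete, here is a minimal sketch of an exact, linear-space baseline. It is not the paper's sublinear algorithm. It assumes the usual definitions, which the abstract itself does not spell out: for item frequencies m_i summing to m, the entropy norm is F_H = Σ m_i lg m_i and the empirical entropy is H = Σ (m_i/m) lg(m/m_i). The function names and the sample stream are hypothetical.

```python
import math
from collections import Counter


def entropy_norm(stream):
    """Exact entropy norm F_H = sum_i m_i * lg(m_i) (assumed definition).

    Stores one counter per distinct item, so it uses linear space in the
    worst case; it is only a reference point for a streaming estimator.
    """
    counts = Counter(stream)
    return sum(c * math.log2(c) for c in counts.values())


def empirical_entropy(stream):
    """Exact empirical entropy H = sum_i (m_i/m) * lg(m/m_i) (assumed definition)."""
    counts = Counter(stream)
    m = sum(counts.values())
    if m == 0:
        return 0.0  # convention chosen here for an empty stream
    return sum((c / m) * math.log2(m / c) for c in counts.values())


# Hypothetical stream with frequencies a=4, b=2, c=1, d=1 (m = 8).
stream = ["a", "b", "a", "c", "a", "b", "a", "d"]
f_h = entropy_norm(stream)
h = empirical_entropy(stream)
print(f_h, h)
```

Under these definitions the two measures are tied by H = lg m − F_H/m, which is one way to read the abstract's remark that the entropy norm is closely related to entropy while having the sum-over-frequencies shape of a frequency moment.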
Similar resources
Estimating Entropy and Entropy Norm on Data Streams by Amit
We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the “entropy norm” of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm ...
Estimating Entropy of Data Streams Using Compressed Counting
The Shannon entropy is a widely used summary statistic, for example, network traffic measurement, anomaly detection, neural computations, spike trains, etc. This study focuses on estimating Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, which are both functions of the αth frequency moments and approach Shannon entropy a...
Estimating Entropy over Data Streams
We present an algorithm for estimating entropy of data streams consisting of insertion and deletion operations using Õ(1) space.
A Very Efficient Scheme for Estimating Entropy of Data Streams Using Compressed Counting
Compressed Counting (CC) was recently proposed for approximating the αth frequency moments of data streams, for 0 < α ≤ 2. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on symmetric stable random projections, especially as α → 1. A direct application of CC is to estimate the entropy, which is an important summary statistic in Web/network measure...
Entropy Estimations Using Correlated Symmetric Stable Random Projections
Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is...
Journal:
- Internet Mathematics
Volume 3
Publication date: 2006