Weighted sampling without replacement from data streams
Similar resources
Weighted Sampling Without Replacement from Data Streams
Weighted sampling without replacement has proved to be a very important tool in designing new algorithms. Efraimidis and Spirakis (IPL 2006) presented an algorithm for weighted sampling without replacement from data streams. Their algorithm works under the assumption of precise computations over the interval [0, 1]. Cohen and Kaplan (VLDB 2008) used similar methods for their bottom-k sketches. ...
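As a rough illustration of the kind of streaming scheme this abstract refers to, the sketch below follows the widely described key-based reservoir idea attributed to Efraimidis and Spirakis: each item with weight w draws u uniformly from (0, 1], receives the key u^(1/w), and the k items with the largest keys are kept. The function name, the (item, weight) stream format, and the choice to skip non-positive weights are assumptions made for this example, not details taken from the paper. The sketch uses ordinary floating-point arithmetic, so it relies on the precise-computation assumption over [0, 1] that the abstract mentions rather than addressing it.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, rng=None):
    """Keep k items from a stream of (item, weight) pairs, one pass.

    Hypothetical helper for illustration only: each item gets the key
    u ** (1 / weight) with u uniform in (0, 1], and the k largest keys
    are retained in a min-heap.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival_index, item); smallest key on top
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        u = 1.0 - rng.random()  # random() is in [0, 1), so u is in (0, 1]
        key = u ** (1.0 / weight)
        entry = (key, index, item)  # index breaks ties before items compare
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current smallest key
    return [item for _, _, item in heap]


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 0.5)]
    print(weighted_reservoir_sample(data, 2, random.Random(42)))
```

The heap keeps memory at O(k) regardless of stream length. For very small weights the key u^(1/w) can round to zero in double precision, which is one concrete way the precision assumption can fail in practice.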
Accelerating weighted random sampling without replacement
Random sampling from discrete populations is one of the basic primitives in statistical computing. This article briefly introduces weighted and unweighted sampling with and without replacement. The case of weighted sampling without replacement appears to be most difficult to implement efficiently, which might be one reason why the R implementation performs slowly for large problem sizes. This p...
Weighted Random Sampling over Data Streams
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2,4]), discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.
Edgeworth Expansions for Sampling without Replacement from Finite Populations
The validity of the one-term Edgeworth expansion is proved for the multivariate mean of a random sample drawn without replacement under a limiting non-latticeness condition on the population. The theorem is applied to deduce the one-term expansion for the univariate statistics which can be expressed in a certain linear plus quadratic form. An application of the results to the theory of bootstrap...
Min-wise independent sampling from skewed data streams
Min-wise independent hashing is a powerful sampling technique for estimating the similarity between sets. In particular, it has proved to be ubiquitous for mining data streams of large volume where the input sets are revealed in arbitrary order and the elements in a given set do not arrive consecutively. More precisely, for sets of elements E and attributes A the input is a stream of element-at...
Journal
Journal title: Information Processing Letters
Year: 2015
ISSN: 0020-0190
DOI: 10.1016/j.ipl.2015.07.007