Sketching and streaming —
Abstract
Distinct elements (F0). This note considers the distinct elements problem, also known as the F0 problem. We are given a stream of integers i_1, ..., i_m ∈ [n], where [n] denotes the set {1, 2, ..., n}, and we would like to output the number of distinct elements seen in the stream. As with Morris' approximate counting algorithm, the goal is to minimize space consumption. There are two straightforward solutions as follows:
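The abstract breaks off before the two solutions are listed, so the following is only a minimal sketch of one natural exact baseline, not necessarily either of the solutions the note has in mind. It assumes n is known in advance and keeps one bit per possible value, using roughly n bits of space in total. The function name `count_distinct_bitvector` is hypothetical.

```python
def count_distinct_bitvector(stream, n):
    """Exact F0 for a stream over [n] = {1, ..., n}, using an n-bit vector.

    Assumption: n is known up front. Bit (i - 1) records whether the
    value i has already appeared in the stream.
    """
    seen = bytearray((n + 7) // 8)  # n bits, rounded up to whole bytes
    count = 0
    for i in stream:
        if not 1 <= i <= n:
            raise ValueError(f"stream element {i} is outside [1, {n}]")
        byte, bit = divmod(i - 1, 8)
        mask = 1 << bit
        if not seen[byte] & mask:  # first occurrence of i
            seen[byte] |= mask
            count += 1
    return count


if __name__ == "__main__":
    # The stream 3, 1, 3, 2, 1 over [5] contains the distinct values {1, 2, 3}.
    print(count_distinct_bitvector([3, 1, 3, 2, 1], n=5))
```

A baseline like this is exact but needs space linear in n, which is the cost that small-space approximate methods aim to avoid.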
Similar resources
Sketching and streaming — Notes 3
Today the focus of the lecture will be on linear sketching. Up until this point, we have focused on streaming algorithms in the so-called insertion-only model. Specifically, consider the scenario of a vector x ∈ R^n with n large, and x starting as the 0 vector. Then we see a sequence of updates (i, ∆), each causing the change x_i ← x_i + ∆. You might imagine, for instance, that x has a coordinate for ...
FROSH: FasteR Online Sketching Hashing
Many hashing methods, especially those that are in the data-dependent category with good learning accuracy, are still inefficient when dealing with three critical problems in modern data analysis. First, data usually come in a streaming fashion, but most of the existing hashing methods are batch-based models. Second, when data become huge, the extensive computational time, large space requireme...
Linear Sketching over $\mathbb F_2$
We initiate a systematic study of linear sketching over F2. For a given Boolean function f : {0, 1}^n → {0, 1}, a randomized F2-sketch is a distribution M over d × n matrices with elements over F2 such that Mx suffices for computing f(x) with high probability. We study a connection between F2-sketching and a two-player one-way communication game for the corresponding XOR-function. Our results show th...
Sketching and Streaming High-Dimensional Vectors
A sketch of a dataset is a small-space data structure supporting some prespecified set of queries (and possibly updates) while consuming space substantially sublinear in the space required to actually store all the data. Furthermore, it is often desirable, or required by the application, that the sketch itself be computable by a small-space algorithm given just one pass over the data, a so-call...
Faster Anomaly Detection via Matrix Sketching
We present efficient streaming algorithms to compute two commonly used anomaly measures: the rank-k leverage scores (aka Mahalanobis distance) and the rank-k projection distance, in the row-streaming model. We show that commonly used matrix sketching techniques such as the Frequent Directions sketch and random projections can be used to approximate these measures. Our main technical contribution...
Co-Occurring Directions Sketching for Approximate Matrix Multiply
We introduce co-occurring directions sketching, a deterministic algorithm for approximate matrix product (AMM), in the streaming model. We show that co-occurring directions achieves a better error bound for AMM than other randomized and deterministic approaches for AMM. Co-occurring directions gives a (1 + ε)-approximation of the optimal low rank approximation of a matrix product. Empirically o...
Publication date: 2016