Similarity join size estimation using locality sensitive hashing
Similar resources
Similarity Join Size Estimation using Locality Sensitive Hashing
Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generalization of the previously studied set similarity join size estimation (SSJ) problem and can handle more interesting cases such as TF-IDF vectors. One of the key challenges in similarity join size estimation is that the ...
Similarity Search and Locality Sensitive Hashing using TCAMs
Similarity search methods are widely used as kernels in various data mining and machine learning applications, including those in computational biology and web search and clustering. Nearest neighbor search (NNS) algorithms are often used to retrieve similar entries, given a query. While there exist efficient techniques for exact query lookup using hashing, similarity search using exact nearest neighbo...
Beyond Locality-Sensitive Hashing
We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in $\mathbb{R}^d$, our algorithm achieves $O_c(n^{\rho} + d \log n)$ query time and $O_c(n^{1+\rho} + d \log n)$ space, where $\rho \le 7/(8c^2) + O(1/c^3) + o_c(1)$. This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower boun...
Fractal Image Compression Self-Similarity via Locality Sensitive Hashing
In this paper I describe a Haskell implementation of fractal image compression, a lossy image compression technique that leverages self-similarity within an image to produce an encoding. Known for its lengthy encoding time, fractal image encoding implementations require the most cleverness in identifying highly self-similar image regions. In this paper, I describe a simple locality sensitive ha...
Bayesian Locality Sensitive Hashing for Fast Similarity Search
Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects with similarity greater than a certain user-specified threshold. Locality-sensitive hashing (LSH) based methods have become a very popular approach for this problem. However, most such methods only use LSH for the first phase of similarity search i.e. ...
Journal
Journal title: Proceedings of the VLDB Endowment
Year: 2011
ISSN: 2150-8097
DOI: 10.14778/1978665.1978666