Locality-Sensitive Hashing for Finding Nearest Neighbors
Abstract
IEEE Signal Processing Magazine [128], March 2008. 1053-5888/08/$20.00 © 2008 IEEE
The Internet has brought us a wealth of data, all now available at our fingertips. We can easily carry in our pockets thousands of songs, hundreds of thousands of images, and hundreds of hours of video. But even with the rapid growth of computer performance, we don't have the processing power to search this amount of data by brute force. This lecture note describes a technique known as locality-sensitive hashing (LSH) that allows one to quickly find similar entries in large databases. This approach belongs to a novel and interesting class of algorithms known as randomized algorithms. A randomized algorithm does not guarantee an exact answer but instead provides a high-probability guarantee that it will return the correct answer or one close to it. By investing additional computational effort, the probability can be pushed as high as desired.
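The idea in the abstract can be illustrated with a small sketch. The abstract does not say which hash family the lecture note uses, so the example below assumes one common choice, random hyperplanes with cosine similarity. The class name `RandomHyperplaneLSH` and the parameters `num_bits`, `num_tables`, and `seed` are hypothetical and chosen only for illustration. Raising `num_tables` is the "additional computational effort" that increases the chance of a true neighbor sharing a bucket with the query.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Exact cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class RandomHyperplaneLSH:
    """Toy index (illustrative only): each table hashes a vector to the
    sign pattern of its dot products with a few random hyperplanes."""

    def __init__(self, dim, num_bits=8, num_tables=10, seed=0):
        rng = random.Random(seed)
        # planes[t][b] is the normal vector of hyperplane b in table t.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, planes, v):
        # One bit per hyperplane: which side of the plane v falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in planes)

    def add(self, v):
        """Store v in every table and return its index."""
        idx = len(self.points)
        self.points.append(list(v))
        for planes, table in zip(self.planes, self.tables):
            table[self._key(planes, v)].append(idx)
        return idx

    def candidates(self, q):
        """Union of the buckets that q falls into, across all tables."""
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found.update(table.get(self._key(planes, q), []))
        return found

    def query(self, q):
        """Best candidate by exact cosine similarity, or None if no bucket matches."""
        cands = self.candidates(q)
        if not cands:
            return None
        return max(cands, key=lambda i: cosine(q, self.points[i]))
```

Only the candidates that share a bucket with the query are compared exactly, which is what avoids the brute-force scan. The answer is not guaranteed: a true neighbor that never shares a bucket with the query is missed, and adding tables makes that less likely at the cost of more memory and hashing work.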
Similar Resources
Object Recognition Using Locality-Sensitive Hashing of Shape Contexts
At the core of many computer vision algorithms lies the task of finding a correspondence between image features local to a part of an image. Once these features are calculated, matching is commonly performed using a nearest-neighbor algorithm. In this chapter, we focus on the topic of object recognition, and examine how the complexity of a basic feature-matching approach grows with the number o...
Reverse Nearest Neighbors Search in High Dimensions using Locality-Sensitive Hashing
We investigate the problem of finding reverse nearest neighbors efficiently. Although provably good solutions exist for this problem in low or fixed dimensions, to this date the methods proposed in high dimensions are mostly heuristic. We introduce a method that is both provably correct and efficient in all dimensions, based on a reduction of the problem to one instance of ε-nearest neighbor sea...
Nearest Neighbors with Learned Distances for Phonetic Frame Classification
Nearest neighbor-based techniques provide an approach to acoustic modeling that avoids the often lengthy and heuristic process of training traditional Gaussian mixture-based models. Here we study the problem of choosing the distance metric for a k-nearest neighbor (k-NN) phonetic frame classifier. We compare the standard Euclidean distance to two learned Mahalanobis distances, based on large-mar...
RankReduce - Processing K-Nearest Neighbor Queries on Top of MapReduce
We consider the problem of processing K-Nearest Neighbor (KNN) queries over large datasets where the index is jointly maintained by a set of machines in a computing cluster. The proposed RankReduce approach uses locality sensitive hashing (LSH) together with a MapReduce implementation, which by design is a perfect match as the hashing principle of LSH can be smoothly integrated in the mapping p...
LSH At Large - Distributed KNN Search in High Dimensions
We consider K-Nearest Neighbor search for high dimensional data in large-scale structured Peer-to-Peer networks. We present an efficient mapping scheme based on p-stable Locality Sensitive Hashing to assign hash buckets to peers in a Chord-style overlay network. To minimize network traffic, we process queries in an incremental top-K fashion leveraging on a locality preserving mapping to the pee...
Fast Approximate Nearest Neighbor Methods for Non-Euclidean Manifolds with Applications to Human Activity Analysis in Videos
Approximate Nearest Neighbor (ANN) methods such as Locality Sensitive Hashing, Semantic Hashing, and Spectral Hashing, provide computationally efficient procedures for finding objects similar to a query object in large datasets. These methods have been successfully applied to search web-scale datasets that can contain millions of images. Unfortunately, the key assumption in these procedures is ...
Publication date: 2008