Practical linear-space Approximate Near Neighbors in high dimension
Abstract
The c-approximate Near Neighbor problem in high-dimensional spaces has been mainly addressed by Locality Sensitive Hashing (LSH), which offers polynomial dependence on the dimension, query time sublinear in the size of the dataset, and subquadratic space requirement. For practical applications, linear space is typically imperative. Most previous work in the linear-space regime focuses on the case where c exceeds 1 by a constant term. In a recently accepted paper, optimal bounds have been achieved for any c > 1 [ALRW17]. Towards practicality, we present a new and simple data structure using linear space and sublinear query time for any c > 1, including c → 1. Given an LSH family of functions for some metric space, we randomly project points to the Hamming cube of dimension log n, where n is the number of input points. The projected space contains strings which serve as keys for buckets containing the input points. The query algorithm simply projects the query point, then examines points which are assigned to the same or nearby vertices on the Hamming cube. We analyze in detail the query time for some standard LSH families. To illustrate our claim of practicality, we offer an open-source implementation in C++, and report on several experiments in dimension up to 1000 and n up to 10^6. Our algorithm is one to two orders of magnitude faster than brute-force search. Experiments confirm the sublinear dependence on n and the linear dependence on the dimension. We have compared against the state-of-the-art LSH-based library FALCONN: our search is somewhat slower, but memory usage and preprocessing time are significantly smaller.
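The bucket-and-probe idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' C++ implementation. It assumes Euclidean input and substitutes random-hyperplane sign bits for the general LSH-based projection, and every name in it (`build_index`, `nearby_vertices`, `query`, `max_flips`) is hypothetical. Each point is stored in exactly one bucket, so the index uses space linear in n.

```python
import itertools
import math
import random
from collections import defaultdict


def make_projection(dim, n_bits, rng):
    """Draw n_bits random Gaussian directions; each one yields one key bit."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def project(point, directions):
    """Map a point to a Hamming-cube vertex, encoded as a tuple of bits."""
    return tuple(
        1 if sum(a * x for a, x in zip(d, point)) >= 0.0 else 0
        for d in directions
    )


def build_index(points, rng):
    """Bucket every point under its key; the cube dimension is about log2(n)."""
    n_bits = max(1, math.ceil(math.log2(len(points))))
    directions = make_projection(len(points[0]), n_bits, rng)
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        buckets[project(p, directions)].append(idx)
    return directions, buckets


def nearby_vertices(key, max_flips):
    """Yield key itself, then vertices at Hamming distance 1, ..., max_flips."""
    for k in range(max_flips + 1):
        for positions in itertools.combinations(range(len(key)), k):
            flipped = list(key)
            for pos in positions:
                flipped[pos] ^= 1
            yield tuple(flipped)


def query(q, points, directions, buckets, radius, c, max_flips):
    """Return the index of some point within c * radius of q, or None."""
    for vertex in nearby_vertices(project(q, directions), max_flips):
        for idx in buckets.get(vertex, ()):
            if math.dist(q, points[idx]) <= c * radius:
                return idx
    return None


# Usage: 256 random points in dimension 16, queried with a copy of point 42.
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(256)]
directions, buckets = build_index(points, rng)
hit = query(list(points[42]), points, directions, buckets,
            radius=0.5, c=1.5, max_flips=2)
```

In this sketch `max_flips` bounds how far from the query's vertex the search goes. The paper derives the probing radius and the resulting query time from the parameters of the LSH family, which the sketch does not attempt to reproduce.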
Similar resources
Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor
The approximate nearest neighbor problem (ε-ANN) in Euclidean settings is a fundamental question, which has been addressed by two main approaches: Data-dependent space partitioning techniques perform well when the dimension is relatively low, but are affected by the curse of dimensionality. On the other hand, locality sensitive hashing has polynomial dependence in the dimension, sublinear query...
Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning
We quantitatively analyze the performance of exact and approximate nearest-neighbors algorithms on increasingly high-dimensional problems in the context of sampling-based motion planning. We study the impact of the dimension, number of samples, distance metrics, and sampling schemes on the efficiency and accuracy of nearest-neighbors algorithms. Efficiency measures computation time and accuracy...
Graph-based time-space trade-offs for approximate near neighbors
We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size n = 2^{o(d)} on the d-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approx...
Local Doubling Dimension of Point Sets
We introduce the notion of t-restricted doubling dimension of a point set in Euclidean space as the local intrinsic dimension up to scale t. In many applications information is only relevant for a fixed range of scales. We present an algorithm to construct a hierarchical net-tree up to scale t which we denote as the net-forest. We present a method based on Locality Sensitive Hashing to compute ...
Approximate line nearest neighbor in high dimensions
We consider the problem of approximate nearest neighbors in high dimensions, when the queries are lines. In this problem, given n points in R^d, we want to construct a data structure to support efficiently the following queries: given a line L, report the point p closest to L. This problem generalizes the more familiar nearest neighbor problem. From a practical perspective, lines, and low-dimensi...
Journal: CoRR
Volume: abs/1612.07405
Publication date: 2016