Ordinal Constrained Binary Code Learning for Nearest Neighbor Search
Abstract
Recent years have witnessed extensive attention to binary code learning, a.k.a. hashing, for nearest neighbor search problems. High-dimensional data points can be quantized into binary codes that give an efficient similarity approximation via Hamming distance. Among existing schemes, ranking-based hashing is a recent promising direction that aims to preserve the ordinal relations of rankings in the Hamming space so as to minimize retrieval loss. However, the number of ranking tuples, which express the ordinal relations, is quadratic or cubic in the number of training samples, so given a large-scale training set it is very expensive to embed such ranking tuples in binary code learning. Besides, it remains difficult for most ranking-preserving hashing methods, which are deployed over an ordinal graph-based setting, to build ranking tuples efficiently. To handle these problems, we propose a novel ranking-preserving hashing method, dubbed Ordinal Constraint Hashing (OCH), which efficiently learns the optimal hash functions with a graph-based approximation to embed the ordinal relations. The core idea is to reduce the size of the ordinal graph with an ordinal constraint projection, which preserves the ordinal relations through a small data set (such as clusters or random samples). In particular, to learn such hash functions effectively, we further relax the discrete constraints and design a specific stochastic gradient descent algorithm for optimization. Experimental results on three large-scale visual search benchmark datasets, i.e., LabelMe, Tiny100K and GIST1M, show that the proposed OCH method achieves superior performance over state-of-the-art approaches.

Introduction

Learning binary codes, a.k.a. hashing, to preserve data similarity has recently become popular in various computer vision and artificial intelligence applications, e.g., image retrieval (Liu et al. 2016), object detection (Dean et al. 2013), multi-task learning (Weinberger et al.
2009), linear classifier training (Li et al. 2011; Lin et al. 2014), and active learning (Liu et al. 2012b). In this setting, real-valued data points are encoded into binary codes that are significantly more efficient in storage and computation.

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. *Corresponding Author.

In general, most hashing methods learn a set of hash functions h : R^d → {0, 1}^r, which maps the d-dimensional data space into an r-bit discrete Hamming space, such that nearest neighbors can be approximated using the learned compact binary codes. Recent advances in binary code learning can be categorized as either data-independent or data-dependent (Wang et al. 2016). The former typically refers to random projection/partition of the feature space, such as Locality Sensitive Hashing (LSH) and Min-Hash (MinHash); it typically requires long codes or multiple hash tables to achieve satisfactory retrieval performance. Both supervised and unsupervised hashing belong to data-dependent hashing. Unsupervised hashing learns hash functions by preserving the data structure, distribution, or topological information, e.g., Spectral Hashing (SH) (Weiss, Torralba, and Fergus 2008), Anchor Graph Hashing (AGH) (Liu et al. 2011), Isotropic Hashing (IsoHash) (Kong and Li 2012), Iterative Quantization (ITQ) (Gong et al. 2013), Discrete Graph Hashing (DGH) (Liu et al. 2014), Spherical Hashing (SpH) (Heo et al. 2015), Scalable Graph Hashing (SGH) (Jiang and Li 2015), and Ordinal Embedding Hashing (OEH) (Liu et al. 2016). In contrast, supervised hashing aims to learn more accurate hash functions using label information. Representative works include, but are not limited to, Binary Reconstructive Embedding (BRE) (Kulis and Darrell 2009), Minimal Loss Hashing (MLH) (Norouzi and Fleet 2011), Kernel-based Supervised Hashing (KSH) (Liu et al.
2012a), Semi-Supervised Hashing (SSH) (Wang, Kumar, and Chang 2012), and Supervised Discrete Hashing (SDH) (Shen et al. 2015). Although these methods have shown promising performance, we argue that the relative order among data should be preserved in the Hamming space rather than only pairwise relations. Consequently, many ranking-based hashing algorithms have been proposed to learn more discriminative hash codes, e.g., Hamming Distance Metric Learning (HDML) (Norouzi, Fleet, and Salakhutdinov 2012), Ranking-based Supervised Hashing (RSH) (Wang et al. 2013a), Structure-based Hashing (StructHash) (Lin, Shen, and Wu 2014), and Top-Rank Supervised Binary Coding (Top-RSBC) (Song et al. 2015). However, most of these methods adopt stochastic gradient descent (SGD) optimization under triplet ordinal constraints, which needs massive iterations. On the other

arXiv:1611.06362v1 [cs.CV] 19 Nov 2016
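Such triplet-constrained SGD methods share a common template, which can be sketched as follows. This is a generic illustration of the scheme, not the OCH algorithm itself: the discrete codes are relaxed to b(x) = tanh(Wx), and for a triplet (q, p, n) a hinge loss pushes the query q closer to the positive p than to the negative n in the relaxed Hamming space. All dimensions, data points, and step sizes are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, margin, lr = 4, 8, 1.0, 0.1  # illustrative sizes and hyperparameters

q = np.array([1.0, 0.0, 0.0, 0.0])  # query
p = np.array([0.9, 0.1, 0.0, 0.0])  # positive: should rank closer to q than n
n = np.array([0.0, 0.0, 1.0, 1.0])  # negative

# Small initialization: relaxed codes start near zero, so the loss starts near the margin.
W = 0.01 * rng.standard_normal((r, d))

def relaxed_codes(W):
    # Relax the discrete sign(.) codes to tanh(Wx) so the loss is differentiable.
    return tuple(np.tanh(W @ x) for x in (q, p, n))

def triplet_loss(W):
    bq, bp, bn = relaxed_codes(W)
    return max(0.0, margin + np.sum((bq - bp) ** 2) - np.sum((bq - bn) ** 2))

def triplet_grad(W):
    bq, bp, bn = relaxed_codes(W)
    if triplet_loss(W) == 0.0:
        return np.zeros_like(W)  # hinge inactive: no gradient
    # Chain rule through tanh: multiply by (1 - b^2), then outer-product with the input.
    gq = 2 * ((bq - bp) - (bq - bn)) * (1 - bq ** 2)
    gp = -2 * (bq - bp) * (1 - bp ** 2)
    gn = 2 * (bq - bn) * (1 - bn ** 2)
    return np.outer(gq, q) + np.outer(gp, p) + np.outer(gn, n)

loss_before = triplet_loss(W)
for _ in range(100):  # the "massive iterations" of SGD, in miniature
    W -= lr * triplet_grad(W)
loss_after = triplet_loss(W)
```

Each triplet contributes one such gradient step, which is why the tuple count (quadratic or cubic in the training set size) dominates the training cost.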
Similar resources
Comparing apples to apples in the evaluation of binary coding methods
We discuss methodological issues related to the evaluation of unsupervised binary code construction methods for nearest neighbor search. These issues have been widely ignored in the literature. These coding methods attempt to preserve either Euclidean distance or angular (cosine) distance in the binary embedding space. We explain why, when comparing a method whose goal is preserving cosine similarit...
Factorized Binary Codes for Large-Scale Nearest Neighbor Search
Nearest neighbor search is a ubiquitous problem in computer vision. Given a previously unseen query point q ∈ R^d, we seek its closest matches in a database X ∈ R^{n×d}. One class of techniques for nearest neighbor search is hashing algorithms for constructing compact binary codes. Hashing algorithms transform the original data points into compact bit string signatures that require significantly ...
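A minimal sketch of this pipeline, using assumed random-hyperplane hash functions and made-up data: each database point is reduced to an r-bit integer signature, and search is a linear scan ranked by Hamming distance (XOR plus popcount).

```python
import numpy as np

rng = np.random.default_rng(1)
n_db, d, r = 100, 16, 32            # illustrative database size, dimension, code length
X = rng.standard_normal((n_db, d))  # database X in R^{n x d}
W = rng.standard_normal((r, d))     # r random hyperplanes (assumed hash functions)

def to_signature(v):
    # One bit per hyperplane, packed into a single r-bit integer.
    sig = 0
    for bit in (W @ v >= 0):
        sig = (sig << 1) | int(bit)
    return sig

signatures = [to_signature(x) for x in X]

def hamming(a, b):
    # Hamming distance via XOR + popcount.
    return bin(a ^ b).count("1")

def nearest(sig):
    # Linear scan over the compact signatures instead of the raw vectors.
    return min(range(n_db), key=lambda i: hamming(signatures[i], sig))

query = X[7] + 0.01 * rng.standard_normal(d)  # slightly perturbed copy of point 7
q_sig = to_signature(query)
idx = nearest(q_sig)
```

Storing 32-bit signatures instead of 16 floats per point is what makes the approach attractive at scale; the scan touches only integers.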
Convolutional Neural Networks for Text Hashing
Hashing, as a popular form of approximate nearest neighbor search, has been widely used for large-scale similarity search. Recently, a spectrum of machine learning methods have been utilized to learn similarity-preserving binary codes. However, most of them directly encode explicit features (keywords), which fail to preserve accurate semantic similarities in the binary codes beyond keyword matching, especi...
Discrete Graph Hashing
Hashing has emerged as a popular technique for fast nearest neighbor search in gigantic databases. In particular, learning-based hashing has received considerable attention due to its appealing storage and search efficiency. However, the performance of most unsupervised learning-based hashing methods deteriorates rapidly as the hash code length increases. We argue that the degraded performance ...
Fast nearest neighbor search of entropy-constrained vector quantization
Entropy-constrained vector quantization (ECVQ) offers substantially improved image quality over vector quantization (VQ) at the cost of additional encoding complexity. We extend results in the literature for fast nearest neighbor search of VQ to ECVQ. We use a new, easily computed distance that successfully eliminates most codewords from consideration.
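The standard ECVQ encoding rule augments the squared-error distortion with a rate penalty, choosing the codeword c_i that minimizes ||x − c_i||² + λ·l_i, where l_i is the codeword's length in bits. A small sketch with made-up codebook, lengths, and λ (the paper's fast-elimination distance is not reproduced here):

```python
# ECVQ encoding sketch; codebook, lengths, and lam are illustrative assumptions.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]  # 2-D codewords
lengths = [1.0, 2.0, 5.0]                         # codeword lengths in bits
lam = 0.5                                         # rate-distortion trade-off

def ecvq_cost(x, c, length):
    # Squared-error distortion plus the entropy (rate) penalty lambda * length.
    return sum((a - b) ** 2 for a, b in zip(x, c)) + lam * length

def encode(x):
    # Exhaustive search over the modified distance; the paper's contribution is
    # eliminating most codewords from this scan.
    return min(range(len(codebook)),
               key=lambda i: ecvq_cost(x, codebook[i], lengths[i]))

# Plain VQ would map (0.6, 0.6) to codeword 1, its Euclidean nearest neighbor,
# but the rate penalty makes the shorter codeword 0 the cheaper ECVQ choice.
```

This is why ECVQ encoding is costlier than plain VQ: the winning codeword is no longer simply the geometrically nearest one, so naive fast-search structures for Euclidean distance do not apply directly.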
Publication date: 2017