Laplacian Co-hashing of Terms and Documents

Authors

  • Dell Zhang
  • Jun Wang
  • Deng Cai
  • Jinsong Lu
Abstract

A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes, that is, codes within a short Hamming distance of each other. In this paper, we introduce the novel problem of co-hashing, in which both documents and terms are hashed simultaneously according to their semantic similarities. Furthermore, we propose a novel algorithm, Laplacian Co-Hashing (LCH), which solves this problem by directly optimising the Hamming distance.
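To make the idea of co-hashing concrete, the following is a minimal sketch of how terms and documents might be mapped into one shared Hamming space. The abstract gives no algorithmic detail, so everything here is an assumption: the sketch uses a generic spectral relaxation on the bipartite term-document graph (degree-normalised SVD followed by median thresholding), and the names `co_hash` and `hamming` are hypothetical. It should not be read as the authors' LCH implementation.

```python
import numpy as np


def co_hash(A, n_bits):
    """Hash terms (rows) and documents (columns) of A into one Hamming space.

    Assumption: a spectral relaxation on the bipartite term-document graph,
    not necessarily the procedure used in the paper.
    """
    A = np.asarray(A, dtype=float)
    term_deg = A.sum(axis=1)  # total weight of each term
    doc_deg = A.sum(axis=0)   # total weight of each document
    if np.any(term_deg <= 0) or np.any(doc_deg <= 0):
        raise ValueError("every term and document needs a positive total weight")
    if not 1 <= n_bits < min(A.shape):
        raise ValueError("n_bits must be in [1, min(A.shape) - 1]")

    # Degree-normalised term-document matrix.
    A_norm = A / np.sqrt(term_deg)[:, None] / np.sqrt(doc_deg)[None, :]
    U, _, Vt = np.linalg.svd(A_norm, full_matrices=False)

    # Skip the first (trivial) singular pair and keep the next n_bits pairs.
    term_emb = U[:, 1:n_bits + 1] / np.sqrt(term_deg)[:, None]
    doc_emb = Vt[1:n_bits + 1, :].T / np.sqrt(doc_deg)[:, None]

    # Threshold each dimension at the joint median so that terms and
    # documents share the same bit boundaries.
    threshold = np.median(np.vstack([term_emb, doc_emb]), axis=0)
    term_codes = (term_emb > threshold).astype(np.uint8)
    doc_codes = (doc_emb > threshold).astype(np.uint8)
    return term_codes, doc_codes


def hamming(a, b):
    """Number of bit positions in which two binary codes differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))
```

Given a term-by-document weight matrix `A`, calling `term_codes, doc_codes = co_hash(A, n_bits)` returns one binary code per term and one per document. Because both sets of codes live in the same space, `hamming(term_codes[i], doc_codes[j])` can be used to compare a term with a document as well as two documents with each other.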


Similar resources

Practical Applications of Locality Sensitive Hashing for Unstructured Data

Working with large amounts of unstructured data (e.g., text documents) has become important for many business, engineering and scientific applications. The purpose of this article is to demonstrate how the practical Data Scientist can implement a Locality Sensitive Hashing system from start to finish in order to drastically reduce the time required to perform a similarity search in high dimensi...


More inequalities for Laplacian indices by way of majorization

The n-tuple of Laplacian characteristic values of a graph is majorized by the conjugate sequence of its degrees. Using that result we find a collection of general inequalities for a number of Laplacian indices expressed in terms of the conjugate degrees, and then with a maximality argument, we find tight general bounds expressed in terms of the size of the vertex set n and the average degree dG...


Co-authorship network analysis and social network indicators of coronavirus research

Background and aim: The aim of this study was to examine the status of documents related to coronavirus based on scientometric indicators and to draw a co-authorship map of the authors, organizations and countries producing articles, in order to understand this field as fully as possible. Materials and methods: This applied scientometric study was conducted using social network analysis. The statistical populati...


Image authentication using LBP-based perceptual image hashing

Feature extraction is a main step in all perceptual image hashing schemes, in which robust features lead to better results in perceptual robustness. Simplicity, discriminative power, computational efficiency and robustness to illumination changes are counted among the distinguishing properties of Local Binary Pattern features. In this paper, we investigate the use of local binary patterns for percep...


Drawing Co-Citation Networks of Corona Virus Studies

Background and Aim: The purpose of the present study is to map the coronavirus domain citation network to better understand this domain based on all other citation networks.  Materials and Methods: The present study is applied in terms of purpose, and is descriptive scientometrics in terms of type, which has been done with the all-citation method. In this study, all scientific publications on ...




Publication date: 2010