Concept Based Representations as Complement of Bag of Words in Information Retrieval

Authors

  • Maya Carrillo
  • Aurelio López-López
Abstract

Information retrieval models that represent texts not merely as collections of the words they contain, but as collections of the concepts they contain, through synonym sets or latent dimensions, are known as Bag-of-Concepts (BoC) representations. In this paper we use random indexing, which draws on co-occurrence information among words to generate semantic context vectors, and then represent the documents and queries as BoC. In addition, we use a novel representation, Holographic Reduced Representation, previously proposed in cognitive models, which can encode relations between words. We show that these representations can be used successfully in information retrieval, that they can associate terms, and that, when combined with the traditional vector space model, they improve effectiveness in terms of mean average precision.
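The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality, the number of nonzero entries, the use of the whole document as the co-occurrence context, the unweighted sum used for the BoC vector, and all function names are assumptions chosen for illustration. Random indexing gives each word a sparse random index vector and builds a word's context vector by summing the index vectors of the words it co-occurs with. Holographic Reduced Representations commonly bind two vectors with circular convolution, computed here through the FFT.

```python
import numpy as np

def index_vectors(vocab, dim=512, nonzero=8, seed=0):
    """Assign each word a sparse ternary index vector (assumed sizes)."""
    rng = np.random.default_rng(seed)
    vectors = {}
    for word in vocab:
        v = np.zeros(dim)
        positions = rng.choice(dim, size=nonzero, replace=False)
        v[positions] = rng.choice([-1.0, 1.0], size=nonzero)
        vectors[word] = v
    return vectors

def context_vectors(docs, index, dim=512):
    """Context vector of a word = sum of index vectors of the other
    words in the same document (assumption: document-wide window)."""
    ctx = {w: np.zeros(dim) for w in index}
    for tokens in docs:
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    ctx[word] += index[other]
    return ctx

def bag_of_concepts(tokens, ctx, dim=512):
    """Represent a document or query as the sum of its words' context vectors."""
    vec = np.zeros(dim)
    for word in tokens:
        if word in ctx:
            vec += ctx[word]
    return vec

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def bind(a, b):
    """HRR-style binding of two vectors by circular convolution (via FFT)."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))
```

In this sketch, two words that never appear together but share the same neighbours (for example "car" and "automobile", both seen with "engine" and "road") end up with similar context vectors, which is the kind of term association the abstract refers to. A query and a document can then be compared with `cosine(bag_of_concepts(query, ctx), bag_of_concepts(doc, ctx))`.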


Similar resources

Semantic Contexts and Fisher Vectors for the ImageCLEF 2011 Photo Annotation Task

This paper describes the participation of UNICAEN/GREYC to the ImageCLEF 2011 photo annotation task. The proposed approach uses visual image features and binary annotations of concepts only. In this approach, the annotations are predicted by SVM classifiers trained separately for each concept. The classifiers take Bag-of-Words histograms and fisher vectors representations as inputs, both being ...


Phrase-Based Document Categorization

(Chapter in Springer book "Current Challenges in Patent Information Retrieval", to appear in May 2011) This paper takes a fresh look at an old idea in Information Retrieval: the use of linguistically extracted phrases as terms in the automatic categorization of documents, and in particular the pre-classification of patent applications. In Information Retrieval, until now there was found little ...


Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm

Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...


A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features

Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we proposed an image retrieval method using global and local features. Firstly, for local features extraction, SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest points and then to use Bag of Visual Words model, a cluster...





Publication date: 2010