Towards an Enhanced Vector Model to Encode Textual Relations: Experiments Retrieving Information

Authors

  • Maya Carrillo
  • Aurelio López-López
Abstract

The constant growth of digital information, facilitated by storage technologies, poses new challenges for information processing tasks and sustains the need for effective search mechanisms that improve precision while still producing useful information in a short time. Hence, this paper presents a document representation to encode textual relations. This representation does not consider each term as one entry in a vector but rather as a pattern, i.e. a set of contiguous entries. To deal with variations inherent in natural language, we plan to express textual relations (such as noun phrases, named entities, subject-verb, verb-object, adjective-noun, and adverb-verb) as composed patterns. An operator is applied to form bindings between terms, encoding relations as new “terms” and thereby providing additional descriptive elements for indexing a document collection. The results of our first experiments, which used the document representation for information retrieval and incorporated two-word noun phrases, showed that the representation is feasible, retrieves relevant documents, and improves their ranking and, consequently, mean average precision.
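The idea of binding two term patterns into a new "term" can be illustrated with a minimal sketch. The abstract does not name the binding operator, so the sketch assumes circular convolution, as used in Holographic Reduced Representations. The names `Lexicon`, `bind`, `encode`, and `cosine`, the width `DIM`, the Gaussian random patterns, and the choice to spread each pattern over the full vector are all hypothetical choices made for illustration, not the authors' method.

```python
import math
import random

DIM = 512  # hypothetical pattern width; the abstract gives no value


class Lexicon:
    """Assigns each term a fixed random pattern, created on first use."""

    def __init__(self, dim=DIM, seed=0):
        self.dim = dim
        self._rng = random.Random(seed)
        self._patterns = {}

    def pattern(self, term):
        if term not in self._patterns:
            sd = 1.0 / math.sqrt(self.dim)  # keeps expected vector length near 1
            self._patterns[term] = [self._rng.gauss(0.0, sd) for _ in range(self.dim)]
        return self._patterns[term]


def bind(a, b):
    """Circular convolution (assumed operator); output has the input width."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]


def encode(lexicon, terms, relations=()):
    """Sum the term patterns plus one bound pattern per (left, right) relation."""
    parts = [lexicon.pattern(t) for t in terms]
    parts += [bind(lexicon.pattern(x), lexicon.pattern(y)) for x, y in relations]
    return [sum(column) for column in zip(*parts)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


lexicon = Lexicon()
words = ["information", "retrieval", "system"]
doc_with_phrase = encode(lexicon, words, [("information", "retrieval")])
doc_other_phrase = encode(lexicon, words, [("retrieval", "system")])
query = encode(lexicon, ["information", "retrieval"], [("information", "retrieval")])

print(cosine(query, doc_with_phrase), cosine(query, doc_other_phrase))
```

Both toy documents contain the same three words, so a plain bag-of-words vector could not tell them apart. In this sketch the bound noun phrase acts as an extra descriptive element, and the document sharing the query's phrase should score higher.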

Similar resources

Knowledge Graph Representation with Jointly Structural and Textual Encoding

The objective of knowledge graph embedding is to encode both entities and relations of knowledge graphs into continuous low-dimensional vector spaces. Previously, most works focused on symbolic representation of knowledge graphs with structure information, which cannot handle new entities or entities with few facts well. In this paper, we propose a novel deep architecture to utilize both struct...

Combining Text Vector Representations for Information Retrieval

This paper suggests a novel representation for documents that is intended to improve precision. This representation is generated by combining two central techniques: Random Indexing and Holographic Reduced Representations (HRRs). Random Indexing uses co-occurrence information among words to generate semantic context vectors that are the sum of randomly generated term identity vectors. HRRs are...

An Effective Path-aware Approach for Keyword Search over Data Graphs

Keyword search is known as a user-friendly alternative to structured query languages for retrieving information from graph-structured data. Efficiently retrieving relevant answers to a keyword query and effectively ranking these answers according to their relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...

Implicit Discourse Relation Recognition with Context-aware Character-enhanced Embeddings

For the task of implicit discourse relation recognition, traditional models that rely on manual features can suffer from the data sparsity problem. Neural models provide a solution with distributed representations, which can encode latent semantic information and are suitable for recognizing semantic relations between argument pairs. However, conventional vector representations usually adopt em...

Towards Retrieving and Ranking Clinical Recommendations with Cross-Lingual Random Indexing

Clinicians have to deal with large amounts of textual data, searching for and navigating information that satisfies their informational needs. Clinical notes and Clinical Practice Guidelines (CPG) are textual resources that usually contain free text. As language use in the medical domain is rather specialized, generic information retrieval tools are suboptimal for such data. This calls for spec...

Publication date: 2008