Information Theoretic Retrieval with Structured Queries and Documents
Authors
Abstract
Information retrieval through statistical language modeling has become popular thanks to its firm theoretical background and good retrieval performance. One goal of current research on structured information retrieval is thus to extend such models to take advantage of structural information. Because structure may be present in documents, in queries, or in both, we are interested in supporting not only unstructured queries on structured documents, but also structured queries on unstructured documents and structured queries on structured documents. Most research has considered the first task, i.e., unstructured queries over structured documents, while some papers have addressed structured or semistructured queries on unstructured documents. Here we take a unified approach. Our basic retrieval model is the well-known Kullback-Leibler divergence with backoff smoothing. In this paper we show how it can be extended to model and support structured/unstructured queries on structured/unstructured documents. We make a very general assumption about the type of structure imposed on queries and/or documents, suitable for describing various kinds of structured data. We also study how the extended model can be computed efficiently. Finally, we report on our experiments at INEX 2006, in which we used a rough approximation of the presented model. A full implementation of the model and a more significant evaluation of its retrieval effectiveness are left for future work.
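As a rough illustration of the baseline the abstract names (Kullback-Leibler divergence ranking with backoff smoothing on plain, unstructured text), the following minimal sketch ranks documents by negative KL divergence between a query model and a smoothed document model. It does not reproduce the paper's extension to structured queries or documents. The abstract does not say which backoff scheme is used, so absolute discounting with a closed vocabulary is an assumption here, and the function names `collection_model`, `backoff_lm`, and `neg_kl` are hypothetical.

```python
import math
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood unigram model over all documents (closed vocabulary)."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def backoff_lm(doc_tokens, collection_prob, delta=0.5):
    """Return p(w|d) using absolute-discount backoff to the collection model.

    Assumption: the paper's exact backoff scheme is not given in the abstract;
    absolute discounting is used here only as one common instance.
    """
    counts = Counter(doc_tokens)
    length = len(doc_tokens)
    # Probability mass freed by subtracting delta from every seen term count.
    alpha = delta * len(counts) / length
    # Collection mass of the unseen terms, used to renormalize the backoff part.
    unseen_mass = sum(p for w, p in collection_prob.items() if w not in counts)

    def prob(word):
        if word in counts:
            return (counts[word] - delta) / length
        return alpha * collection_prob[word] / unseen_mass

    return prob


def neg_kl(query_tokens, doc_prob):
    """Score a document as -KL(query model || document model); higher is better."""
    q_counts = Counter(query_tokens)
    n = len(query_tokens)
    return -sum(
        (c / n) * math.log((c / n) / doc_prob(w)) for w, c in q_counts.items()
    )


docs = [
    ["structured", "retrieval", "structured", "query"],
    ["language", "model", "smoothing", "model"],
]
coll = collection_model(docs)
query = ["structured", "query"]
scores = [neg_kl(query, backoff_lm(d, coll)) for d in docs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

With this toy collection, the first document contains both query terms and therefore receives the higher score; the second is scored only through the backed-off collection probabilities.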
Similar resources
Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval
Background and Aim: This research investigates the impact of authors’ rank in bibliographic networks on the document-centered model of expertise retrieval. Its purpose is to find out what kind of author ranking in bibliographic networks can improve the performance of the document-centered model. Methodology: The current research is experimental. To operationalize the research goals, a new test colle...
Comparing XML-IR Query Formation Interfaces
XML information retrieval (XML-IR) systems differ from traditional information retrieval systems by using the structure of XML documents to retrieve more specific units of information than the documents themselves. Users interact with XML-IR systems via structured queries that express their content and structural requirements. Historically, it has been a common belief within the XML-IR community that...
Efficient preprocessing of XML queries using structured signatures
The paper proposes a preprocessing scheme for efficient processing of XML queries in XML-based information retrieval systems. For the preprocessing, we use a signature-based approach. In the conventional (flat document-based) information retrieval systems, user queries consist of keywords and boolean operators, and thus signatures are structured in a flat manner. However, in XML-based informati...
SIREn: Entity Retrieval System for the Web of Data
We present ongoing work on the Semantic Information Retrieval Engine (SIREn), an “entity retrieval system” specifically designed to meet the requirements of indexing and searching a large amount of semi-structured data, e.g. the entire Web of Data. SIREn supports efficient full text search with semi-structural queries and exhibits a concise index, constant time updates and inherits Information ...
XML Fragments Extended with Database Operators
XML documents represent a middle range between unstructured data such as textual documents and fully structured data encoded in databases. Typically, information retrieval techniques are used to support search on the “unstructured” end of this scale, while database techniques are used for the structured part. To date, most of the works on XML query and search have stemmed from the structured si...
Publication date: 2006