Information Theoretic Retrieval with Structured Queries and Documents

Authors

Claudio Carpineto

Giovanni Romano

Caterina Caracciolo

Abstract

Information retrieval through statistical language modeling has become popular thanks to its firm theoretical background and good retrieval performance. One goal of current research on structured information retrieval is thus to extend such models to take advantage of structural information. Since structure may be present in documents, in queries, or in both, we are interested in supporting not only unstructured queries on structured documents, but also structured queries on unstructured documents and structured queries on structured documents. Most prior work has considered the first task, i.e., unstructured queries over structured documents, while some papers have addressed the use of structured or semistructured queries on unstructured documents. Here we take a unified approach. Our basic retrieval model is the well-known Kullback-Leibler divergence with backoff smoothing. In this paper we show how it can be extended to model and support structured or unstructured queries on structured or unstructured documents. We make a very general assumption about the type of structure imposed on queries and/or documents, one suitable for describing various kinds of structured data. We also study how the extended model can be computed efficiently. Finally, we report on our experiments at INEX 2006, in which we used a rough approximation of the presented model. A full implementation of the model and a more significant evaluation of its retrieval effectiveness are left for future work.
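To make the baseline concrete, the following is a minimal sketch of ranking unstructured documents by Kullback-Leibler divergence between a query model and a smoothed document model. It is not the authors' implementation: the abstract does not say which backoff scheme is used, so absolute discounting with backoff to a collection model is assumed here, and the function name `kl_divergence`, the parameter `delta`, and the toy documents are all hypothetical. The structured extension described in the abstract is not covered.

```python
import math
from collections import Counter


def kl_divergence(query_tokens, doc_tokens, coll_counts, delta=0.5):
    """KL(query model || document model); lower means a better match.

    Assumptions (not from the paper): the query model is a maximum
    likelihood estimate, the document model uses absolute discounting
    with backoff to the collection model, documents are non-empty, and
    coll_counts is a Counter over the whole collection.
    """
    coll_len = sum(coll_counts.values())
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)

    # Probability mass removed from seen terms by the discount.
    reserved = delta * len(doc_counts) / doc_len
    # Collection mass of the terms that do occur in the document.
    seen_coll_mass = sum(coll_counts[w] / coll_len for w in doc_counts)
    # Backoff weight: spreads the reserved mass over unseen terms.
    alpha = reserved / (1.0 - seen_coll_mass) if seen_coll_mass < 1.0 else 0.0

    def p_doc(w):
        if w in doc_counts:
            return (doc_counts[w] - delta) / doc_len
        return alpha * coll_counts[w] / coll_len

    q_counts = Counter(query_tokens)
    q_len = len(query_tokens)
    divergence = 0.0
    for w, c in q_counts.items():
        if coll_counts[w] == 0:
            continue  # term unknown to the collection: skipped in this sketch
        p_q = c / q_len
        divergence += p_q * math.log(p_q / p_doc(w))
    return divergence


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "structured retrieval structured query retrieval model".split(),
    "d2": "cooking pasta recipe sauce pasta sauce".split(),
}
collection = Counter(t for tokens in docs.values() for t in tokens)
query = "structured retrieval".split()

# Rank documents by increasing divergence from the query model.
ranking = sorted(docs, key=lambda d: kl_divergence(query, docs[d], collection))
```

Documents are ranked by increasing divergence, so in this toy collection `d1`, which contains both query terms, is placed ahead of `d2`, which receives only backed-off collection probability for them.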


Related papers

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: This research investigates the impact of authors’ rank in bibliographic networks on the document-centered model of Expertise Retrieval. Its purpose is to find out what kind of author ranking in bibliographic networks can improve the performance of the document-centered model. Methodology: The research is experimental. To operationalize research goals, a new test colle...


Comparing XML-IR Query Formation Interfaces

XML information retrieval (XML-IR) systems differ from traditional information retrieval systems by using structure of XML documents to retrieve more specific units of information than the documents themselves. Users interact with XML-IR systems via structured queries that express their content and structural requirements. Historically, it has been common belief within the XML-IR community that...


Efficient preprocessing of XML queries using structured signatures

The paper proposes a preprocessing scheme for efficient processing of XML queries in XML-based information retrieval systems. For the preprocessing, we use a signature-based approach. In the conventional (flat document-based) information retrieval systems, user queries consist of keywords and boolean operators, and thus signatures are structured in a flat manner. However, in XML-based informati...


SIREn: Entity Retrieval System for the Web of Data

We present ongoing work on the Semantic Information Retrieval Engine (SIREn), an “entity retrieval system” specifically designed to meet the requirements of indexing and searching a large amount of semi-structured data, e.g. the entire Web of Data. SIREn supports efficient full text search with semi-structural queries and exhibits a concise index, constant time updates and inherits Information ...


XML Fragments Extended with Database Operators

XML documents represent a middle range between unstructured data such as textual documents and fully structured data encoded in databases. Typically, information retrieval techniques are used to support search on the “unstructured” end of this scale, while database techniques are used for the structured part. To date, most of the works on XML query and search have stemmed from the structured si...




Publication date: 2006
