Query Processing for Retrieval from Large Text Bases

Authors

  • John Broglio
  • W. Bruce Croft

Abstract

Natural language experiments in information retrieval have often been inconclusive due to the lack of large text bases with associated queries and relevance judgments. This paper describes experiments in incremental query processing and indexing with the INQUERY information retrieval system on the TIPSTER queries and document collection. The results measure the value of processing tailored for different query styles, the use of syntactic tags to produce search phrases, the recognition and application of generic concepts, and automatic concept extraction based on interword associations in a large text base.

1. INTRODUCTION: TIPSTER AND INQUERY

Previous research has suggested that retrieval effectiveness might be enhanced by the use of multiple representations and by automated language processing techniques. These techniques include automatic or interactive introduction of synonyms [Har88], forms-based interfaces [CD90], automatic recognition of phrases [CTL91], and relevance feedback [SB90]. The recent development of the TIPSTER corpus, with its associated queries and relevance judgments, has provided new opportunities for judging the effectiveness of these techniques on large, heterogeneous document collections.

1.1. TIPSTER Text Base and Query Topics

The TIPSTER documents comprise two volumes of text, each of approximately one gigabyte, drawn from sources such as newspaper and magazine articles and government publications (the Federal Register). Accompanying the collections are two sets of fifty topics. Each topic is a full-text description, in a specific format, of an information need (Figure 1). Each TIPSTER topic offers several representations of the same information need. The Topic and Description fields are similar to what might be entered as a query in a traditional information retrieval system.
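
The abstract lists the use of syntactic tags to produce search phrases among the techniques evaluated, but this excerpt does not say how INQUERY forms its phrases. As a minimal sketch only, assuming part-of-speech tags in the Penn Treebank style (JJ for adjectives, NN, NNS, NNP, etc. for nouns) have already been assigned by some tagger, candidate phrases can be taken as maximal runs of adjectives and nouns that end in a noun. The helper names `parse_tagged` and `extract_phrases`, and the hand-assigned tags on the example topic's Description field, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sketch, not INQUERY's procedure: candidate search phrases are
# maximal runs of adjectives/nouns that end in a noun. Tags are assumed to be
# Penn Treebank style (JJ* = adjective, NN* = noun), supplied by some tagger.
Tagged = List[Tuple[str, str]]


def is_modifier(tag: str) -> bool:
    """Adjectives and nouns may appear inside a phrase."""
    return tag.startswith("JJ") or tag.startswith("NN")


def is_noun(tag: str) -> bool:
    return tag.startswith("NN")


def parse_tagged(text: str) -> Tagged:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs: Tagged = []
    for token in text.split():
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


def extract_phrases(tagged: Tagged, min_len: int = 2) -> List[str]:
    phrases: List[str] = []
    run: Tagged = []
    # A sentinel tag at the end flushes the last run.
    for word, tag in tagged + [("", "END")]:
        if is_modifier(tag):
            run.append((word, tag))
            continue
        # Drop trailing adjectives so every phrase ends in a noun.
        while run and not is_noun(run[-1][1]):
            run.pop()
        if len(run) >= min_len:
            phrases.append(" ".join(w.lower() for w, _ in run))
        run = []
    return phrases


# Hand-tagged Description field of the example topic (tags assigned for
# illustration; a real tagger might label "signing" or "making" differently).
TAGGED_DESCRIPTION = (
    "Document/NN will/MD cite/VB the/DT signing/NN of/IN a/DT contract/NN "
    "or/CC preliminary/JJ agreement/NN ,/, or/CC the/DT making/NN of/IN a/DT "
    "tentative/JJ reservation/NN ,/, to/TO launch/VB a/DT commercial/JJ "
    "satellite/NN ./."
)

if __name__ == "__main__":
    print(extract_phrases(parse_tagged(TAGGED_DESCRIPTION)))
    # → ['preliminary agreement', 'tentative reservation', 'commercial satellite']
```

Requiring the run to end in a noun keeps head-final noun groups such as "commercial satellite" and discards dangling modifiers; a real system would still have to decide how such phrases are weighted against the single-word terms they contain.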
The Narrative field expands on the information need, giving an overview of the classes of documents which would or …

Figure 1: Example TIPSTER topic.

  <top>
  <dom> Domain: International Economics
  Topic: Satellite Launch Contracts
  Description: Document will cite the signing of a contract or preliminary agreement, or the making of a tentative reservation, to launch a commercial satellite.
  <narr> Narrative: A relevant document will mention the signing of a contract or preliminary agreement, or the making of a tentative reservation, to launch a commercial satellite.
  Concept(s):
  1. contract, agreement
  2. launch vehicle, rocket, payload, satellite
  3. launch services, commercial space industry, commercial launch industry
  4. Arianespace, Martin Marietta, General Dynamics, McDonnell Douglas
  5. Titan, Delta II, Atlas, Ariane, Proton
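
Because each topic offers several representations of one information need, a first step in processing it is to separate those representations. A minimal sketch, assuming the field labels appear exactly as in Figure 1 (actual TIPSTER topic files may contain further fields or tags not shown in this excerpt); `parse_topic` and `parse_concepts` are hypothetical helper names, not part of INQUERY.

```python
import re
from typing import Dict, List

# Field labels as they appear in Figure 1. This is an assumption: real TIPSTER
# topic files may carry further fields or tags that this sketch ignores.
LABELS = ["Domain", "Topic", "Description", "Narrative", "Concept(s)"]
LABEL_RE = re.compile("(" + "|".join(re.escape(label) for label in LABELS) + "):")


def parse_topic(text: str) -> Dict[str, str]:
    """Map each field label to its whitespace-normalized text."""
    text = re.sub(r"<[^>]*>", " ", text)  # drop tag remnants such as <top>, <narr>
    matches = list(LABEL_RE.finditer(text))
    fields: Dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = " ".join(text[match.end():end].split())
    return fields


def parse_concepts(concepts: str) -> List[List[str]]:
    """Split '1. a, b 2. c, d' into [['a', 'b'], ['c', 'd']]."""
    groups = re.split(r"(?:^|\s)\d+\.\s", concepts)
    return [
        [term.strip() for term in group.split(",") if term.strip()]
        for group in groups
        if group.strip()
    ]


EXAMPLE_TOPIC = """<top>
<dom> Domain: International Economics
Topic: Satellite Launch Contracts
Description: Document will cite the signing of a contract or preliminary
agreement, or the making of a tentative reservation, to launch a commercial
satellite.
<narr> Narrative: A relevant document will mention the signing of a contract
or preliminary agreement, or the making of a tentative reservation, to launch
a commercial satellite.
Concept(s): 1. contract, agreement 2. launch vehicle, rocket, payload, satellite
3. launch services, commercial space industry, commercial launch industry
4. Arianespace, Martin Marietta, General Dynamics, McDonnell Douglas
5. Titan, Delta II, Atlas, Ariane, Proton
"""

if __name__ == "__main__":
    fields = parse_topic(EXAMPLE_TOPIC)
    print(fields["Topic"])  # Satellite Launch Contracts
    for i, group in enumerate(parse_concepts(fields["Concept(s)"]), start=1):
        print(i, group)
```

Keeping the fields apart makes it possible to treat each representation differently, for instance using the Topic and Description text as the query proper while drawing additional terms from the numbered Concept(s) groups.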

Similar resources

Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches guide researchers toward combined approaches and semi-automatic retrieval that uses user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...

Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field that has received great attention in recent decades. In this paper, we present an approach to image retrieval based on a combination of text-based and content-based features. Keywords are used as text-based features, and color and texture as content-based features. The query in this system contains some keywords and an input...

Position paper for W3C query language workshop

  • Text retrieval (e.g. PAT) identify documents in large document bases
  • Addressing documents (e.g. Xlink) have flexible link models
  • Document layout (e.g. XSL) transform documents for layout
  • Document transformation (e.g. tree regular grammars) transform documents from one DTD to the other
  • Databases (e.g. OQL) map documents into data structures to apply DB techniques
  • Knowledge bases (e.g...

A New Information Retrieval Method Suitable for Texts Produced by Speech Recognition

In this article, a pre-processing method is introduced that is applicable to the task of retrieving speech-recognized texts. The inputs are a text corpus generated by a speech recognition system and a query; the goal is to search for the query in these documents and find the relevant ones. A basic problem with typical speech-recognized text is a certain percentage of recognition errors. This results erroneously ass...

Hypermedia and Free Text Retrieval

This paper discusses aspects of multimedia document bases and how access to documents held on a computer-based system can be achieved; in particular the current access methods of hypermedia and free text information retrieval are discussed. Browsing-based hypermedia systems provide ease of use for novice users and equal access to any media; however, they typically perform poorly with very large...

Publication date: 1993