Analyzing Document Retrievability in Patent Retrieval Settings
Abstract
Most information retrieval settings, such as web search, are precision-oriented, i.e. they focus on retrieving a small number of highly relevant documents. However, in specific domains, such as patent retrieval or law, recall matters more than precision: in these cases the goal is to find all relevant documents, which requires algorithms to be tuned towards recall at the cost of precision. This raises important questions with respect to retrievability and search engine bias: depending on how the similarity between a query and documents is measured, certain documents may be more or less retrievable in certain systems, and some documents may not be retrievable at all within common threshold settings. Biases may be oriented towards the popularity of documents (increasing the weight of references) or towards the length of documents; they may favour the use of rare or common words, or rely on structural information such as metadata or headings. Existing accessibility measurement techniques are limited because they measure retrievability with respect to all possible queries. In this paper, we improve accessibility measurement by considering sets of relevant and irrelevant queries for each document. This simulates how recall-oriented users create their queries when searching for relevant information. We evaluate retrievability scores using a corpus of patents from the US Patent and Trademark Office.
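To make the measurement idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes the common cumulative definition of retrievability, where a document's score is the number of queries that return it within a rank cutoff, and it optionally restricts counting to a per-document set of relevant queries. The names `retrievability`, `relevant_queries`, and `gini` are hypothetical, and using the Gini coefficient to summarise bias is an assumption not stated in the abstract.

```python
def retrievability(rankings, doc_ids, cutoff, relevant_queries=None):
    """Count, per document, the queries that return it within the top `cutoff` ranks.

    rankings: dict mapping query -> ranked list of document ids (best first).
    relevant_queries: optional dict mapping document id -> set of queries that
        count for that document (hypothetical stand-in for per-document
        relevant query sets); if None, all queries count.
    """
    scores = {doc_id: 0 for doc_id in doc_ids}
    for query, ranked_docs in rankings.items():
        for doc_id in ranked_docs[:cutoff]:
            if doc_id not in scores:
                continue
            if relevant_queries is not None and query not in relevant_queries.get(doc_id, set()):
                continue
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of the scores: 0 means all documents are equally retrievable."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data, invented purely for illustration.
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d2"],
    "q3": ["d2", "d1", "d3"],
}
all_query_scores = retrievability(rankings, ["d1", "d2", "d3"], cutoff=1)
bias = gini(all_query_scores.values())
```

With a small cutoff, documents that never reach the top ranks receive a score of zero, which is the "not retrievable at all within common threshold settings" case the abstract describes.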
Similar resources
Retrieval Models versus Retrievability
Retrievability is an important measure in information retrieval that can be used to analyze retrieval models and document collections. Rather than just focusing on a set of few documents that are given in the form of relevance judgments, retrievability examines what is retrieved, how frequently it is retrieved, and how much effort is needed to retrieve it. Such a measure is of particular intere...
On the relationship between query characteristics and IR functions retrieval bias
Bias quantification of retrieval functions with the help of document retrievability scores has recently evolved as an important evaluation measure for recall-oriented retrieval applications. While numerous studies have evaluated retrieval bias of retrieval functions, solid validation of its impact on realistic types of queries is still limited. This is due to the lack of well-accepted criteria f...
Evaluating bias in retrieval systems for recall oriented documents retrieval
The evaluation of a retrieval system has always been the focus of research. Most of the retrieval systems seem to be more efficient for precision oriented documents than recall oriented documents since there is a difference between both the recall and precision oriented documents. Therefore, a system that is efficient for the retrieval of precision oriented documents does not need to be good fo...
Improving Retrievability and Recall by Automatic Corpus Partitioning
With increasing volumes of data, much effort has been devoted to finding the most suitable answer to an information need. However, in many domains, the question whether any specific information item can be found at all via a reasonable set of queries is essential. This concept of Retrievability of information has evolved into an important evaluation measure of IR systems in recall-oriented appl...
Efficiently Estimating Retrievability Bias
Retrievability is the measure of how easily a document can be retrieved using a particular retrieval system. The extent to which a retrieval system favours certain documents over others (as expressed by their retrievability scores) determines the level of bias the system imposes on a collection. Recently it has been shown that it is possible to tune a retrieval system by minimising the retrieva...
Publication date: 2009