Evaluation of the Current DOE Document Conversion System: A Study of Retrievability
Abstract
(UNLV) has been tasked with evaluating the performance of the current DOE document conversion system and suggesting improvements to it. This report summarizes the recommendations made by ISRI staff and the results of two types of performance tests.

There are two approaches to evaluating the performance of a document conversion system. One approach is to measure the accuracy of the system's textual output (i.e., average character accuracy). A second approach is to measure the performance of the system that will make use of the output text. In this case, the text will be used to build the index for an Information Retrieval (IR) system that helps users find documents of interest, and the appropriate performance measure for an IR system is retrievability (i.e., precision and recall). Thus, to provide a thorough evaluation of system performance, two studies were conducted: a character accuracy study and a retrievability study [1, 2]. Section 3.1 below summarizes the results of the accuracy tests, and Section 3.2 summarizes the results of the retrievability tests.

The task of preparing documents for the LSN has two major components: character recognition and page zoning. Loading the resulting text into an information retrieval system is, by comparison, straightforward and not error-prone. Thus, in any document conversion system, character recognition and page zoning are the operations that control performance. Although character recognition is typically measured by standard character accuracy, many characters in a document's text play no role in its retrievability. For example, punctuation marks, end-of-line hyphenation, and the characters of stopwords are ignored by an IR system, and the ten most common stopwords alone account for about 20 to 30 percent of all words in a typical collection. Thus, while character accuracy is related to retrievability, it is not a good measure of it.
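The claim that character accuracy and retrievability diverge can be made concrete with a small sketch. The Python below assumes one common definition of character accuracy, (n - errors) / n, where n is the length of the ground-truth text and errors is the Levenshtein edit distance to the OCR output. Indexing is approximated by a hypothetical `index_terms` helper with a deliberately tiny, hypothetical stopword list; neither is taken from the DOE system or the ISRI tests. In the sample, every OCR error falls in a stopword, a punctuation mark, or an end-of-line hyphen, so character accuracy drops below 1.0 while every content term of the ground truth still reaches the index.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# production IR systems use much longer lists.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(truth: str, ocr: str) -> float:
    """(n - errors) / n, with n = len(truth) and errors = edit distance.
    Can go negative if the OCR output contains many spurious characters."""
    n = len(truth)
    if n == 0:
        raise ValueError("ground-truth text must be non-empty")
    return (n - edit_distance(truth, ocr)) / n


def index_terms(text: str) -> set[str]:
    """Rough approximation of the terms an IR indexer keeps from a text."""
    # Rejoin end-of-line hyphenation ("conver-\nsion" -> "conversion").
    # Simplification: this also joins genuine compounds split at a line end.
    text = re.sub(r"-\n\s*", "", text)
    # Lowercase and keep alphanumeric runs, which discards punctuation.
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stopwords.
    return {w for w in words if w not in STOPWORDS}


def term_recall(truth: str, ocr: str) -> float:
    """Fraction of the ground-truth index terms that also appear in the OCR text."""
    wanted = index_terms(truth)
    if not wanted:
        return 1.0
    return len(wanted & index_terms(ocr)) / len(wanted)


if __name__ == "__main__":
    truth = "The conversion of the documents is checked, page by page."
    ocr = "Tbe conver-\nsion of tbe documents ls checked page by page"
    print(f"character accuracy: {character_accuracy(truth, ocr):.3f}")
    print(f"index-term recall:  {term_recall(truth, ocr):.3f}")
```

Misrecognized stopwords such as `tbe` and `ls` still produce junk index terms in this sketch. They cost nothing for queries on content words, but they do enlarge the index, which `term_recall` deliberately does not measure.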
For these reasons, and because retrievability is more important to users of the LSN, the retrievability testing described in Section 3.2 was recommended by ISRI staff.
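Retrievability is measured by precision (the fraction of retrieved documents that are relevant) and recall (the fraction of relevant documents that are retrieved). The following is a minimal sketch, assuming a Boolean AND search over a simple inverted index; `build_index`, `search`, and `precision_recall` are hypothetical names, not part of the DOE system. Treating the documents returned by the ground-truth index as the relevant set is a simplification made only for this example, so that any loss in recall is attributable to OCR errors.

```python
import re
from collections import defaultdict

# Hypothetical tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Rejoin end-of-line hyphenation, lowercase, strip punctuation, drop stopwords."""
    text = re.sub(r"-\n\s*", "", text)
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> set of ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND query: ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / |retrieved|, recall = hits / |relevant|.
    Convention used here: an empty denominator yields 0.0."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical three-document collection and its simulated OCR output.
    truth_docs = {
        "d1": "Spent fuel storage casks",
        "d2": "Fuel cladding corrosion tests",
        "d3": "Storage pool water chemistry",
    }
    ocr_docs = dict(truth_docs, d2="Fue1 cladding corrosion tests")  # "l" read as "1"

    # Assumption for this sketch: the relevant set is whatever the
    # ground-truth index returns for the query.
    relevant = search(build_index(truth_docs), "fuel")
    retrieved = search(build_index(ocr_docs), "fuel")
    print(precision_recall(retrieved, relevant))  # prints (1.0, 0.5)
```

A single misread character in a content word ("Fue1") is enough to drop a relevant document from the results, while the same error rate spread over punctuation or stopwords would leave both precision and recall unchanged.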
Publication date: 2002