Evaluation of controlled vocabularies by inter-indexer consistency
Abstract
Introduction. Several controlled vocabularies are used to index three journal articles in order to check whether a list of descriptors achieves consistency rates equal to or better than those of a standard thesaurus and an augmented thesaurus. Method. A terminology set for library and information science was used to build a list of descriptors with equivalence relations (USE and UF), a standard thesaurus and an augmented thesaurus (in which all the descriptors have scope notes). Subsequently, three articles were indexed by indexers with varying degrees of experience: on the one hand, library and information science students and, on the other, professionals from various documentation centres. Hooper's measure was applied to find the consistency between pairs of novice indexers and between pairs of expert indexers. Analysis. Data were tabulated and analysed systematically according to pairs of novice indexers and experts. Results. The tool with the best results is the list of descriptors (39.5% consistency), followed by the augmented thesaurus (29.8%) and, with an almost identical value, the standard thesaurus (27.5%). Conclusion. The list of descriptors returns better indexing consistency in both groups, but more research is required.

Introduction

Vocabulary control has proved to be an essential procedure in the organization and retrieval of information. The contributions in this field are many and varied; the main ones considered here are those identified by Gil-Leiva (2008: 118-154). The first contribution was the work of Charles Ammi Cutter in his famous Rules for a printed dictionary catalog, published in 1876.
It is here that the first rules still in effect today appear, such as the principle of economy; the definition and use of headings for subject, place and form; cross-references for synonyms and antonyms; the problem of homonymy; the structure of subject headings (simple and complex); word inversion; syntax (See, See also, etc.) and punctuation marks (commas, brackets, etc.).

The second contribution was the building of lists of subject headings. Shortly after Cutter's contributions, in 1895, the American Library Association (ALA) published the List of Subject Headings for Use in Dictionary Catalogs as an indexing tool for small and medium-sized libraries with non-specialized collections. The first Subject Headings Used in the Dictionary Catalogs of the Library of Congress appeared in 1909 and took the contributions mentioned above as its main references. Although it came into being for internal use by the cataloguers of the Library of Congress, it soon became a reference tool for indexing in large public and academic libraries, and it was translated or adapted, wholly or in part, for other countries and languages, for example Brazil (1948), Canada (1967), Greece (1978), South Africa (1992) and Egypt (1995).

The third contribution comes from Mooers, who at the beginning of the 1950s introduced the word descriptor to communicate ideas, so distancing himself from the particular terminological uses employed in documents and thus specifying the subject of the information in an information retrieval context. This was followed by the construction of the first lists of descriptors and the first thesauri, such as the Dupont Thesaurus (Engineering Information Centre, Dupont, 1959), the Thesaurus of ASTIA Descriptors (United States Department of Defense, 1960) and the Chemical Engineering Thesaurus (American Institute of Chemical Engineers, 1961).
The fourth contribution is the provision of national and international standards. Work in this sphere got under way early in France: in 1957 the AFNOR Z 44-070 Catalogue alphabétique de matières was presented, which established rules for the choice and presentation of subject headings. The first standards for thesauri were the French AFNOR Z 47-100-1973 (Norme expérimentale. Règles d'établissement des thésaurus monolingues), ISO 2788-1974 (Documentation. Guidelines for the establishment and development of monolingual thesauri) and ANSI Z39.19-1974 (American National Standard guidelines for thesaurus structure, construction and use). Since then, other countries and the ISO itself have been working on and extending the standards, up to the unification of ISO 2788-1986 and ISO 5964-1985 in the new ISO/DIS 25964-1:2010 (Information and documentation. Thesauri and interoperability with other vocabularies. Part 1: Thesauri for information retrieval; Part 2: Interoperability with other vocabularies).

The evaluation of controlled vocabularies is an issue of concern for professionals and researchers in the area. An evaluation may focus on the controlled vocabularies themselves, studying their structure, thematic fields or facets, scope notes, semantic relations, degree of specificity, etc. (intrinsic evaluation), or on their impact on the information systems that use them for indexing and retrieval (extrinsic evaluation). The first major evaluation was carried out by Cleverdon in the Cranfield Projects (1956; 1960, etc.). Cleverdon compared the efficiency of the Universal Decimal Classification, an alphabetical subject index, a faceted classification scheme and indexing by uniterms on eighteen thousand documents analysed by three indexers. There have been many and varied subsequent studies to evaluate controlled vocabularies, both subject headings and thesauri.
We have, for example, the works by Henzler (1978), Fidel (1991 and 1992), Betts and Marrable (1991), Ribeiro (1996), Gil Urdiciaín (1998) and Gross and Taylor (2005), who studied the advantages and drawbacks of indexing and retrieving documents in natural language and in controlled language.

Another way of evaluating controlled vocabularies, mainly thesauri, is to compare them with each other. Kishida et al. (1988) compared MeSH (Medical Subject Headings), the ERIC thesaurus, the INSPEC thesaurus and the Root thesaurus, among others, taking as their reference the construction principles, their structure and the information they contributed. In contrast, Weinberg and Cunningham (1985) studied the semantic proximity between MeSH and Medline, while Pozhariskii (1982) proposed quantifying the capacity or semantic strength of a thesaurus in terms of flexibility, economy and universality. Elsewhere, Larsen (1988) analysed the suitability of a thesaurus for indexing a certain collection of documents. Soler Monreal (2009) evaluated three controlled vocabularies (a list of descriptors, a standard thesaurus and an augmented thesaurus in which all the descriptors have scope notes) in order to find out whether a list of descriptors yields higher consistency scores than a standard thesaurus and an augmented thesaurus.

Indexing consistency can be studied with reference to a single indexer or to several. When a professional indexes the same document at different moments in time, we speak of intra-consistency or intra-indexer consistency. When several people index the same document and their results are compared, we speak of inter-consistency or inter-indexer consistency. Since the 1960s, numerous and diverse investigations have been carried out on indexing consistency. The main conclusion that can be drawn from the tests is that inconsistency is an inherent feature of indexing, rather than a sporadic anomaly.
Although the tests carried out are very diverse in their methodology, the indexing consistency achieved ranges from approximately 10% to 60%. The vast majority of the tests carried out from 1960 to the present cannot be homogenized because of their methodological diversity. We point out here only some of the variables that hinder homogenization, and only a sample of the tests carried out:

Measures: To find the consistency scores between groups of indexers, Slamecka and Jacoby (1965) and Iivonen (1990) proposed their respective measures. For pairs of indexers, other measures were outlined by Hooper (1965), Lancaster (1968), Rolling (1981) and Saarti (2002), although the most commonly used is that of Hooper, $c / (a + b - c)$, where $c$ is the number of terms common to the two indexers, $a$ is the number of terms proposed by indexer one, and $b$ is the number of terms proposed by indexer two.

Novice versus expert indexers: On some occasions expert indexers were employed, as in Lancaster (1968) and Leonard (1975); on others, novice indexers, as in Hudon (1998a and 1998b) and Gil-Leiva (2002); and on others, both experts and novices (Bertrand and Cellier 1995; Saarti 2002; Soler Monreal 2009).

Number of indexers: Lancaster (1968) worked with three indexers; Bertrand and Cellier (1995) with twenty-five; Hudon (1998a and 1998b) with twenty-five; Gil-Leiva (2002) with twenty-seven; Saarti (2002) with thirty; and Soler Monreal (2009) with sixty-three.

Material used: Sometimes work has been with journal articles (Lancaster 1968; Leonard 1977; Funk and Reid 1983; Middleton 1984; Sievert and Andrews 1991; Iivonen and Kivimäki 1998; Leininger 2000; Gil-Leiva 2002), sometimes with books (Tonta 1991; Bertrand and Cellier 1995; Gil-Leiva 2001; Saarti 2002; Neshat and Horri 2006; Gil-Leiva et al. 2008; Chen 2008) and at other times with visual material (Markey 1984; Gil-Leiva 2002). Summaries of journal articles have also been used (Hudon 1998; Soler Monreal 2009), as have social tagging on delicious.com (Kipp 2009) and social tagging on CiteULike.org (Wolfram et al. 2009).

Number of documents indexed: Lancaster (1968) used sixteen articles; Tarr and Borko (1974) fifteen items; Leonard (1975) 100 articles; Funk and Reid (1983) 760 articles; Markey (1984) 100 documents; Iivonen (1990) ten documents; Sievert and Andrews (1991) seventy-one articles; Iivonen and Kivimäki (1998) forty-nine documents; Leininger (2000) sixty documents; Tonta (1991) eighty-two books; Bertrand and Cellier (1995) eight books; Gil-Leiva (2001) eleven books; Saarti (2002) five books; Gil-Leiva et al. (2008) ten books; Chen (2008) 3,307 monographs; Hudon (1998a and 1998b) twelve abstracts; Soler Monreal (2009) three abstracts.

Hawthorne effect: Individuals who know they are being studied behave differently from when they do not know they are being observed. In some studies the indexers knew their product would be evaluated (Lancaster 1968; Leonard 1975; Bertrand and Cellier 1995; Hudon 1998a and 1998b; Gil-Leiva 2002; Saarti 2002; Soler Monreal 2009). In other studies this possibility did not arise, because what was compared was the indexing of the same documents in different information systems. Some compared two bibliographic databases: Middleton (1984) compared the indexing of ERIS/APAIS and AEI/APAIS, and Iivonen and Kivimäki (1998) the databases KINF and LISA. Others compared the indexing in library catalogues: Tonta (1991) compared the Library of Congress and the British Library; Gil-Leiva (2001) thirty-one public library catalogues; Neshat and Horri (2006) the National Library of Iran and twelve academic libraries; Gil-Leiva et al. (2008) thirty university library catalogues; and Chen (2008) the National Library of China and the China Academy Library & Information System.
Finally, duplicate records in information systems have been used: Funk and Reid (1983) used 760 articles indexed twice in Medline; Sievert and Andrews (1991) worked with seventy-one duplicates from the database Information Science Abstracts; Leininger (2000) compared sixty duplicates from the PsycINFO database.

Concepts versus terms: Most of the studies mentioned above compare all the indexing terms or descriptors taken from a controlled vocabulary. Sometimes, however, consistency is calculated on the concepts taken directly from the documents, as in Iivonen (1990) or Gil-Leiva (2002), who take into account both the concepts and the descriptors drawn from the Eurovoc Thesaurus.

Materials and methods

For this study we built three controlled vocabularies on information science: a list of descriptors with control for synonymy, a standard thesaurus, and a thesaurus in which all the descriptors have scope notes (augmented thesaurus). When this research began, no thesaurus on this subject had been published in Spanish. Hence, we began by refining a list of descriptors consisting of 2,756 terms, which was in use in the design and maintenance of an automatic indexing system (Gil-Leiva 1997 and 2008). The final list contained a total of 2,455 terms, of which 1,436 are descriptors and 1,019 non-descriptors. A standard thesaurus was constructed from this list. This thesaurus has an alphabetical display, a hierarchical display and a KWOC permuted index. Appendix A shows the first terms of the three tools built. The thesauri were built with the thesaurus management software MultiTes, following the Spanish standard UNE 50-106-90 (equivalent to ISO 2788-1986).
Table 1: Descriptors of the standard thesaurus

Centralized acquisition
    TC: J02
    UP: Centralized purchases
    TG1: Acquisition of documents
    TG2: Development of collections
    TG3: Documental process

Topographic catalogues
    TC: F03
    TG1: Catalogues (information sources)
    TG2: Secondary sources
    TG3: Information sources

Finally, specialized dictionaries were used to add scope notes to all the descriptors in order to build the augmented thesaurus.

Table 2: Descriptors from the augmented thesaurus

Centralized acquisition
    TC: J02
    NA: Purchase of documental stocks by an institution which also distributes them to other centres so as to economize on resources.
    UP: Centralized purchases
    TG1: Acquisition of documents
    TG2: Development of collections
    TG3: Documental process

Topographic catalogues
    TC: F03
    NA: Catalogues in which the bases follow the order of the place occupied by the documents in the collection or on the shelves, coinciding with the order of the topographic library number.
    TG1: Catalogues (information sources)
    TG2: Secondary sources
    TG3: Information sources

After building the three controlled vocabularies, an intrinsic (qualitative and quantitative) evaluation was carried out to check that they comply with the recommendations for the compilation of thesauri. The compilation was carried out following the parameters proposed by Lancaster (2002), Gil Urdiciaín (2004) and Gil-Leiva (2008). It was confirmed that the thesauri meet the traditional requisites for the compilation of thesauri.

Later, we decided that the material to be indexed was to be three abstracts of journal articles, since these are concise, well structured and understandable information sources (Appendix B). We then worked on the selection of the indexers who were going to use the three indexing languages to index three abstracts of information science articles. Finally, we decided that the indexers should have different levels of experience.
Group 1: Second-year information science students
Group 2: Fourth-year information science students
Group 3: Fifth-year information science students
Group 4: Experienced professionals in document indexing

The three groups of students already had some theoretical and practical knowledge of indexing and of the use of controlled vocabularies. Each student group comprised eighteen people and was divided into three subgroups of six indexers, one for each of the three tools. The exception was Group 4, which was made up of nine professionals for whom indexing is a habitual task. The professionals work in documentation centres in public administration (3), communication (3) and technological institutes (3). They were also subdivided, into subgroups of three indexers per tool. None of the indexers were familiar with the indexing languages constructed for the tests, although both the novice and the expert indexers had used indexing languages from other fields. Finally, it should be mentioned that it was difficult to find more professionals who were available to participate in these types of tests.

The results of the indexing of the three abstracts were compared pairwise, so novice indexers were compared fifteen times for each of the three articles and for each of the three tools being compared, giving a total of 135 comparisons (15 × 3 × 3). As regards the expert indexers, three comparisons were obtained for each of the three articles and each of the three tools under comparison, giving a total of twenty-seven comparisons.

We used a relaxed, non-exact system of coincidence to calculate consistency between indexers, as was done in Gil-Leiva (2001) and Gil-Leiva et al. (2008). A coincidence of 1 (100%), 0.5 (50%) or 0 (0%) was considered. For example, if one indexer assigns librarians and another reference librarians, a consistency of 0.5 is recorded.
As a general rule, a score of 0.5 was awarded to non-identical terms where one was a more specific form of the other, while 1 was given to very similar concepts.

Table 3: Table of relaxed equivalences between descriptors

Indexer 1                 Indexer 2                    Agreement
Biomedical journals       Scientific journals          0.5
Librarians' techniques    Librarianship                1
Databases                 Bibliographical databases    0.5
Librarians                Librarians of reference      0.5
Scientific journals       Scientific publications      1

Since their beginnings, tests on indexing consistency have used various formulas, among which the most important are those used by Hooper (1965) and Rolling (1981). In Gil-Leiva (1997 and 2001), Gil-Leiva et al. (2008) and Soler Monreal (2009), Hooper's measure of indexing consistency has been used extensively, adapted as follows:

$$C_i = \frac{T_{co}}{(A + B) - T_{co}}$$

where $C_i$ is the consistency between two indexings, $T_{co}$ is the number of terms in common between the two indexings, $A$ is the number of terms used by Indexer A, and $B$ is the number of terms used by Indexer B.
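
As a rough illustration of the adapted Hooper measure combined with the relaxed scoring, the following Python sketch computes the consistency for every pair of indexers in a subgroup. The function names, the sample agreement table and the greedy pairing of near-equivalent terms are assumptions made for this example; the text does not describe how partial matches were paired in practice.

```python
from itertools import combinations

# Hypothetical relaxed-agreement table, loosely modelled on Table 3.
# Keys are unordered term pairs; values are the agreement awarded (0.5 or 1).
RELAXED = {
    frozenset({"librarians", "reference librarians"}): 0.5,
    frozenset({"databases", "bibliographical databases"}): 0.5,
    frozenset({"scientific journals", "scientific publications"}): 1.0,
}


def common_terms(a, b, relaxed=RELAXED):
    """Return Tco: identical terms score 1, listed near-matches score 0.5 or 1."""
    a, b = set(a), set(b)
    exact = a & b
    tco = float(len(exact))
    remaining = sorted(b - exact)
    for term in sorted(a - exact):
        # Assumption: each unmatched term is paired greedily with its
        # best-scoring partner, and a partner can be used only once.
        scored = [(relaxed.get(frozenset({term, other}), 0.0), other)
                  for other in remaining]
        if not scored:
            break
        score, partner = max(scored)
        if score > 0:
            tco += score
            remaining.remove(partner)
    return tco


def hooper(a, b, relaxed=RELAXED):
    """Adapted Hooper measure: Ci = Tco / ((A + B) - Tco)."""
    tco = common_terms(a, b, relaxed)
    denominator = len(set(a)) + len(set(b)) - tco
    return tco / denominator if denominator else 0.0


def pairwise_consistency(indexings, relaxed=RELAXED):
    """Map every pair of indexer names to the consistency of their indexings."""
    return {
        (x, y): hooper(indexings[x], indexings[y], relaxed)
        for x, y in combinations(sorted(indexings), 2)
    }


# Illustrative data only: one exact match plus one 0.5 near-match.
indexer_1 = ["librarians", "databases", "scientific journals"]
indexer_2 = ["reference librarians", "databases", "indexing"]
print(round(hooper(indexer_1, indexer_2), 3))  # 1.5 / (3 + 3 - 1.5) = 0.333
```

Passing a subgroup of six indexers to `pairwise_consistency` returns fifteen values, one per pair, which corresponds to the fifteen comparisons per abstract described for the novice subgroups; a subgroup of three experts yields three.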
Journal: Inf. Res.
Volume: 16
Publication date: 2011