Sections, categories and keywords as interest specification tools for personalised news services
Abstract
Through an evaluation of system performance and user satisfaction for the Mercurio system (a system that sends personalised news selections via email), the general applicability and usefulness of different methods of specifying user interest (sections, categories and keywords) are considered for the general case of digital news services. The specific characteristics distinguishing such systems from more general information systems are outlined and their effect is discussed. An evaluation blueprint for such services is proposed, drawing on information retrieval procedures, existing work on search engine evaluation, and a close study of the working principles and evaluation requirements that arise from the particular properties and conditions of the services under consideration. Evaluation results for system tests based both on real users and on custom-tailored test cases are presented and discussed. Conclusions cover the nature of the information handling tasks that digital news services face, the relative merits of sections, categories and keywords with respect to this particular set of tasks, and the risks of careless application of recall and precision measures in systems such as these.

Introduction

The recent boom in the popularity of the Internet has resulted in a rapid expansion of the range of information service paradigms available to the common user. One such paradigm is that of systems that send users a selection of the daily news by electronic mail. These systems are currently classed as information filtering applications, yet they differ from more general information filtering or information retrieval systems in that their contents are completely renewed at regular intervals (daily, in the case of most digital newspapers) and remain stable in between.
This simple difference has considerable implications for the general characteristics of the databases that house these contents: they are perishable, they remain static throughout their lifetime, they are usually small, and they make no claim to universal coverage of any information domain. Although existing systems of this sort apply well-known techniques from the fields of information retrieval and information filtering, the particular restrictions governing their operation may affect the general applicability and usefulness of different methods of information access. Furthermore, it is at present unclear to what extent generally accepted evaluation measures for information retrieval, such as recall and precision, can be meaningfully applied in these circumstances. To explore these issues we have carried out thorough evaluations of a system that incorporates these basic methods (newspaper sections, categories and keywords) into the process of selecting the particular information items that are relevant to a given user.

Footnote I: The research work presented in this paper was partially funded by grant REF TS203/1999 of the ATYCA program, Ministerio de Industria y Energía, Spain.

The Mercurio system (Díaz et al., 2000) allows readers of the newspaper to receive a periodic e-mail message containing the news items that the system finds particularly relevant to the user's interests, as defined during registration. The system offers an integrated and customisable way of combining document relevance information from three different kinds of source:
– A prior categorisation by the system owner (documents, i.e. news items, are sorted into sections by the editor)
– Keyword search information over the content of the documents
– Automatic categorisation results against an alternative (less domain-specific) set of categories
An additional control layer allows users to specify how important each of these selection methods is to their particular interests. The system therefore offers a flexible, multidimensional and browsable user model, together with a set of well-founded techniques for matching user models against information items. These features make it an ideal test bed for the questions described above.

Evaluation is carried out in two distinct stages. During the initial stage, the system is evaluated according to standard practice in the field of information retrieval. The results fall especially short when it comes to determining the relative performance of the different methods of information access involved. The second stage of the evaluation includes additional sets of experiments specially designed to shed light on this issue.

Evaluation of Systems for Information Access

At present, the Internet is characterised by a proliferation of information systems. These systems present a multitude of innovations, both in the nature of the information they deal with and in the actual form of service they provide. Many of them are giving rise to new ways of understanding information services and information systems. In this paper we are concerned with services that send periodic news selections to subscribers of a digital newspaper by electronic mail. These systems vary, but there is a set of characteristics common to them all:

a) Personalisation is a key factor, especially when it is carried out with respect to a user profile or user model that specifies the information desired and avoids unnecessary processing of unwanted documents or services.
These profiles or models are usually based on the selection of newspaper sections. Although user comments show that additional features allowing the use of keywords or categories are an important success factor, most systems still offer no means of categorisation other than newspaper sections. Personalisation also applies to the format of the message sent (for instance, heading the message with the reader's name, as in a personalised newspaper), the choice of the days of the week on which the message is sent, the presentation of the results, and the possibility of specifying either a number of news items or a relevance threshold to shape the personalised message.

b) Most systems pay special attention to ease of access and manipulation by users. This includes speedy transmission and easy subscription and unsubscription. The working context always remains close to the digital newspaper from which the system springs, allowing the user to consult, for instance, the archive of back issues. The presentation schema of headline, abstract and text, together with a relevance value with respect to the user profile, rates highest in terms of user satisfaction, yet it is not the most frequent.

c) Information agents tend to present a friendly interface (fonts, colours, etc.), especially in the design of each message. In most cases they also include some means of contacting the service provider, a facility that improves efficiency and competitiveness.

The lack of uniformity in these services is most apparent in those that specialise in graphical information (photographs). These systems place more emphasis on design and commercial aspects, and give less importance to sending documents by electronic mail.
Evaluation of these new information retrieval instruments requires: a validation of traditional evaluation measures within the new field of the Internet, consideration of the knowledge acquired during the evaluation of search engines, and a close study of the working principles and evaluation requirements that arise from the particular properties and conditions of the services under consideration. With respect to search engine evaluation, the criteria employed to date for qualitative evaluation are the following (Maldonado and Fernández, 1998):

Coverage: evaluators are concerned with the number of web pages the search engine has access to. Other relevant parameters are the geographical and content scope of the database, harvesting models, specific processing of web documents, and the possibility of accessing different types of information and resources.

Search forms: these provide the possibility of searching at different levels. Their versatility and the range of options they offer are currently under consideration.

Search fields: the existence of different fields that can be used to guide user searches must be considered. Possible examples are title, description, URL, keywords, language, information type, owner type, etc.

Search tools: there are several possible instruments for searching and retrieving information, such as categories, keywords, stemming, Boolean operators, compound-term detection, phrase search, proximity search, field search, restriction to certain dates, domains, languages or file types, and the ability to recognise meta-information.

Thematic classification and vocabulary control: the existence of some way of structuring information, such as categories or other forms of vocabulary control, and their applicability to different types of information.
Detection of novelties: systems should be able to identify and locate records newly added to the database, by means of special labels, delimitation and organisation of documents by date of inclusion, etc.

Shaping results: users should be allowed to choose or define the presentation format and the criteria used to order the results.

A review of the quantitative research carried out on Internet search engines shows that there is no unified method of procedure. Further work is required on: the influence on results of the particular methods used for harvesting and compiling data, given the varying nature of search engines and the dynamic character of the databases they operate on; the problems presented by classical measurement instruments; the need for new evaluation measures, etc. Following Olvera Lobo (2000), and having reviewed the existing literature on this topic (Dong and Su, 1997; Gordon and Pathak, 1999; Schwartz, 1998; Leighton and Srivastava, 1999; Clarke and Willet, 1997), we find that the different mechanisms, methods and trials carried out so far all agree on the significance of the following phases in an evaluation:

a) Determination and subsequent formulation of the information needs of the users. After an initial stage in which the evaluation questions were supplied by the researchers themselves, with an obvious risk of bias, the current trend is to collect questions posed by 'real users of information'. This approach faces the limitations of laboratory research, as well as the additional problem of information being lost when the users' information interests are translated into queries.
Since the questions are the starting point of the evaluation, they should meet the following requirements: prior knowledge of the existence of specific resources on the Web; a combination of different levels of difficulty and aims; a combination of different degrees of technical coverage and question specificity; and a number of questions delimited according to the aim of the endeavour.

b) Studying the syntax of the query. This factor (use of logical operators, parentheses, etc.) is of some importance. Even so, obtaining correct results depends on an adequate selection of the terms employed in the query. One must determine how many and which particular terms are used, and specify whether Boolean logic or natural language is employed to structure the query. Other important factors are the use of upper-case letters, stemming, etc. Throughout, one should bear in mind that the best results will probably be obtained by posing queries with relatively simple syntax.

c) Monitoring the timing of the searches. Given the dynamic character of the Web and of search engines in general, it is advisable to carry out all searches simultaneously, running the same query on each search engine, because any delay may change the set of documents available and thereby the results.

d) Specifying the set of relevance judgements to be assigned. To study the effectiveness of the system, the relevance of each retrieved document must be stated. This is a problematic aspect that may be resolved by resorting to external sources, such as TREC. As a first step, four levels can be distinguished: (1) duplicated, inactive or irrelevant links; (2) links that are relevant from a technical point of view; (3) potentially useful links; (4) links that are probably most useful.

e) Selecting the means for the analysis of the results.
Recall and precision remain useful measures, although their values depend on the number of items considered (different studies impose different cut-off thresholds). Recall, however, has proved problematic because of the difficulty of ascertaining the total number of documents relevant to a specific query. This problem has been tackled either by restricting relevance judgements to a controlled subset of a document collection, or by carrying out several different searches and taking the relevant documents they retrieve as the total set of relevant documents.

An extrapolation of the evaluation methodology applied to search engines is proposed as a starting point for sketching an evaluation method for services providing news selections by electronic mail. This extrapolation leads to the following considerations:

a) Qualitative criteria:

Coverage: this issue is taken into account in most existing evaluation frameworks. In the case under consideration, since a specific newspaper is the subject of research, the discussion centres on the algorithms used for retrieval, categorisation and learning, and their effect on the accessibility of available documents.

Search forms and fields: the analysis focuses on how users understand the working and the purpose of the forms.

Search tools: the evaluation concentrates on the creation of user profiles and their usefulness. Thematic categories, keywords and sections are employed. The parameters deemed relevant in this respect are therefore: whether the number of categories presented is adequate, whether the profile reflects the way in which the user specifies the information he desires, whether the categories are well organised, whether they overlap, whether there are enough of them, whether the available means of representing relevance are adequate, etc.
Detection of novelties: given the working context (the set of news items available in a given digital newspaper on a specific day), this parameter plays no role in the general evaluation.

Shaping results: this refers to parameters that let the user specify the presentation format and the ordering criteria, for instance having the results of a given system ordered by their relevance to the user profile.

b) Quantitative criteria:

Determining user needs: an adequate number of users is selected, taking special care to ensure that they cover a wide range of profiles in terms of computer literacy and familiarity with Internet applications. At the same time, the evaluation exploits the information available in the different user profiles registered during testing, which yields important insights.

Query syntax: this does not apply here, since users provide no actual query. Nonetheless, it is relevant to study how users behave when creating their profiles and how this relates to the final results.

Search timing: unlike general-purpose search engines, the services under discussion remain static from one day's edition until the next. Therefore no special attention need be paid to the timing of the application of the user model to the newspaper contents on each particular day. It is nevertheless worthwhile to carry out tests on different days.

Relevance judgements: in some cases, relevance judgements for a given document depend on feedback provided by the users themselves; in other cases they are based on assessment by the researchers.

Analysis of results: precision and recall measurements are retained. In this case recall poses no problem, because it is possible to determine the total number of relevant news items available on the newspaper site on a given day.
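Because a day's edition is small and closed, relevance can be judged for every item, so the recall denominator is exact rather than estimated. The Python sketch below illustrates this under stated assumptions: the function names, item identifiers and data are hypothetical, each user's judgements are assumed to be a set of relevant item IDs covering the whole edition, and per-user figures are macro-averaged across users.

```python
from statistics import mean


def precision_recall(selected: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of one user's selection against exhaustive judgements.

    `relevant` is assumed to hold every item in the day's edition judged relevant
    for this user, so recall has an exact denominator.
    """
    hits = len(selected & relevant)
    # Conventions for empty sets are an assumption: no selection -> precision 0,
    # no relevant items -> recall 1 (nothing was missed).
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


def macro_average(runs: dict[str, tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Mean of per-user precision and recall (each user counts equally)."""
    scores = [precision_recall(sel, rel) for sel, rel in runs.values()]
    return mean(p for p, _ in scores), mean(r for _, r in scores)


if __name__ == "__main__":
    # Hypothetical data for one day: user -> (system selection, relevant items).
    runs = {
        "user_a": ({"n1", "n2", "n3"}, {"n1", "n3", "n7", "n9"}),
        "user_b": ({"n4", "n5"}, {"n4"}),
    }
    print(macro_average(runs))
```

For web search engines the same functions could be applied to a pooled set of relevant documents, but recall would then only be an estimate relative to that pool.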
Mercurio: A Personalization System for Digital News Services

Mercurio (Díaz et al., 2000) is a personalised news service system that applies existing techniques from the fields of text classification and categorisation (Sebastiani, 1999) and information retrieval (Baeza-Yates and Ribeiro-Neto, 1999), together with user modelling (Gervás et al., 1999; Amato and Straccia, 1999), to the selection of items relevant to the user. Each user can create a profile with his preferences and receive daily the news items that interest him. A user accesses the information server and registers for the service. During registration essential data are recorded (e-mail address, login, password). A profile for the user (or user model) is built, containing information such as the days of the week on which he wants to receive news, the number of items he wants to receive each time, and his interests. These interests are represented with respect to three systems of reference: the sections of the newspaper, a set of categories presented as an alternative classification system (the first level of categories from Yahoo Spain), and terms chosen by the user as interesting. The chosen newspaper (see footnote II) has 8 sections: opinion, national, international, economy, society, culture, sports and people. The first level of categories from Yahoo Spain consists of the following 14 categories: Arts & Humanities, Science, Social Science, Recreation & Sports, Business & Economy, Education, Entertainment, Computers & Internet, Reference, News & Media, Government, Health, Society & Culture, and Regional. The system imposes no restrictions on the set of keywords the user may choose.

Having too many selection methods available simultaneously can lead to confusion: unless additional control features are provided, users get at best a blurred picture of how the system operates. For this reason, our personalisation architecture allows an extra level of user specification.
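The profile contents listed above, together with the extra level of user specification just mentioned, can be sketched as a data structure. This is a minimal Python illustration, not Mercurio's actual representation: the class name `UserProfile`, its field names, the day codes and the default weights are all hypothetical, while the section and category vocabularies are the ones given in the text.

```python
from dataclasses import dataclass, field

# Reference vocabularies, as listed in the text.
SECTIONS = frozenset({
    "opinion", "national", "international", "economy",
    "society", "culture", "sports", "people",
})
CATEGORIES = frozenset({
    "Arts & Humanities", "Science", "Social Science", "Recreation & Sports",
    "Business & Economy", "Education", "Entertainment", "Computers & Internet",
    "Reference", "News & Media", "Government", "Health",
    "Society & Culture", "Regional",
})
DAYS = frozenset({"mon", "tue", "wed", "thu", "fri", "sat", "sun"})  # hypothetical codes
METHODS = frozenset({"sections", "categories", "keywords"})


def _default_weights() -> dict[str, float]:
    # Hypothetical default: all three reference systems equally important.
    return {m: 1.0 for m in METHODS}


@dataclass
class UserProfile:
    """Hypothetical user model: delivery settings plus interests in three reference systems."""
    email: str
    delivery_days: set[str]
    max_items: int  # upper bound on news items per message
    sections: set[str] = field(default_factory=set)
    categories: set[str] = field(default_factory=set)
    keywords: list[str] = field(default_factory=list)  # free terms, unrestricted
    weights: dict[str, float] = field(default_factory=_default_weights)

    def __post_init__(self) -> None:
        # Reject values outside the fixed vocabularies.
        checks = [
            ("days", self.delivery_days, DAYS),
            ("sections", self.sections, SECTIONS),
            ("categories", self.categories, CATEGORIES),
            ("weight keys", set(self.weights), METHODS),
        ]
        for name, given, allowed in checks:
            unknown = set(given) - allowed
            if unknown:
                raise ValueError(f"unknown {name}: {sorted(unknown)}")
        if self.max_items <= 0:
            raise ValueError("max_items must be positive")
        if any(w < 0 for w in self.weights.values()):
            raise ValueError("weights must be non-negative")

    def wants_delivery_on(self, day: str) -> bool:
        return day in self.delivery_days
```

Validating against closed vocabularies mirrors the fact that sections and categories are fixed lists, whereas keywords are left unchecked because the system imposes no restrictions on them.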
A general control mechanism has been included to make the results more predictable for the user. Each of the three features (keywords, sections and additional categories) has a weight that represents its importance for the user's interests. For example, if the weight of sections is low and the weight of additional categories is high, relevance values concerning additional categories will count for more when selecting news items. In this way, each of the three dimensions of the user profile can be defined and controlled by the user, providing a fine-tuning mechanism for a flexible characterisation of his interests.

The message received by the user contains the user's name, the date, and a list of news items ranked according to the user's interests and limited to the upper bound defined in the profile. Each news item is presented with a title, a short summary, its relevance, and a link to the item in the digital newspaper. The end of the message lists the interests recorded as features in the user's profile, so that the user can check the actual relevance of the news received.

The representation of the news items is obtained by applying the Vector Space Model (VSM) to their texts (Salton, 1989). A representation for each category can be obtained by applying text categorisation techniques (Gómez and Buenaga, 1997; Lewis et al., 1996) to a set of training documents (documents manually labelled with the appropriate categories). In our case, the training documents were the web pages indexed by the Spanish version of Yahoo! within these categories. Thus, each category can be represented by a term weight vector obtained from the name of the category, the names of its subcategories, and the names and short descriptions of the web pages associated with the category. The keywords are also represented in the VSM, using the weight assigned to each word in the model.

Footnote II: ABC, http://www.abc.es/
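The matching and weighting described above can be sketched in Python. The text does not specify the term-weighting scheme or the exact integration formula, so this sketch assumes smoothed tf-idf weights with cosine similarity, a binary section score, and a final score equal to the weighted mean of the per-method scores expressed as 0-100% (with equal weights this reduces to dividing by the number of methods in use). All function names, texts and weights are hypothetical, and a single selected category is used for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-cased word tokens; stop-word removal and stemming are omitted here.
    return re.findall(r"\w+", text.lower())


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    # Smoothed inverse document frequency (one common tf-idf variant; assumed).
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def vectorize(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    # Sparse tf-idf vector; terms absent from the collection get idf 1.0 (assumption).
    return {t: c * idf.get(t, 1.0) for t, c in Counter(tokens).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def combined_relevance(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-method scores in [0, 1], as a percentage.

    Only methods with a score and a positive weight take part; dividing by their
    total weight keeps the result in 0-100% however many methods are in use.
    """
    active = {m: w for m, w in weights.items() if w > 0 and m in scores}
    total = sum(active.values())
    if total == 0:
        return 0.0
    return 100.0 * sum(w * scores[m] for m, w in active.items()) / total


if __name__ == "__main__":
    # Hypothetical edition: item id -> (section, text).
    edition = {
        "n1": ("economy", "The central bank raises interest rates to curb inflation"),
        "n2": ("sports", "Local team wins the football league after a dramatic final"),
    }
    # Hypothetical category text: name + subcategory names + page descriptions.
    category_text = "Business Economy companies finance banking markets interest rates"
    profile_sections, keywords = {"economy"}, ["inflation"]
    weights = {"sections": 0.2, "categories": 0.8, "keywords": 0.5}
    max_items = 1  # upper bound from the profile (hypothetical value)

    tokens = {k: tokenize(text) for k, (_, text) in edition.items()}
    idf = idf_table(list(tokens.values()))
    cat_vec, kw_vec = vectorize(tokenize(category_text), idf), vectorize(keywords, idf)

    def score(item_id: str) -> float:
        vec = vectorize(tokens[item_id], idf)
        return combined_relevance({
            "sections": 1.0 if edition[item_id][0] in profile_sections else 0.0,
            "categories": cosine(vec, cat_vec),
            "keywords": cosine(vec, kw_vec),
        }, weights)

    for item in sorted(edition, key=score, reverse=True)[:max_items]:
        print(f"{item}: {score(item):.0f}%")
```

Raising the category weight relative to the section weight, as in the example in the text, makes topical similarity dominate the final percentage.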
To perform the selection we applied category-pivoted categorisation (Sebastiani, 1999; Yang, 1999) with the categories, and information retrieval (Baeza-Yates and Ribeiro-Neto, 1999) with all the keywords. In addition, every news item is checked to determine whether it belongs to one of the sections selected in the user model. Once all the documents have been ranked according to each of the different sources of relevance, the resulting orderings are integrated using the level of interest that the user assigned to each reference system. To make the relevance values easy for the user to interpret, they are normalised over the number of selection methods involved in obtaining them. This allows the system to quote a final relevance value in the range 0-100% to every user, regardless of the number of selection methods each particular user chose.

Initial Evaluation

We describe and discuss the three kinds of evaluation that were carried out: an evaluation by a set of different users, a system evaluation that considers the performance of the system in terms of measurable parameters, and an evaluation of the user model provided and of how the evaluators fared in dealing with it. A controlled evaluation environment was established to allow the results to be analysed with respect to the different kinds of user involved. Evaluation was carried out by 44 users in four categories: A) Collaborators; B) Researchers; C) University lecturers (in both Computer Science and Journalism); D) External users with no professional relationship with the fields involved. The users evaluated the system following a working pattern previously developed for the analysis of existing systems (García et al., 2000; Díaz et al., 2000; Pastor and Asensi, 1999). To judge the relevance of the documents received, the users had to check the performance of the system against the actual set of documents available on the newspaper website on three particular days.
Additionally, on those particular days, logs of system operation (available documents, user profiles at the time, and system selections for each user) were kept so that objective results could be obtained. From these data we computed two sets of recall and precision figures: one based on the users' criteria as recorded in the forms, and one based on subsequent close analysis of the system logs.

User-centred Evaluation

During the first stage, the evaluation centred on user response and the view that users develop of the system. The aim was to collect explicit evaluations from the users of system response time, ease of use, system efficiency, and conceptual and physical presentation. This information was compiled by means of a closed questionnaire with specific questions on the main relevant topics. For each of these parameters a numerical value was derived from the users' responses. In general, users found the system suitable, although the results showed some differences between groups of users. These were the results for the interface evaluation: System Access: (high); General Interface, User Adaptation, and Integration into User Environment: (medium-high); Management of
Journal: Online Information Review
Volume 25
Publication date: 2001