Details on Biasing Web Search Results for Topic Familiarity

Authors

  • Giridhar Kumaran
  • Rosie Jones
  • Omid Madani
Abstract

A typical web search engine returns a mix of introductory and advanced documents (around 50%) in response to a random selection of queries. Depending on a web searcher’s familiarity with a query’s target topic, it may be more appropriate to show her introductory or advanced documents. We conceptualize the notion of introductory and advanced documents in a way that obviates additional user-interaction and changes to existing search engine architectures. We show that topic familiarity required to understand a document (familiarity level) is a notion that people can agree on, as borne out by high inter-rater agreement (70%). We also show that this familiarity level is not predicted by reading level, so new methods of identifying it are needed. We develop a method for biasing the initial mix of documents returned by a search engine to increase the number of documents of desired familiarity level up to position 5, and up to position 10. Our topic-independent and user-independent method involves building a supervised text classifier, incorporating features based on reading level, the distribution of stop-words in the text, and non-text features such as average line-length. Using this familiarity classifier, we achieve statistically significant improvements at reranking the result set to show introductory documents higher up the ranked list. Our experiments indicate that we can perform this search result biasing for arbitrary users on arbitrary queries.
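
The pipeline described in the abstract (extract document features, score each document for familiarity level, then bias the top of the ranked list) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the stop-word list is a tiny placeholder, the three features are simple stand-ins for the feature families the abstract names (stop-word distribution, average line length, and a crude reading-level proxy), and `intro_score` is a hypothetical callable standing in for the trained supervised classifier.

```python
import re

# Placeholder stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}


def familiarity_features(text):
    """Compute a few illustrative features of the kinds named in the abstract."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    lines = [ln for ln in text.splitlines() if ln.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty text
    return {
        # Share of tokens that are stop-words.
        "stop_word_fraction": sum(w in STOP_WORDS for w in words) / n_words,
        # Non-text feature: mean character length of non-empty lines.
        "avg_line_length": sum(len(ln) for ln in lines) / max(len(lines), 1),
        # Crude reading-level proxy: mean words per sentence.
        "avg_words_per_sentence": len(words) / max(len(sentences), 1),
    }


def rerank(results, intro_score, top_k=10):
    """Reorder only the first top_k results by descending introductory score.

    `intro_score` is a hypothetical function mapping a result to a number;
    in practice it would wrap a trained classifier. Python's sort is stable,
    so ties keep the search engine's original order.
    """
    head = sorted(results[:top_k], key=intro_score, reverse=True)
    return head + results[top_k:]
```

Restricting the reordering to the first `top_k` results mirrors the abstract's focus on positions 5 and 10: documents below the cutoff keep the engine's original order.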

Similar resources

An Autonomous Page Ranking Method for Metasearch Engines

Ordering search results collected from multiple sources is a challenge to metasearch engines. We present an “autonomous” ranking method, meaning one that does not depend on the individual rankings returned by the engines participating in the search. Instead, it applies our TOPIC method for evaluating the reputation of a web page on a topic [7, 6] to the problem of ranking the results to a query...

Hierarchical Fuzzy Clustering Semantics (HFCS) in Web Document for Discovering Latent Semantics

This paper discusses the future of World Wide Web development, called the Semantic Web. Undoubtedly, Web service is one of the most important services on the Internet, which has had the greatest impact on the generalization of the Internet in human societies. Internet penetration has been an effective factor in growth of the volume of information on the Web. The massive growth of informat...

Word Clouds of Multiple Search Results

Search engine result pages (SERPs) are known as the most expensive real estate on the planet. Most queries yield millions of organic search results, yet searchers seldom look beyond the first handful of results. To make things worse, different searchers with different query intents may issue the exact same query. An alternative to showing individual web pages summarized by snippets is to repres...

The Effects of Topic Familiarity on Online Search Behaviour and Use of Relevance Criteria

This paper presents an experimental study on the effect of topic familiarity on the assessment behaviour of online searchers. In particular we investigate the effect of topic familiarity on the resources and relevance criteria used by searchers. Our results indicate that searching on an unfamiliar topic leads to use of more generic and fewer specialised resources and that searchers employ diffe...

Effects of Individual Health Topic Familiarity on Activity Patterns During Health Information Searches

BACKGROUND Non-medical professionals (consumers) are increasingly using the Internet to support their health information needs. However, the cognitive effort required to perform health information searches is affected by the consumer's familiarity with health topics. Consumers may have different levels of familiarity with individual health topics. This variation in familiarity may cause misunde...

Publication date: 2005