Predicting Fluency With PageRank

Authors

  • Thomas L. Griffiths
  • Mark Steyvers
  • Alana Firl
Abstract

Human memory and Internet search engines face a shared computational problem, needing to retrieve stored pieces of information in response to a query. We explored whether they employ similar solutions, testing whether we could predict human performance on a fluency task using PageRank, a component of the Google search engine. In this task, people were shown a letter of the alphabet and asked to name the first word beginning with that letter that came to mind. We show that PageRank, computed on a semantic network constructed from word-association data, outperformed word frequency and the number of words for which a word is named as an associate as a predictor of the words that people produced in this task. We identify two simple process models that could support this apparent correspondence between human memory and Internet search, and relate our results to previous rational models of memory.

Rational models of cognition explain human behavior as approximating optimal solutions to the computational problems posed by the environment (Anderson, 1990; Chater & Oaksford, 1999; Marr, 1982; Oaksford & Chater, 1998). Rational models have been developed for several aspects of cognition, including memory (Anderson, 1990; Griffiths, Steyvers, & Tenenbaum, 2007; Shiffrin & Steyvers, 1997), reasoning (Oaksford & Chater, 1994), generalization (Shepard, 1987; Tenenbaum & Griffiths, 2001), categorization (Anderson, 1990; Ashby & Alfonso-Reese, 1995), and causal induction (Anderson, 1990; Griffiths & Tenenbaum, 2005). By emphasizing the computational problems underlying cognition, rational models sometimes reveal connections between human behavior and that of other systems that solve similar problems. For example, Anderson's (1990; Anderson & Milson, 1989) rational analysis of memory identified parallels between the problem solved by human memory and that addressed by automated information-retrieval systems, arguing for similar solutions to the two problems.
Since Anderson's analysis, information-retrieval systems have evolved to produce what might be an even more compelling metaphor for human memory—the Internet search engine—and computer scientists have developed new algorithms for solving the problem of pulling relevant facts from large databases. In this article, we explore the correspondence between these new algorithms and the structure of human memory. Specifically, we show that PageRank (Page, Brin, Motwani, & Winograd, 1998), one of the key components of the Google search engine, predicts human responses in a fluency task.

Viewed abstractly, the World Wide Web forms a directed graph, in which the nodes are Web pages and the links between those nodes are hyperlinks, as shown in Figure 1a. The goal of an Internet search engine is to retrieve an ordered list of pages that are relevant to a particular query. Typically, this is done by identifying all pages that contain the words that appear in the query, then ordering those pages using a measure of their importance based on their link structure. Many psychological theories view human memory as solving a similar problem: retrieving the items in a stored set that are likely to be relevant to a query. The targets of retrieval are facts, concepts, or words, rather than Web pages, but these pieces of information are often assumed to be connected to one another in a way similar to the way in which Web pages are connected. In an associative semantic network, such as that shown in Figure 1b, a set of words or concepts is represented using nodes connected by links that indicate pair-wise associations (e.g., Collins & Loftus, 1975). Analyses of semantic networks estimated from human behavior reveal that these networks have properties similar to those of the World Wide Web, such as a "scale-free" distribution for the number of nodes to which a node is connected (Steyvers & Tenenbaum, 2005).
If one takes such a network to be the representation of the knowledge on which retrieval processes operate, human memory and Internet search engines address the same computational problem: identifying those items that are relevant to a query from a large network of interconnected pieces of information. Consequently, it seems possible that they solve this problem similarly.

Although the details of the algorithms used by commercial search engines are proprietary, the basic principles behind the PageRank algorithm, part of the Google search engine, are public knowledge (Page et al., 1998). The algorithm makes use of two key ideas: first, that links between Web pages provide information about their importance, and second, that the relationship between importance and linking is recursive. Given an ordered set of n pages, we can summarize the links between them with an n × n matrix L, where L_ij is 1 if there is a link from Web page j to Web page i and is 0 otherwise. If we assume that links are chosen in such a way that more important pages receive more links, then the number of links that a Web page receives (in graph-theoretic terms, its in-degree) could be used as a simple index of its importance. Using the n-dimensional vector p to summarize the importance of our n Web pages, this is the assumption that p = L1, where 1 is a column vector with n elements each equal to 1. PageRank goes beyond this simple measure of the importance of a Web page by observing that a link from an important Web page is a better indicator of importance than a link from an unimportant Web page. Under such a view, an important Web page is one that receives many links from other important Web pages.

[Psychological Science, Volume 18, Number 12, p. 1069. Copyright © 2007 Association for Psychological Science. Address correspondence to Tom Griffiths, University of California, Berkeley, Department of Psychology, 3210 Tolman Hall #1650, Berkeley, CA 94720-1650; e-mail: [email protected].]
We might thus imagine importance as flowing along the links of the graph shown in Figure 1a. If each Web page distributes its importance uniformly over its outgoing links, then we can express the proportion of the importance of each Web page traveling along each link in a matrix M, where

$$M_{ij} = L_{ij} \Big/ \sum_{k=1}^{n} L_{kj},$$

the denominator being the number of outgoing links (the out-degree) of page j, so that each column of M sums to 1.
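The link matrix L, the in-degree baseline p = L1, and the column-normalized flow matrix M described above can be sketched in a few lines of NumPy. The word list and edges below are invented for illustration (they are not from the article's word-association data), and the damped power iteration is the standard PageRank formulation rather than anything specific to this study:

```python
import numpy as np

# Hypothetical directed association network: an edge (j, i) means node j
# links to (or cues) node i, analogous to a hyperlink from page j to page i.
words = ["apple", "banana", "cherry", "date"]
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2), (3, 0)]

n = len(words)
L = np.zeros((n, n))
for j, i in edges:
    L[i, j] = 1.0  # L[i, j] = 1 iff there is a link from node j to node i

# Simple importance index: in-degree, p = L·1 (incoming links per node).
in_degree = L @ np.ones(n)

# M[i, j] = L[i, j] / sum_k L[k, j]: each node spreads its importance
# uniformly over its outgoing links, so every column of M sums to 1.
# (Assumes every node has at least one outgoing link, as here.)
M = L / L.sum(axis=0)

# Recursive importance via damped power iteration; d = 0.85 is the
# damping factor conventionally used for PageRank.
d = 0.85
p = np.ones(n) / n
for _ in range(100):
    p = (1 - d) / n + d * (M @ p)

print(dict(zip(words, p.round(3))))
```

Note how the recursive measure departs from raw in-degree: a node with few incoming links can still rank highly if those links come from important nodes, which is exactly the refinement PageRank adds.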


Similar Articles


The Evaluation of the Team Performance of MLB Applying PageRank Algorithm

Background. The current win-loss ranking model in MLB has a weakness: it is calculated solely from game results, so we assume that a ranking system that considers the opposing team's performance is necessary. Objectives. This study aims to suggest the PageRank algorithm to complement the problem of US MLB team rankings being calculated from winning ratio alone. ...


Predicting Fame and Fortune: PageRank or Indegree?

Measures based on the Link Recommendation Assumption are hypothesised to help modern Web search engines rank ‘important, high quality’ pages ahead of relevant but less valuable pages and to reject ‘spam’. We tested these hypotheses using inlink counts and PageRank scores readily obtainable from search engines Google and Fast. We found that the average Google-reported PageRank of websites operat...


DRANK+: A Directory Based Pagerank Prediction Method for Fast Pagerank Convergence

As search engines grow in importance, Internet users are gradually changing how they browse the Internet. In recent years, most search engines have used link-analysis algorithms to measure the importance of web pages. They employ the conventional flat web graph, constructed from web pages and the link relations between them, to measure the relative importance of web pages. The mo...


Integrating the Microbiome and Metabolome Using Datamining and Network Analysis Approaches

This document describes a study on the integration of microbial 16S rRNA gene data with targeted metabolomics data, in the context of bacterial vaginosis (BV), to investigate microbial traits and metabolic particularities involved with this disorder. Multiple computational methods were applied, ranging from simple statistical tests to different applications of decision trees and most importantl...


Predicting the Fluency of Text with Shallow Structural Features: Case Studies of Machine Translation and Human-Written Text

Sentence fluency is an important component of overall text readability but few studies in natural language processing have sought to understand the factors that define it. We report the results of an initial study into the predictive power of surface syntactic statistics for the task; we use fluency assessments done for the purpose of evaluating machine translation. We find that these features ...



Journal:

Volume   Issue

Pages   –

Publication date: 2007