Markov Chain Analysis of the PageRank Problem

Author

  • Nelly Litvak
Abstract

PageRank is a notion used by search engines to reflect the popularity and importance of a page based on its citation ranking. Such a ranking was first introduced in 1998 by the Google search engine [4]. The PageRank of a page i reflects the importance of this page based on 1) how many pages link to i, and 2) how important the pages that link to i are. Since the web changes very fast, PageRank has to be updated regularly. Such an update is an intricate task due to the huge size of the World Wide Web. Consequently, the analysis of PageRank has become a hot topic, with a vast literature ranging from the original paper by Brin and Page [4] to the latest preprints by specialists in Markov chains, linear algebra, numerical methods, information retrieval, operations research, and other fields [11]. In the proposed PhD project, we shall concentrate on the Markov chain formulation of the PageRank problem. Specifically, we propose to analyze the effectiveness of aggregation-disaggregation methods [15, 5, 14] in PageRank computation. Such methods exploit the block structure of the web and seem very promising [7]. The analysis will be based on the theory of discrete-time Markov chains, Perron-Frobenius theory, perturbation theory [8], and the theory of quasi-stationary distributions [6, 9]. The project will also involve extensive numerical studies. A student may start with the following problem. Consider two completely disconnected communities (blocks of pages) and assume that they establish several links to each other. This strategy is called reciprocating and is widely used by web administrators in the hope of increasing their ranking [12]. The question is whether the trick really works. In [2], we studied a completely decomposable web and analyzed the situation in which one community gives a link to another without receiving a link back. The results suggest that in the case of reciprocating, only one of the communities wins in ranking, whereas the other one loses.
This issue, however, requires a rigorous analysis. The results will be interesting from a practical point of view, as they will either confirm or refute the common reciprocating myth. Besides, this will be a useful and insightful first step in the analysis of aggregation-disaggregation methods in PageRank computation. After the first problem has been (partly) solved, the direction of research might deviate from the original plan, depending on the methods used, the results obtained, and the interests of the student. Possible directions include, for instance, the analysis of online algorithms [1] or Monte Carlo methods [3, 10].
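The reciprocating question can be explored numerically on a toy graph. The following is a minimal sketch, not the method of [2]: it computes PageRank by plain power iteration on the Google matrix, assuming a damping factor of 0.85, uniform random jumps, and uniform redistribution of the mass of dangling pages. The two communities (a directed 3-page cycle and a directed 4-page cycle) and the helper names `pagerank` and `community_mass` are hypothetical choices made only for illustration.

```python
# Hypothetical toy example: two disconnected communities that exchange one pair
# of links ("reciprocating"). The damping factor 0.85 is an assumed, common choice.

def pagerank(links, n, damping=0.85, tol=1e-12, max_iter=1000):
    """Approximate the PageRank vector (stationary distribution of the Google
    matrix) by power iteration.

    links: dict mapping a page in 0..n-1 to the list of pages it links to.
    Pages without out-links (dangling pages) spread their mass uniformly.
    """
    pi = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n      # random-jump ("teleport") part
        dangling = 0.0
        for i in range(n):
            out = links.get(i, [])
            if out:
                share = damping * pi[i] / len(out)
                for j in out:                # follow each out-link with equal probability
                    new[j] += share
            else:
                dangling += damping * pi[i]
        for j in range(n):
            new[j] += dangling / n
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:  # L1 convergence test
            return new
        pi = new
    return pi


def community_mass(pi, pages):
    """Total PageRank held by a block of pages."""
    return sum(pi[p] for p in pages)


A, B = range(0, 3), range(3, 7)              # community A: pages 0-2, B: pages 3-6
base = {0: [1], 1: [2], 2: [0],              # A is a directed 3-cycle
        3: [4], 4: [5], 5: [6], 6: [3]}      # B is a directed 4-cycle

# Reciprocal link exchange: page 0 (in A) and page 3 (in B) link to each other.
linked = dict(base)
linked[0] = base[0] + [3]
linked[3] = base[3] + [0]

before = pagerank(base, 7)
after = pagerank(linked, 7)
for name, pi in (("before", before), ("after", after)):
    print(name, round(community_mass(pi, A), 4), round(community_mass(pi, B), 4))
```

Before the exchange every page of a directed cycle receives the same PageRank, so the two communities hold 3/7 and 4/7 of the total mass. Comparing this with the second printed line shows how a single reciprocal link pair shifts mass between the blocks in this particular toy graph; it does not replace the rigorous analysis called for above.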

Similar resources

Application of Markov Chain in the PageRank Algorithm

Link analysis algorithms for Web search engines determine the importance and relevance of Web pages. Among the link analysis algorithms, PageRank is the state of the art ranking mechanism that is used in Google search engine today. The PageRank algorithm is modeled as the behavior of a randomized Web surfer; this model can be seen as Markov chain to predict the behavior of a system that travels...

Markov Chain Anticipation for the Online Traveling Salesman Problem by Simulated Annealing Algorithm

The arc costs are assumed to be online parameters of the network and decisions should be made while the costs of arcs are not known. The policies determine the permitted nodes and arcs to traverse and they are generally defined according to the departure nodes of the current policy nodes. In on-line created tours arc costs are not available for decision makers. The on-line traversed nodes are f...

On PageRank Algorithm and Markov Chain Reduction

The PageRank is used by search engines to reflect the popularity and importance of a page based on its reference ranking. Since the web changes very fast, the PageRank has to be regularly updated. Such updates are a challenging task due to the huge size of the World Wide Web. Consequently, the analysis of the PageRank has become a hot topic with vast literature ranging from the original paper b...

Efficient randomized algorithms for PageRank problem

In the paper we compare well known numerical methods of finding PageRank vector. We propose Markov Chain Monte Carlo method and obtain a new estimation for this method. We also propose a new method for PageRank problem based on the reduction of this problem to the matrix game. We solve this (sparse) matrix game with randomized mirror descent. It should be mentioned that we used non-standard ran...

Multilinear PageRank

In this paper, we first extend the celebrated PageRank modification to a higher-order Markov chain. Although this system has attractive theoretical properties, it is computationally intractable for many interesting problems. We next study a computationally tractable approximation to the higher-order PageRank vector that involves a system of polynomial equations called multilinear PageRank. This...

Policy Iteration is well suited to optimize PageRank

The question of knowing whether the policy Iteration algorithm (PI) for solving Markov Decision Processes (MDPs) has exponential or (strongly) polynomial complexity has attracted much attention in the last 50 years. Recently, Fearnley proposed an example on which PI needs an exponential number of iterations to converge. Though, it has been observed that Fearnley’s example leaves open the possib...

Publication date: 2004