Ignoring Irrelevant Pages in Weighted PageRank Algorithm using Text Content of the Target Page
Abstract
The web is expanding day by day, and people generally rely on search engines to explore it. This growth has created many challenges for information retrieval. The quality of the extracted information is one of the major issues to be addressed, and current information retrieval approaches need to be modified to meet such challenges. In query-based searching, search engines return a list of web documents containing both relevant and irrelevant pages, and sometimes rank irrelevant pages higher than relevant ones. This paper presents a novel approach to ignoring irrelevant pages in the Weighted PageRank algorithm using the text content of the target pages.
General Terms: Web page ranking for information retrieval
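The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that a page counts as relevant when its text contains at least one query term (a hypothetical criterion), drops all other pages from the link graph, and then applies the commonly cited Weighted PageRank update, in which each link is weighted by the target's share of in-links and out-links among the pages the source links to. The names `filter_relevant` and `weighted_pagerank`, the damping factor 0.85, and the toy graph are all illustrative assumptions.

```python
def filter_relevant(links, texts, query_terms):
    """Keep only pages whose text shares a word with the query (assumed criterion)."""
    terms = {t.lower() for t in query_terms}
    keep = {p for p in links if terms & set(texts.get(p, "").lower().split())}
    # Drop irrelevant pages and any links that point to them.
    return {p: [q for q in outs if q in keep]
            for p, outs in links.items() if p in keep}


def weighted_pagerank(links, d=0.85, iterations=50):
    """Weighted PageRank over {page: [linked pages]}; every target must be a key."""
    pages = list(links)
    in_count = {p: 0 for p in pages}
    for outs in links.values():
        for q in outs:
            in_count[q] += 1
    out_count = {p: len(links[p]) for p in pages}

    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {p: 1.0 - d for p in pages}
        for v in pages:
            outs = links[v]
            total_in = sum(in_count[p] for p in outs)
            total_out = sum(out_count[p] for p in outs)
            for u in outs:
                # Share of in-links and out-links that u holds among v's targets.
                w_in = in_count[u] / total_in if total_in else 0.0
                w_out = out_count[u] / total_out if total_out else 0.0
                new_rank[u] += d * rank[v] * w_in * w_out
        rank = new_rank
    return rank


# Toy example (invented data): "spam" links to "a" but has unrelated text.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "spam": ["a"]}
texts = {"a": "pagerank ranking", "b": "web ranking",
         "c": "ranking of pages", "spam": "cheap pills"}
ranks = weighted_pagerank(filter_relevant(links, texts, ["ranking"]))
```

In this sketch the irrelevant page is removed before ranking, so it neither receives a score nor passes rank to the pages it links to. A real system would likely use a more robust relevance measure than exact word overlap.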
Similar resources
Associated Pagerank: A Content Relevance Weighted Pagerank Algorithm
The Pagerank algorithm is a link analysis approach to evaluating the importance of web pages, and in recent years many techniques have been proposed to improve the traditional Pagerank algorithm and guard against the bias introduced by link spamming. A key challenge for link analysis is to identify the relevance between the original page and the linked page. The importance scores of web pages should rely on the quality ...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we rely mainly on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Weighted Page Rank Algorithm Based on Number of Visits of Links of Web Page
The World Wide Web consists of billions of web pages, and a huge amount of information is available within them. To retrieve the required information from the World Wide Web, search engines perform a number of tasks based on their respective architectures. When a user submits a query to a search engine, it generally returns a large number of pages in response to the user’s query. To support the users to navigate...
A Score based Web Page Ranking Algorithm
With the explosive growth of information on the Web, users face difficulties in finding their desired information. A search engine helps the user by retrieving useful information from this huge collection based on the search query and presenting a list of relevant web pages as a search result. However, without proper ranking of pages in the result through the relevancy of pages to the search...
Weighted PageRank using the Rank Improvement
Given the volume of information available on the WWW, users easily get lost in its rich hyperstructure. It has become increasingly necessary for users to utilize automated tools in order to find, extract, filter and evaluate the desired information and resources. A modern information retrieval system matches a user's terms with the documents in its index and returns a large number of documents of Web pages generally...
Publication date: 2014