Discovering and Mining User Web-page Traversal Patterns
Abstract
As the popularity of the WWW explodes, a massive amount of data is gathered by Web servers in the form of Web access logs. These logs are a rich source of information for understanding Web users' surfing behavior. Web Usage Mining, also known as Web Log Mining, is the application of data mining algorithms to Web access logs to find trends and regularities in Web users' traversal patterns. The results of Web Usage Mining have been used to improve Web site design, business and marketing decision support, user profiling, and Web server system performance. In this thesis we study the application of assisted exploration of OLAP data cubes and scalable sequential pattern mining algorithms to Web log analysis. In multidimensional OLAP analysis, standard statistical measures are applied to assist the user at each step in exploring the interesting parts of the cube. In addition, a scalable sequential pattern mining algorithm is developed to discover commonly traversed paths in large data sets. Our experimental and performance studies demonstrate the effectiveness and efficiency of the algorithm in comparison to previously developed sequential pattern mining algorithms. Finally, some further research avenues in Web Usage Mining are identified.
Dedication
To my parents
Acknowledgments
I would like to thank my supervisor, Dr. Jiawei Han, for his support, for sharing his knowledge, and for the opportunities that he gave me. His dedication and perseverance have always been exemplary to me. I am also grateful to TeleLearning for getting me started in Web Log Analysis.
Similar resources
Web Users Session Analysis Using DBSCAN and Two Phase Utility Mining Algorithms
One of the important issues in data mining is the interestingness problem. Typically, in a data mining process, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, utility measures have been used to reduce the patterns prior to presenting them to the user. A frequent itemset only reflects the statistical ...
Frequent Pattern Mining in Web Log Data
Frequent pattern mining is a heavily researched area in the field of data mining with a wide range of applications. One of them is the use of frequent pattern discovery methods on Web log data. Discovering hidden information from Web log data is called Web usage mining. The aim of discovering frequent patterns in Web log data is to obtain information about the navigational behavior of the users. This...
A Valid Candidate Approach to Mining Bi-Directional Traversal Patterns on the WWW
Mining traversal patterns is one of the important topics in Web mining. It focuses on how to find the Web page sequences that are frequently browsed by users. In this paper, we propose two algorithms for mining traversal patterns. The first algorithm, SpeedTracer*-I, is a revised version of the SpeedTracer algorithm. It directly generates and counts all candidate patterns from user sessions...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Clickstreams, The Basis to Establish User Navigation Patterns on Web Sites
Collecting and mining clickstream data from e-commerce sites has become increasingly important for marketing, advertising, and traffic analysis activities. Organizations are promoting many initiatives for discovering users' navigation patterns, in order to build better sites that are more functional and closer to customers' needs. Basically, the main idea is to provide more quality of attendan...
Publication date: 2001