similarity measurement web mining

Corpus-based Semantic Class Mining: Distributional vs. Pattern-Based Approaches

2010

Shuming Shi Huibin Zhang Xiaojie Yuan Ji-Rong Wen

The main approaches to corpus-based semantic class mining are distributional similarity (DS) and pattern-based (PB) methods. In this paper, we perform an empirical comparison of them, based on a publicly available dataset containing 500 million web pages, using various categories of queries. We further propose a frequency-based rule to select appropriate approaches for different types of terms.

Integration of Visual Temporal and Textual Distribution Information for News Video Mining

2010

Prof Shivamurthy Tauseef Ahmed

News web videos exhibit several characteristics, including a limited number of features, noisy text information, and errors in near-duplicate keyframe (NDK) detection. In this paper, a novel framework is proposed to better group associated web videos into events. First, the data preprocessing stage performs feature selection and tag relevance learning. Next, multiple correspondence analysis ...

Lexical Semantic Association between Web Pages – a Lexical Knowledge Based Method

2002

Xiao Yuan Duan Clive Souter George Demetriou Dong Xie Lynn Li

The candidate confirms that the work submitted is her own and that appropriate credit has been given where reference has been made to the work of others. ACKNOWLEDGEMENT: I would like to express my gratitude to my supervisor Eric Atwell and to Bill Whyte for helpful commentary and suggestions, and also to Clive Souter and George Demetriou, who provided me with the LKB dictionary and gave me good suggestions. ...

A Layered Locality Sensitive Hashing based Sequence Similarity Search Algorithm for Web Sessions

2014

Angana Chakraborty Sanghamitra Bandyopadhyay

In this article, we propose a Layered Locality Sensitive Hashing algorithm to perform similarity search on web log sequence data. Locality Sensitive Hashing has been found to be an efficient technique for approximate nearest neighbor search over a large database, as it has sub-linear dependence on the data size even in high dimensions. Mining the large web log data to provide customised ...

A New Model for Measuring Similarity of Web Queries and Its Application in Query Expansion

2013

Lingling Meng Runqing Huang Junzhong Gu

The similarity of web queries plays an important role in capturing frequently asked questions, identifying the most popular topics of a search engine, and automatic query expansion. Accurate measurement of similarity between queries is crucial. The paper presents a new model for a similarity metric of web queries using user logs and applies it to information retrieval for query expansion. Different from previous w...

Measuring the Similarity of Trajectories Using Fuzzy Theory

Journal: علوم و فنون نقشه برداری (Surveying Sciences and Techniques) 2020

Alesheikh, A. A., Boroumand, F., Farnaghi, M.

In recent years, the advancement of positioning systems has made a large amount of movement data accessible. One method of discovering knowledge from this type of data is to measure the similarity of the trajectories produced by moving objects. Similarity measurement has also been used in other data mining methods such as classification and clustering and is currently an...

COWES: Clustering Web Users Based on Historical Web Sessions

2006

Ling Chen Sourav S. Bhowmick Jinyan Li

Clustering web users is one of the most important research topics in web usage mining. Existing approaches cluster web users based on the snapshots of web user sessions. They do not take into account the dynamic nature of web usage data. In this paper, we focus on discovering novel knowledge by clustering web users based on the evolutions of their historical web sessions. We present an algorith...

Web Mining: a Comparative Study

2012

Aishwarya Rastogi Smita Gupta Srishti Agarwal Nimisha Agarwal

Currently, the World Wide Web has developed into a distributed information space with nearly 100 million workstations and several billion pages, which makes it very difficult for people to find the information they need despite the huge amount of information available on the web. The search engine is a very important tool for people to obtain information on the Internet, but low precision and low recall exist wide...

Sequence Graph Transform (SGT)

2016

Chitta Ranjan Samaneh Ebrahimi Kamran Paynabar

The ubiquitous presence of sequence data across fields such as the web, healthcare, bioinformatics, and text mining has made sequence mining a vital research area. However, sequence mining is particularly challenging because of the absence of an accurate and fast approach to finding (dis)similarity between sequences. As a measure of (dis)similarity, mainstream data mining methods like k-means, kNN, regres...

Improving Web Service Clustering through Post Filtering to Bootstrap the Service Discovery

2014

Banage T. G. S. Kumara Incheon Paik Koswatte R. C. Koswatte Wuhui Chen

Web service clustering is a very efficient approach to discovering Web services. Current approaches use similarity-distance measurement methods such as string-based, corpus-based, knowledge-based and hybrid methods. These approaches have problems that include discovering semantic characteristics, loss of semantic information, shortage of high-quality ontologies and encoding fine...
