similarity measurement web mining

A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

2015

R. Karthikeyan V. Udhayakumar

Measuring the semantic similarity between words is an important component in various tasks on the web such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose an ...


Clustering Web Pages into Hierarchical Categories

Journal: IJIIT 2007

Zhongmei Yao Ben Choi

Clustering is well suited for Web mining because it automatically organizes Web pages into categories, each of which contains Web pages with similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain, until now there has been no such method suitable for Web page clustering. To address this problem...


Focused Web Crawling Using Decay Concept and Genetic Programming

2011

Mahdi Bazarganigilani Ali Syed

The ongoing rapid growth of web information is a theme of research in many papers. In this paper, we introduce a new optimized method for web crawling. Using genetic programming enhances the accuracy of similarity measurement. This measurement applies to different parts of the web pages, including the title and the body. Consequently, the crawler uses such optimized similarity measurement to tra...


A Web Search Engine-based Approach to Measure Semantic Similarity between Words

2012

N. Shanthi K. S. Rangasamy

Measuring the semantic similarity between words is an important component in various tasks on the web such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose an...


An Efficient Method for Composing Web Services

Thesis: Ministry of Science, Research and Technology - Shiraz University - Faculty of Engineering 1388

علی بیگلری, محمدهادی صدرالدینی

No abstract available.


Measuring the Structural Similarity of Web-based Documents: A Novel Approach

2006

Matthias Dehmer Frank Emmert Jürgen Kilian

Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-Trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so c...


Automatically Discovering the Number of Clusters in Web Page Datasets

2005

Zhongmei Yao Ben Choi

Clustering is well suited for Web mining by automatically organizing Web pages into categories each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attem...



Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

Journal: International Journal of Industrial Engineering and Production Research 2010

One of the best-known methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and then recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
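The user-based CF procedure described in this abstract can be illustrated with a minimal sketch. This is not the paper's method: the toy `ratings` data, the choice of cosine similarity over co-rated items, and the function names `cosine_sim` and `recommend` are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "e": 4},
}

def cosine_sim(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(active, ratings, k=1):
    """Rank items the active user has not rated, using the k most similar users."""
    me = ratings[active]
    # Sort the other users by similarity to the active user, most similar first.
    neighbours = sorted(
        ((cosine_sim(me, other), name)
         for name, other in ratings.items() if name != active),
        reverse=True,
    )[:k]
    scores, weights = {}, {}
    for sim, name in neighbours:
        for item, rating in ratings[name].items():
            if item not in me:  # only items the active user has not rated
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average rating per candidate item.
    predicted = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(predicted, key=predicted.get, reverse=True)

print(recommend("alice", ratings, k=2))
```

In this toy data, "bob" agrees with "alice" on every co-rated item, so his unrated-by-alice item ranks first. With few co-rated items the similarity estimate is unreliable, which is one of the drawbacks of this scheme.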


A New Similarity Metric for Sequential Data

Journal: IJDWM 2010

Pradeep Kumar Raju S. Bapi P. Radha Krishna

In many data mining applications, both classification and clustering algorithms require a distance/similarity measure. The central problem in similarity-based clustering/classification of sequential data is choosing an appropriate similarity metric. Existing metrics such as Euclidean, Jaccard, and Cosine do not explicitly exploit the sequential nature of the data. In this chapter...
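The claim that Jaccard and Cosine ignore ordering can be checked with a small sketch. The two click sequences are hypothetical, and the longest-common-subsequence length is used here only as a generic order-aware contrast; it is an assumption for illustration, not the metric proposed in the chapter.

```python
from collections import Counter
from math import sqrt

def jaccard(s, t):
    """Jaccard similarity on the sets of elements (order and counts ignored)."""
    a, b = set(s), set(t)
    return len(a & b) / len(a | b) if a | b else 1.0

def cosine(s, t):
    """Cosine similarity on element-count vectors (order ignored)."""
    cs, ct = Counter(s), Counter(t)
    dot = sum(cs[x] * ct[x] for x in cs)
    norm_s = sqrt(sum(v * v for v in cs.values()))
    norm_t = sqrt(sum(v * v for v in ct.values()))
    return dot / (norm_s * norm_t)

def lcs_len(s, t):
    """Length of the longest common subsequence (an order-aware comparison)."""
    prev = [0] * (len(t) + 1)
    for x in s:
        cur = [0]
        for j, y in enumerate(t, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Hypothetical sessions: same pages visited, opposite order.
s1 = ["login", "search", "cart", "checkout"]
s2 = ["checkout", "cart", "search", "login"]

print(jaccard(s1, s2), cosine(s1, s2), lcs_len(s1, s2) / len(s1))
```

Both set-based scores treat the reversed session as identical to the original, while the order-aware score drops sharply, which is the gap a sequence-specific metric is meant to close.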
