Personal Name Disambiguation in Web Search Results Based on a Semi-supervised Clustering Approach
Abstract
Most previous work on disambiguating personal names in Web search results employs agglomerative clustering. In contrast, we have adopted a semi-supervised clustering approach in order to guide the clustering more appropriately. Our proposed approach is novel in that it controls the fluctuation of a cluster's centroid. It achieved a purity of 0.72 and an inverse purity of 0.81, giving a harmonic mean F of 0.76.
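The three reported numbers are linked: F is the harmonic mean of purity and inverse purity. The abstract does not define these metrics, so the sketch below assumes the standard definitions used in Web People Search evaluations. Purity credits each system cluster with its most frequent gold person. Inverse purity swaps the roles of system clusters and gold classes. The function names and toy data are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from typing import Hashable, Sequence


def purity(clusters: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Share of documents that fall in the majority gold class of their system cluster."""
    if len(clusters) != len(gold) or not clusters:
        raise ValueError("need one label pair per document, and at least one document")
    hits = 0
    for c in set(clusters):
        # Gold labels of the documents assigned to system cluster c.
        members = [g for k, g in zip(clusters, gold) if k == c]
        # Credit the cluster with its most frequent gold label.
        hits += Counter(members).most_common(1)[0][1]
    return hits / len(clusters)


def inverse_purity(clusters: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Purity with the roles of system clusters and gold classes swapped."""
    return purity(gold, clusters)


def f_measure(p: float, ip: float) -> float:
    """Harmonic mean of purity and inverse purity."""
    return 2 * p * ip / (p + ip)


if __name__ == "__main__":
    # Hypothetical toy data: six pages for one ambiguous name, two real people ("a", "b").
    system = [0, 0, 1, 1, 1, 2]
    truth = ["a", "a", "a", "b", "b", "b"]
    p, ip = purity(system, truth), inverse_purity(system, truth)
    print(round(p, 3), round(ip, 3), round(f_measure(p, ip), 3))
    # Scores reported in the abstract.
    print(round(f_measure(0.72, 0.81), 2))
```

With the reported scores, `f_measure(0.72, 0.81)` evaluates to about 0.762, which rounds to the 0.76 given in the abstract. In the toy run, splitting person "b" across two clusters leaves purity high (5/6) but lowers inverse purity (4/6). This is the kind of trade-off the harmonic mean is meant to balance.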
Similar resources
Determine the Entity Number in Hierarchical Clustering for Web Personal Name Disambiguation
An internet user is often frustrated by ambiguous names in Web search results when trying to find information about a particular person. Hierarchical clustering methods are often used to group the personal names that refer to the same entity. Because the correct number of entities for a given personal name is not available, we are required to determine the cut points in the dend...
TITPI: Web People Search Task Using Semi-Supervised Clustering Approach
Most previous works that disambiguate personal names in Web search results employ agglomerative clustering approaches. However, depending on the criterion used to merge similar clusters, these approaches tend to generate clusters that contain only a single element. In contrast to such previous works, we have adopted a semi-supervised clustering approach to integrate similar documents into a ...
Clustering web people search results using fuzzy ants
Person name queries often bring up web pages that correspond to individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant based clustering approach for this multi-document person name disambiguation problem. T...
AUG: A combined classification and clustering approach for web people disambiguation
This paper presents a combined supervised and unsupervised approach for multi-document person name disambiguation. Based on feature vectors reflecting pairwise comparisons between web pages, a classification algorithm provides linking information about document pairs, which leads to initial clusters. In addition, two different clustering algorithms are fed with matrices of weighted keywords. In ...
Semi-supervised Clustering for Word Instances and Its Effect on Word Sense Disambiguation
We propose a supervised word sense disambiguation (WSD) system that uses features obtained from clustering results of word instances. Our approach is novel in that we employ semi-supervised clustering that controls the fluctuation of the centroid of a cluster, and we select seed instances by considering the frequency distribution of word senses and exclude outliers when we introduce “must-link”...
Publication date: 2007