Supervised Clustering of Label Ranking Data
Abstract
In this paper we study supervised clustering in the context of label ranking data. Segmentation of such complex data has many potential real-world applications. For example, in target marketing, the goal is to cluster customers in the feature space while taking into account their assigned, potentially incomplete product preferences, such that the preferences of customers within a cluster are more similar to one another than to the preferences of customers in other clusters. We establish several heuristic baselines for this application that make use of well-known algorithms such as K-means, and propose a principled algorithm specifically tailored for this type of clustering. The algorithm is based on the Plackett-Luce (PL) probabilistic ranking model. Each cluster is represented as a union of Voronoi cells defined by a set of prototypes and is assigned a set of PL label scores that determine the cluster-specific label ranking. The unknown cluster PL parameters and prototype positions are determined using a supervised learning technique. Cluster membership and ranking for a new instance are determined by the membership of its nearest prototype. The proposed algorithms were empirically evaluated on synthetic and real-life label ranking data. The PL-based method was superior to the heuristic supervised clustering approaches. The proposed PL-based algorithm was also evaluated on the task of label ranking prediction. The results showed that it is highly competitive with state-of-the-art label ranking algorithms, and that it is particularly accurate on data with
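The prediction step and the PL ranking probability described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the training procedure, so only the Plackett-Luce log-likelihood of a (possibly incomplete) ranking and the nearest-prototype prediction rule are shown. All function names, the Euclidean distance, and the toy values are assumptions made for illustration.

```python
import numpy as np

def pl_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (label indices, most preferred first)
    under a Plackett-Luce model with positive label scores `scores`.
    An incomplete ranking (a subset of labels) is handled by restricting
    the model to the labels that appear in it."""
    v = np.asarray(scores, dtype=float)[list(ranking)]
    # At each position, the denominator sums the scores of labels not yet placed.
    remaining = np.cumsum(v[::-1])[::-1]
    return float(np.sum(np.log(v) - np.log(remaining)))

def nearest_prototype(x, prototypes):
    """Index of the prototype closest to x (Euclidean distance is an assumption)."""
    diffs = np.asarray(prototypes, dtype=float) - np.asarray(x, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

def predict(x, prototypes, prototype_cluster, cluster_scores):
    """Return (cluster id, predicted label ranking) for a new instance x.
    The cluster is that of the nearest prototype; the ranking sorts the
    cluster's PL scores in decreasing order."""
    k = prototype_cluster[nearest_prototype(x, prototypes)]
    ranking = np.argsort(-np.asarray(cluster_scores[k], dtype=float))
    return k, ranking.tolist()

# Hypothetical toy model: three prototypes, two clusters, three labels.
prototypes = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
prototype_cluster = [0, 0, 1]          # cluster 0 is a union of two Voronoi cells
cluster_scores = [[3.0, 2.0, 1.0],     # PL scores for cluster 0
                  [1.0, 2.0, 4.0]]     # PL scores for cluster 1

cluster, ranking = predict([4.0, 4.0], prototypes, prototype_cluster, cluster_scores)
full_prob = np.exp(pl_log_likelihood(cluster_scores[0], [0, 1, 2]))
partial_prob = np.exp(pl_log_likelihood(cluster_scores[0], [2, 0]))
```

In this toy setup, the full ranking [0, 1, 2] under scores (3, 2, 1) has probability 3/6 · 2/3 · 1 = 1/3, and the incomplete ranking [2, 0] has probability 1/4, since only labels 0 and 2 enter the normalization.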
Similar resources
Estimating Maximally Probable Constrained Relations by Mathematical Programming
Estimating a constrained relation is a fundamental problem in machine learning. Special cases are classification (the problem of estimating a map from a set of to-be-classified elements to a set of labels), clustering (the problem of estimating an equivalence relation on a set) and ranking (the problem of estimating a linear order on a set). We contribute a family of probability measures on the...
Information Retrieval Using Label Propagation Based Ranking
The IR group participated in the cross-language retrieval task (CLIR) at the sixth NTCIR workshop (NTCIR 6). In this paper, we describe our approach to the Chinese Single Language Information Retrieval (SLIR) task and the English-Chinese Bilingual CLIR task (BLIR). We use both bi-grams and single Chinese characters as index units and use OKAPI BM25 as the retrieval model. The initial retrieved documents are...
Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council
Supervised clustering is a data mining technique that assigns a set of data to predefined classes by analyzing dataset attributes. It is considered an important technique for information retrieval, management, and mining in information systems. Since customer satisfaction is the main goal of organizations in modern society, to meet the requirements, the 137 call center of the Tehran city council is ...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
Publication date: 2012