Supervised Clustering of Label Ranking Data

Authors

Mihajlo Grbovic

Nemanja Djuric

Slobodan Vucetic

Abstract

In this paper we study supervised clustering in the context of label ranking data. Segmentation of such complex data has many potential real-world applications. For example, in target marketing, the goal is to cluster customers in the feature space while taking into account their assigned, potentially incomplete product preferences, such that the preferences of customers within a cluster are more similar to each other than to the preferences of customers in other clusters. We establish several heuristic baselines for this application that make use of well-known algorithms such as K-means, and propose a principled algorithm specifically tailored to this type of clustering. It is based on the Plackett-Luce (PL) probabilistic ranking model. Each cluster is represented as a union of Voronoi cells defined by a set of prototypes and is assigned a set of PL label scores that determine the cluster-specific label ranking. The unknown cluster PL parameters and prototype positions are determined using a supervised learning technique. Cluster membership and ranking for a new instance are determined by the membership of its nearest prototype. The proposed algorithms were empirically evaluated on synthetic and real-life label ranking data. The PL-based method was superior to the heuristic supervised clustering approaches. The proposed PL-based algorithm was also evaluated on the task of label ranking prediction. The results showed that it is highly competitive with state-of-the-art label ranking algorithms, and that it is particularly accurate on data with
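To make the prediction step in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes the standard Plackett-Luce likelihood, in which a ranking is built by repeatedly picking the next label with probability proportional to its score among the labels not yet placed, and it assumes Euclidean distance for the nearest-prototype rule. The function names (`pl_log_prob`, `assign`) and the toy prototypes and scores are hypothetical; the supervised learning of prototype positions and PL scores is not shown.

```python
import math

def pl_log_prob(ranking, scores):
    """Log-probability of `ranking` (label indices, most preferred first)
    under a Plackett-Luce model with positive per-label `scores`.
    For an incomplete ranking, only the ranked labels enter the product."""
    log_p = 0.0
    for i, label in enumerate(ranking):
        # Normalizer: total score of the labels still to be placed.
        remaining = sum(scores[l] for l in ranking[i:])
        log_p += math.log(scores[label]) - math.log(remaining)
    return log_p

def assign(x, prototypes, proto_cluster, cluster_scores):
    """Return (cluster id, predicted ranking) for instance `x`.
    The cluster is that of the nearest prototype (Euclidean distance,
    an assumption); the ranking sorts labels by descending PL score."""
    nearest = min(range(len(prototypes)),
                  key=lambda p: math.dist(x, prototypes[p]))
    k = proto_cluster[nearest]
    scores = cluster_scores[k]
    ranking = sorted(range(len(scores)), key=lambda l: -scores[l])
    return k, ranking

# Toy setup (hypothetical values): three prototypes, two clusters, three labels.
prototypes = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
proto_cluster = [0, 0, 1]            # cluster 0 is the union of two Voronoi cells
cluster_scores = [[3.0, 2.0, 1.0],   # cluster 0 prefers label 0
                  [1.0, 2.0, 3.0]]   # cluster 1 prefers label 2

cluster, ranking = assign((4.5, 5.0), prototypes, proto_cluster, cluster_scores)
prob = math.exp(pl_log_prob([0, 1, 2], cluster_scores[0]))
```

In this toy setup the query point falls in the Voronoi cell of the third prototype, so it inherits the label ranking of cluster 1, while `prob` is the PL probability (3/6 · 2/3 · 1/1) that a member of cluster 0 reports the full ranking 0, 1, 2.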

Similar resources

Estimating Maximally Probable Constrained Relations by Mathematical Programming

Estimating a constrained relation is a fundamental problem in machine learning. Special cases are classification (the problem of estimating a map from a set of to-be-classified elements to a set of labels), clustering (the problem of estimating an equivalence relation on a set) and ranking (the problem of estimating a linear order on a set). We contribute a family of probability measures on the...

Information Retrieval Using Label Propagation Based Ranking

The IR group participated in the cross-language information retrieval (CLIR) task at the sixth NTCIR workshop (NTCIR 6). In this paper, we describe our approach to the Chinese Single Language Information Retrieval (SLIR) task and the English-Chinese Bilingual CLIR (BLIR) task. We use both bi-grams and single Chinese characters as index units and use OKAPI BM25 as the retrieval model. The initial retrieved documents are...

Using Supervised Clustering Technique to Classify Received Messages in 137 Call Center of Tehran City Council

Supervised clustering is a data mining technique that assigns a set of data to predefined classes by analyzing dataset attributes. It is considered as an important technique for information retrieval, management, and mining in information systems. Since customer satisfaction is the main goal of organizations in modern society, to meet the requirements, 137 call center of Tehran city council is ...

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...

Publication date: 2012
