Approaches to the Label-Switching Problem of Classification, Based on Partition-Space Relabeling and Label-Invariant Visualization
نویسنده
چکیده
In the context of interest, a method of cluster analysis is used to classify a set of units into a fixed number of classes. Simulation procedures with various conceptual foundations may be used to evaluate uncertainty, stability, or sampling error of such a classification. However simulation approaches may be subject to a label-switching problem, when a likelihood function, posterior density, or some objective function is invariant under permutation of class labels. We suggest a relabeling algorithm that maximizes a simple measure of agreement among classifications. However, it is known that effective summaries and visualization tools can be based on sample concurrence fractions, which we define as sample fractions with given pairs of units falling in the same cluster, and which are invariant under permutation of class labels. We expand the study of concurrence fractions by presenting a matrix theory, which is employed in relabeling, as well as in elaboration of visualization tools. We explore an ordination approach treating concurrence fractions as similarities between pairs of units. A matrix result supports straightforward application of the method of principal coordinates, leading to ordination plots in which Euclidean distances between pairs of units have a simple relationship to concurrence fractions. The use of concurrence fractions complements relabeling, by providing an efficient initial labeling.
منابع مشابه
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention during recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier which utilizes a new strategy to multi-label learning by leveraging label-specific ...
متن کاملAdaptive Metropolis with Online Relabeling
We propose a novel adaptive MCMC algorithm named AMOR (Adaptive Metropolis with Online Relabeling) for efficiently simulating from permutation-invariant targets occurring in, for example, Bayesian analysis of mixture models. An important feature of the algorithm is to tie the adaptation of the proposal distribution to the choice of a particular restriction of the target to a domain where label ...
متن کاملCommunity Detection using a New Node Scoring and Synchronous Label Updating of Boundary Nodes in Social Networks
Community structure is vital to discover the important structures and potential property of complex networks. In recent years, the increasing quality of local community detection approaches has become a hot spot in the study of complex network due to the advantages of linear time complexity and applicable for large-scale networks. However, there are many shortcomings in these methods such as in...
متن کاملExploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in the text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
متن کاملHow Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification?
We propose using five data-driven community detection approaches from social networks to partition the label space in the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd. We evaluate modularity-maximizing using fast greedy and leading eigenvector approximations, infomap, walktrap and label propagation algorithms. For this pur...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006