A Sober Look at Clustering Stability
Abstract
Stability is a common tool for verifying the validity of sample-based algorithms. In clustering, it is widely used to tune the parameters of the algorithm, such as the number k of clusters. Despite the popularity of stability in practical applications, there has been very little theoretical analysis of this notion. In this paper, we provide a formal definition of stability and analyze some of its basic properties. Quite surprisingly, our analysis shows that for large sample sizes, stability is fully determined by the behavior of the objective function that the clustering algorithm aims to minimize. If the objective function has a unique global minimizer, the algorithm is stable; otherwise, it is unstable. In particular, we conclude that stability is not a well-suited tool for determining the number of clusters: it is determined by the symmetries of the data, which may be unrelated to clustering parameters. We prove our results for center-based clusterings and for spectral clustering, and support our conclusions with many examples in which the behavior of stability is counter-intuitive.
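The notion of stability in the abstract can be made concrete with a small Monte Carlo sketch: draw two independent samples of size m from the same distribution P, cluster each, and measure how far apart the two clusterings are on fresh draws from P. This is an illustration under stated assumptions, not the paper's construction. The clustering algorithm is assumed to be k-means (Lloyd's iterations with k-means++ seeding and restarts). The distance between clusterings is assumed to be a pair-counting distance: the probability that a random pair of points is grouped together by one clustering and separated by the other. The two Gaussian-blob distributions and every function name (`kmeans`, `pair_distance`, `instability`, `blobs`) are hypothetical. The `square` distribution has two equally good 2-cluster splits (left/right and top/bottom), so by the abstract's argument k=2 should be unstable. The `line` distribution has three blobs but a unique best 2-cluster split, so k=2 should be stable even though three is the "natural" number of clusters.

```python
import math
import random
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Sampler = Callable[[random.Random], Point]  # hypothetical: draws one point from P


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(x: Point, centers: Sequence[Point]) -> int:
    """Index of the closest center, i.e. the Voronoi partition induced by the centers."""
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def kmeans(sample: List[Point], k: int, rng: random.Random,
           restarts: int = 5, iters: int = 50) -> List[Point]:
    """Lloyd's algorithm with k-means++ seeding; returns the lowest-cost run's centers."""
    best, best_cost = None, math.inf
    for _ in range(restarts):
        centers = [rng.choice(sample)]
        while len(centers) < k:  # k-means++: pick next seed with probability ~ D(x)^2
            w = [min(sq_dist(x, c) for c in centers) for x in sample]
            pick = rng.choices(sample, weights=w)[0] if sum(w) > 0 else rng.choice(sample)
            centers.append(pick)
        for _ in range(iters):
            groups: List[List[Point]] = [[] for _ in range(k)]
            for x in sample:
                groups[nearest(x, centers)].append(x)
            new = [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
                   if g else centers[j] for j, g in enumerate(groups)]
            if new == centers:  # assignments stopped changing
                break
            centers = new
        cost = sum(sq_dist(x, centers[nearest(x, centers)]) for x in sample)
        if cost < best_cost:
            best, best_cost = centers, cost
    return best


def pair_distance(a: Sequence[Point], b: Sequence[Point], points: Sequence[Point]) -> float:
    """Fraction of point pairs that one clustering joins and the other separates.
    Invariant to relabeling of clusters (assumed stand-in for the clustering distance)."""
    la = [nearest(x, a) for x in points]
    lb = [nearest(x, b) for x in points]
    pairs = list(combinations(range(len(points)), 2))
    return sum((la[i] == la[j]) != (lb[i] == lb[j]) for i, j in pairs) / len(pairs)


def instability(draw: Sampler, k: int, m: int, trials: int = 20,
                n_eval: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of E[distance(A(S1), A(S2))] for independent samples of size m."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s1 = [draw(rng) for _ in range(m)]
        s2 = [draw(rng) for _ in range(m)]
        fresh = [draw(rng) for _ in range(n_eval)]  # new draws from P to compare on
        total += pair_distance(kmeans(s1, k, rng), kmeans(s2, k, rng), fresh)
    return total / trials


def blobs(centers: Sequence[Point], sd: float = 0.1) -> Sampler:
    """Hypothetical test distribution: equal-weight mixture of isotropic Gaussians."""
    def draw(rng: random.Random) -> Point:
        cx, cy = rng.choice(centers)
        return (rng.gauss(cx, sd), rng.gauss(cy, sd))
    return draw


square = blobs([(-1, -1), (-1, 1), (1, -1), (1, 1)])  # symmetric: two optimal 2-splits
line = blobs([(0, 0), (1, 0), (5, 0)])                # asymmetric: unique optimal 2-split

if __name__ == "__main__":
    for name, draw, k in [("square", square, 2), ("square", square, 4), ("line", line, 2)]:
        print(name, "k =", k, "instability ~", round(instability(draw, k, m=100), 3))
```

On these distributions one would expect the square at k=2 to give a clearly nonzero estimate, because independent samples pick the left/right or the top/bottom split roughly equally often. The square at k=4 and the line at k=2 should stay near zero. The line case mirrors the abstract's warning: k=2 looks stable there only because its optimum is unique, not because two is the right number of clusters.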
Similar resources
Prediction of slope stability using adaptive neuro-fuzzy inference system based on clustering methods
Slope stability analysis is an enduring research topic in the engineering and academic sectors. Accurate prediction of the factor of safety (FOS) of slopes, their stability, and their performance is not an easy task. In this work, the adaptive neuro-fuzzy inference system (ANFIS) was utilized to build an estimation model for the prediction of FOS. Three ANFIS models were implemented including g...
Improving Vehicular Ad-Hoc Network Stability Using Meta-Heuristic Algorithms
Vehicular ad-hoc network (VANET) is an important component of intelligent transportation systems, in which vehicles are equipped with on-board computing and communication devices which enable vehicle-to-vehicle communication. Consequently, with regard to larger communication due to the greater number of vehicles, stability of connectivity would be a challenging problem. Clustering technique as ...
Drunk personality: reports from drinkers and knowledgeable informants.
Existing literature supports the five-factor model (FFM) of personality (i.e., Extraversion, Agreeableness, Conscientiousness, Emotional Stability, and Intellect or Openness) as a comprehensive representation of stable aspects of mood, affect, and behavior. This study evaluated the FFM as a framework for both self-perceptions of drunkenness (i.e., individual changes in mood, affect, and behavio...
A Sub-Optimal Look-Up Table Based on Fuzzy System to Enhance the Reliability of Coriolis Mass Flow Meter
Coriolis mass flow meters are one of the most accurate tools to measure the mass flow in the industry. However, two-phase mode (gas-liquid) may cause severe operating difficulties as well as decreasing certitude in measurement. This paper presents a method based on fuzzy systems to correct the error and improve the reliability of these sensors in the presence of two-phase model fluid. Definite ...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve the clustering efficiency, most of the state-of-art schemes suffer from the lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
Publication date: 2006