Handbook of Cluster Analysis

Authors

  • C. Hennig
  • Marina Meila
Abstract

Spectral clustering is a family of methods to find $K$ clusters using the eigenvectors of a matrix. Typically, this matrix is derived from a set of pairwise similarities $S_{ij}$ between the points to be clustered. This task is called similarity based clustering, graph clustering, or clustering of dyadic data. One remarkable advantage of spectral clustering is its ability to cluster “points” that are not necessarily vectors, using a “similarity”, which is less restrictive than a distance. A second advantage of spectral clustering is its flexibility: it can find clusters of arbitrary shapes, under realistic separations. This chapter introduces the similarity based clustering paradigm, describes the algorithms used, and sets the foundations for understanding these algorithms. Practical aspects, such as obtaining the similarities, are also discussed.

1.1 Similarity based clustering. Definitions and criteria

1.1.1 What is similarity based clustering?

Clustering when the data represent similarities between pairs of points is called similarity based clustering. A typical example of similarity based clustering is community detection in social networks [47] (see also Chapter ??), where the observations are individual links between people, which may be due to friendship, shared interests, or work relationships. The “strength” of a link can be the frequency of interactions, e.g. communications by e-mail, phone or other social media, co-authorships, or citations. In this clustering paradigm, the points to be clustered are not assumed to be part of a vector space. Their attributes (or features) are incorporated into a single dimension, the link strength, or similarity, which takes a numerical value $S_{ij}$ for each pair of points $i, j$. Hence, the natural representation for this problem is the similarity matrix $S = [S_{ij}]_{i,j=1}^{n}$. The similarities are symmetric ($S_{ij} = S_{ji}$) and non-negative ($S_{ij} \ge 0$).
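As a minimal sketch of the representation $S = [S_{ij}]_{i,j=1}^{n}$, the Python below builds a similarity matrix and checks the two required properties, symmetry and non-negativity. The points here happen to be vectors only so that there is something to compute a similarity from; the Gaussian kernel $S_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / 2\sigma^2)$ and the bandwidth `sigma` are illustrative assumptions, not the chapter's prescription, and the helper names are hypothetical. Any symmetric, non-negative similarity, such as a link strength in a social network, could be substituted.

```python
import math

def gaussian_similarity(points, sigma=1.0):
    """Build S = [S_ij] with S_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The Gaussian kernel and sigma are illustrative assumptions; any
    symmetric, non-negative similarity would serve the same role.
    """
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # squared Euclidean distance between point i and point j
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            s = math.exp(-d2 / (2.0 * sigma ** 2))
            S[i][j] = S[j][i] = s  # fill both triangles, so S is symmetric
    return S

def is_valid_similarity(S):
    """Check the two properties required of S: S_ij == S_ji and S_ij >= 0."""
    n = len(S)
    return all(S[i][j] == S[j][i] and S[i][j] >= 0
               for i in range(n) for j in range(n))

# Two nearby points and one far away point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
S = gaussian_similarity(points, sigma=1.0)
print(is_valid_similarity(S))  # True
```

Once $S$ is built, the coordinates are no longer needed: everything downstream works on $S$ alone, which is what lets similarity based clustering handle points that are not vectors.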
Less obvious domains where similarity based clustering is used include image segmentation, where the points to be clustered are pixels in an image, and text analysis, where words appearing in the same context are considered similar. The goal of similarity based clustering is to find the global clustering of the data set that emerges from the pairwise interactions of its points. Namely, we want to put points that are similar to each other in the same cluster, and dissimilar points in different clusters.

1.1.2 Similarity based clustering and cuts in graphs

It is useful to cast similarity based clustering in the language of graph theory. Let the points to be clustered, $V = \{1, \ldots, n\}$, be the nodes of a graph $G$, and let the graph edges be the pairs $i, j$ with $S_{ij} > 0$. The similarity itself is the weight of edge $ij$:

$$G = (V, E), \qquad E = \{(i, j) : S_{ij} > 0\} \subseteq V \times V \tag{1.1}$$

Thus, $G$ is an undirected and weighted graph. A partition of the nodes of a graph into $K$ clusters is known as a ($K$-way) graph cut; therefore, similarity based clustering can be viewed as finding a cut in the graph $G$. The following definitions will be helpful. We denote $d_i = \sum$
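The graph construction in (1.1) and the idea of a $K$-way cut can be illustrated with a small Python sketch. The helpers `similarity_graph` and `cut_weight` are hypothetical names introduced here. Two choices are assumptions rather than statements from the text: self-loops ($i = j$) are left out, and a cut is scored by the total weight of edges whose endpoints land in different clusters, which is the usual notion of cut weight but is not defined in this excerpt.

```python
def similarity_graph(S):
    """Return the edge set E = {(i, j) : S_ij > 0} of G = (V, E) as {(i, j): weight}.

    S is symmetric, so each undirected edge is stored once with i < j.
    Self-loops (i == j) are omitted here as an illustrative choice.
    """
    n = len(S)
    return {(i, j): S[i][j]
            for i in range(n) for j in range(i + 1, n) if S[i][j] > 0}

def cut_weight(edges, labels):
    """Total weight of edges whose endpoints lie in different clusters.

    labels[i] is the cluster (0..K-1) of node i, so labels encodes a K-way cut.
    """
    return sum(w for (i, j), w in edges.items() if labels[i] != labels[j])

# A 4-node similarity matrix: nodes {0, 1} and {2, 3} are strongly linked
# internally and joined by a single light edge (0, 2).
S = [[0, 3, 1, 0],
     [3, 0, 0, 0],
     [1, 0, 0, 2],
     [0, 0, 2, 0]]
E = similarity_graph(S)
print(E)                             # {(0, 1): 3, (0, 2): 1, (2, 3): 2}
print(cut_weight(E, [0, 0, 1, 1]))   # 1
print(cut_weight(E, [0, 1, 0, 1]))   # 5
```

The partition that separates $\{0, 1\}$ from $\{2, 3\}$ cuts only the light edge, keeping similar points together and dissimilar ones apart, which is the kind of cut similarity based clustering seeks.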


Similar resources

Spectral Clustering: a Tutorial for the 2010’s

Spectral clustering is a family of methods to find K clusters using the eigenvectors of a matrix. Typically, this matrix is derived from a set of pairwise similarities Sij between the points to be clustered. This task is called similarity based clustering, graph clustering, or clustering of dyadic data. One remarkable advantage of spectral clustering is its ability to cluster “points” which are...


The Maternal and Child Health (MCH) Handbook in Mongolia: A Cluster-Randomized, Controlled Trial

OBJECTIVE To assess the effectiveness of the Maternal and Child Health (MCH) handbook in Mongolia to increase antenatal clinic attendance, and to enhance health-seeking behaviors and other health outcomes. METHODS A cluster randomized trial was conducted using the translated MCH handbook in Bulgan, Mongolia to assess its effectiveness in promoting antenatal care attendance. Pregnant women wer...


Fuzzy Classification on Relational Databases

The fuzzy logic theory proposed by Zadeh (1965) is based on intuitive reasoning and takes into account human subjectivity and imprecision. Unlike statistical data mining techniques such as cluster or regression analysis, fuzzy logic enables the use of nonnumerical values and introduces the notion of linguistic variables (Zadeh, 1975a, 1975b, 1975c). Using linguistic terms and variables hides Ab...


Cluster Analysis: A Toolbox for MATLAB

A broad definition of clustering can be given as the search for homogeneous groupings of objects based on some type of available data. There are two common such tasks now discussed in (almost) all multivariate analysis texts and implemented in the commercially available behavioral and social science statistical software suites: hierarchical clustering and the K-means partitioning of some set of...



Publication date: 2015