Conditional Information Bottleneck Clustering

Authors

  • David Gondek
  • Thomas Hofmann
Abstract

We present an extension of the well-known information bottleneck framework, called conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score. This general approach can be utilized in a data mining context to extract relevant information that is at the same time novel relative to known properties or structures of the data. We present possible applications of the conditional information bottleneck in information retrieval and text mining for recovering non-redundant clustering solutions, including experimental results on the WebKB data set which validate the approach.
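A rough sketch of the objective described above, in notation of our own choosing rather than the paper's: write X for the data, Y for the relevance variable, Z for the known side information that should be discounted, and C for the cluster assignment. The conditional variant then replaces the usual relevance term I(C; Y) of the information bottleneck with a conditional mutual information, so that only information about Y that is novel given Z is rewarded:

\[
\max_{p(c \mid x)} \; I(C; Y \mid Z) \quad \text{subject to} \quad I(C; X) \le c_{\max},
\]

where the constraint on I(C; X) plays the usual compression role of the bottleneck and c_{\max} bounds the complexity of the clustering.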


Similar references

Interpreting Classifiers by Multiple Views

Next to prediction accuracy, interpretability is one of the fundamental performance criteria for machine learning. While high-accuracy learners have been explored intensively, interpretability still poses a difficult problem. To combine accuracy and interpretability, this paper introduces a framework which combines an approximative model with a severely restricted number of features with a mor...


The information bottleneck and geometric clustering

The information bottleneck (IB) approach to clustering takes a joint distribution P(X, Y) and maps the data X to cluster labels T which retain maximal information about Y (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions P(Y | X). This is in contrast to classic “geometric clustering” algorithms ...
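For reference, the IB objective mentioned here is usually written in Lagrangian form (Tishby et al., 1999), with a trade-off parameter \beta balancing compression of X against preservation of information about Y:

\[
\min_{p(t \mid x)} \; I(X; T) \;-\; \beta \, I(T; Y),
\]

so that a larger \beta yields cluster labels T that retain more information about Y at the cost of compressing X less.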


Information Bottleneck Co-clustering

Co-clustering has emerged as an important approach for mining contingency data matrices. We present a novel approach to co-clustering based on the Information Bottleneck principle, called Information Bottleneck Co-clustering (IBCC), which supports both soft-partition and hard-partition co-clusterings, and leverages an annealing-style strategy to bypass local optima. Existing co-clustering method...
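Since the abstract is truncated, here is only the generic mutual-information co-clustering objective that methods of this kind share, in our own notation and not necessarily IBCC's exact formulation: with \tilde{X} and \tilde{Y} denoting the row- and column-cluster variables,

\[
\max_{X \to \tilde{X},\; Y \to \tilde{Y}} \; I(\tilde{X}; \tilde{Y}),
\]

i.e., choose row and column partitions that retain as much as possible of the original mutual information I(X; Y); an annealing-style strategy starts from soft assignments and gradually sharpens them in order to sidestep poor local optima.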


An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have demonstrated their effectiveness in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN which has a bottleneck layer among its fully connected layers. The bottleneck fea...


An Analysis of Model-based Clustering, Competitive Learning, and Information Bottleneck

This paper provides a general formulation of probabilistic model-based clustering with deterministic annealing (DA), which leads to a unifying analysis of k-means, EM clustering, soft competitive learning algorithms (e.g., the self-organizing map), and the information bottleneck. The analysis points out an interesting yet not well-recognized connection between k-means and EM clustering: they are jus...
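The connection alluded to at the end can be made concrete with the standard deterministic-annealing posterior, sketched here in our own notation rather than the paper's: for centroids \mu_c and temperature T, the soft assignment of a point x is

\[
p(c \mid x) \;=\; \frac{\exp\!\left(-\lVert x - \mu_c \rVert^2 / T\right)}{\sum_{c'} \exp\!\left(-\lVert x - \mu_{c'} \rVert^2 / T\right)},
\]

which coincides with the E-step of EM for equal-weight isotropic Gaussians whose variance is proportional to T; as T \to 0 the posterior hardens into the nearest-centroid rule and the updates reduce to k-means.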




Publication date: 2003