Estimating Mutual Information for Discrete-Continuous Mixtures
Authors
Abstract
Estimating mutual information from observed samples is a basic primitive, useful in several machine learning tasks including correlation mining, information bottleneck clustering, learning a Chow-Liu tree, and conditional independence testing in (causal) graphical models. While mutual information is a well-defined quantity in general probability spaces, existing estimators can only handle the two special cases of purely discrete or purely continuous pairs of random variables. The main challenge is that these methods first estimate the (differential) entropies of X, Y, and the pair (X, Y) and combine them as I(X;Y) = H(X) + H(Y) - H(X,Y). Such 3H-estimators cannot be applied in general mixture spaces, where entropy is not well-defined. In this paper, we design a novel estimator for the mutual information of discrete-continuous mixtures and prove that it is consistent. Numerical experiments suggest that the proposed estimator outperforms the common heuristics of adding small continuous noise to all the samples and applying standard estimators tailored for purely continuous variables, or quantizing the samples and applying standard estimators tailored for purely discrete variables. This significantly widens the applicability of mutual information estimation in real-world applications, where some variables are discrete, some are continuous, and others are mixtures of discrete and continuous components.
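To make the setting concrete, below is a minimal sketch of a nearest-neighbor style estimator that stays well-defined when samples contain discrete atoms (exact ties) alongside a continuous component. The function name mixed_mi_knn, the restriction to scalar X and Y, and the specific Kraskov-style formula with the neighbor count inflated at tied points are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.special import digamma

def mixed_mi_knn(x, y, k=3):
    """Hypothetical k-NN mutual information estimate for paired scalar samples
    whose distribution may mix discrete atoms with a continuous part."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)
    dx = np.abs(x - x.T)            # pairwise distances along x
    dy = np.abs(y - y.T)            # pairwise distances along y
    dxy = np.maximum(dx, dy)        # joint-space distances (max-norm)

    total = 0.0
    for i in range(n):
        d = np.delete(dxy[i], i)    # joint distances to the other n-1 points
        rho = np.sort(d)[k - 1]     # k-th nearest-neighbour radius
        # At a discrete atom the radius collapses to zero; count exact ties instead.
        k_i = np.count_nonzero(d == 0.0) if rho == 0.0 else k
        n_x = np.count_nonzero(np.delete(dx[i], i) <= rho)
        n_y = np.count_nonzero(np.delete(dy[i], i) <= rho)
        total += digamma(k_i) + np.log(n) - np.log(n_x + 1) - np.log(n_y + 1)
    return max(total / n, 0.0)

# Toy check: half of the X samples collapse onto an atom at 0, so the pair is
# neither purely discrete nor purely continuous, yet the estimate stays finite.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
x[x < 0] = 0.0
y = x + 0.1 * rng.normal(size=1000)
print(mixed_mi_knn(x, y, k=5))
```

In the purely continuous case the zero-radius branch is essentially never taken and the formula reduces to a standard Kraskov-style estimate, while on discrete atoms it falls back to counting coincidences. By contrast, the noise-adding and quantizing heuristics mentioned in the abstract alter the distribution before estimation, which is what the paper's experiments compare against.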
Similar resources
Mutual Information between Discrete and Continuous Data Sets
Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with "binning" when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for...
Tree Augmented Naive Bayes for Regression Using Mixtures of Truncated Exponentials: Application to Higher Education Management
In this paper we explore the use of Tree Augmented Naive Bayes (TAN) in regression problems where some of the independent variables are continuous and some others are discrete. The proposed solution is based on the approximation of the joint distribution by a Mixture of Truncated Exponentials (MTE). The construction of the TAN structure requires the use of the conditional mutual information, wh...
Scene continuous mutual information as least upper bound of discrete one
In this report we define the continuous mutual information of scene visibility, independent of any discretisation, and we prove that it is the least upper bound of the discrete mutual information. Thus, continuous mutual information can be understood as the maximum information transfer in a scene.
Research of Blind Signals Separation with Genetic Algorithm and Particle Swarm Optimization Based on Mutual Information
Blind source separation technique separates mixed signals blindly without any information on the mixing system. In this paper, we have used two evolutionary algorithms, namely, genetic algorithm and particle swarm optimization for blind source separation. In these techniques a novel fitness function that is based on the mutual information and high order statistics is proposed. In order to evalu...
Selective Naive Bayes for Regression Based on Mixtures of Truncated Exponentials
Naive Bayes models have been successfully used in classification problems where the class variable is discrete. These models have also been applied to regression or prediction problems, i.e. classification problems where the class variable is continuous, but usually under the assumption that the joint distribution of the feature variables and the class is multivariate Gaussian. In this paper we...
Journal:
Volume, Issue:
Pages: -
Publication date: 2017