Visualisation of heterogeneous data with simultaneous feature saliency using Generalised Generative Topographic Mapping
Abstract
Most machine-learning algorithms are designed for datasets with features of a single type, whereas very little attention has been given to datasets with mixed-type features. We recently proposed a model that handles mixed types within a probabilistic latent variable formalism. This model, called generalised generative topographic mapping (GGTM), describes the data by type-specific distributions that are conditionally independent given the latent space. Visualisations of high-dimensional datasets are often observed to be poor in the presence of noisy features. In this paper we therefore propose to extend the GGTM to estimate feature saliency values (GGTMFS) as an integrated part of the parameter learning process with an expectation-maximisation (EM) algorithm. The efficacy of the proposed GGTMFS model is demonstrated on both synthetic and real datasets.
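The abstract gives no equations, so the following is only a minimal sketch of the idea it describes, under stated assumptions. It assumes one continuous (Gaussian) and one binary (Bernoulli) feature type, equal priors over the latent grid points, and the saliency form commonly used in feature-saliency mixture models, where each feature's density is a blend of a component-specific term (weight `rho`) and a shared background term (weight `1 - rho`). All names (`component_likelihood`, `responsibilities`, `comps`, `bg`) and all numbers are hypothetical, and the component parameters are supplied directly rather than obtained by mapping a latent grid through basis functions as a GTM would.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bernoulli_pmf(x, p):
    """Probability of a binary value x in {0, 1}."""
    return p if x == 1 else 1.0 - p

def component_likelihood(x_cont, x_bin, comp, bg, rho_cont, rho_bin):
    """Likelihood of one sample given one latent point (assumed form).

    Features are treated as conditionally independent given the latent point,
    so the likelihood is a product over features. Each factor blends a
    component-specific density with a shared background density.
    """
    lik = 1.0
    for d, x in enumerate(x_cont):
        salient = gaussian_pdf(x, comp["means"][d], comp["var"])
        common = gaussian_pdf(x, bg["means"][d], bg["vars"][d])
        lik *= rho_cont[d] * salient + (1.0 - rho_cont[d]) * common
    for d, x in enumerate(x_bin):
        salient = bernoulli_pmf(x, comp["probs"][d])
        common = bernoulli_pmf(x, bg["probs"][d])
        lik *= rho_bin[d] * salient + (1.0 - rho_bin[d]) * common
    return lik

def responsibilities(x_cont, x_bin, comps, bg, rho_cont, rho_bin):
    """E-step posterior over latent points, assuming equal priors 1/K."""
    liks = [component_likelihood(x_cont, x_bin, c, bg, rho_cont, rho_bin)
            for c in comps]
    total = sum(liks)
    return [lik / total for lik in liks]

# Hypothetical toy model: two latent points, one continuous and one binary feature.
comps = [
    {"means": [0.0], "var": 1.0, "probs": [0.9]},
    {"means": [4.0], "var": 1.0, "probs": [0.1]},
]
bg = {"means": [2.0], "vars": [5.0], "probs": [0.5]}

# Fully salient features: the sample is attributed mostly to the first latent point.
resp_salient = responsibilities([0.2], [1], comps, bg, rho_cont=[1.0], rho_bin=[1.0])

# Zero saliency: only the background term remains, so no latent point is preferred.
resp_noise = responsibilities([0.2], [1], comps, bg, rho_cont=[0.0], rho_bin=[0.0])
```

In this sketch a feature with saliency near zero contributes the same factor to every latent point and therefore has no influence on where a sample is placed in the visualisation, which is the intuition behind down-weighting noisy features.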
Similar resources
Visualisation of Heterogeneous Data with the Generalised Generative Topographic Mapping
Heterogeneous and incomplete datasets are common in many real-world applications. The probabilistic nature of the Generative Topographic Mapping (GTM), which originally handled only complete continuous data, makes it possible to extend it to visualise mixed-type and missing data as well, as suggested in (Bishop et al., 1998a). This paper describes this generalisation of GTM and assesses the result...
Novel Visualisation Methods for Protein Data
Visualization of high-dimensional data has always been a challenging task. Here we discuss and propose variants of non-linear data projection methods (Generative Topographic Mapping (GTM) and GTM with simultaneous feature saliency (GTM-FS)) that are adapted to be effective on very high-dimensional data. The adaptations use log space values at certain steps of the Expectation Maximization (EM) al...
Topology-Preserving Mappings for Data Visualisation
We present a family of topology-preserving mappings similar to the Self-Organizing Map (SOM) and the Generative Topographic Map (GTM). These techniques can be considered as a non-linear projection from input or data space to the output or latent space (usually 2D or 3D), plus a clustering technique that updates the centres. A common frame based on the GTM structure can be used with different ...
Visualisation of tree-structured data through generative probabilistic modelling
We present a generative probabilistic model for the topographic mapping of tree-structured data. The model is formulated as a constrained mixture of hidden Markov tree models. A natural measure of likelihood arises as a cost function that guides the model fitting. We compare our approach with an existing neural-based methodology for constructing topographic maps of directed acyclic graphs. We arg...
Preliminary theoretical results on a feature relevance determination method for Generative Topographic Mapping
Feature selection (FS) has long been studied in classification and regression problems, following diverse approaches and resulting in a wide variety of methods, usually grouped as either filters or wrappers. In comparison, FS for unsupervised learning has received far less attention. For many real problems concerning unsupervised multivariate data clustering, FS becomes an issue of paramount im...
Publication date: 2015