A joint optimization framework integrated with biological knowledge for clustering incomplete gene expression data
Abstract
Clustering algorithms have been successfully applied to identify co-expressed gene groups from gene expression data. Missing values often occur in gene expression data, which presents a challenge for gene clustering. When partitioning incomplete gene expression data into co-expressed gene groups, missing value imputation and clustering are generally performed as two separate processes. These two-stage methods are likely to result in imputation values that are unsuitable for the clustering task and in unsatisfying clustering performance. This paper proposes a multi-objective joint optimization framework for clustering incomplete gene expression data that addresses this problem. The proposed framework can impute the missing expression values under the guidance of clustering, and therefore realize a synergistic improvement of imputation and clustering. In addition, gene expression similarity and gene semantic similarity extracted from the Gene Ontology are combined, in the form of a functional neighbor interval for each missing expression value, to provide reasonable constraints for the joint optimization framework. The experiments are carried out on several benchmark data sets. Averaged over the data sets at different missing rates, our framework can reduce the imputation error by 6.4–14.7% and increase the clustering accuracy by 4.0–10.1% compared with six popular and promising methods. Furthermore, the biological significance of the identified clusters is reported to evaluate the effectiveness of the framework.
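The idea of imputing missing values under the guidance of clustering, with each missing entry constrained to an interval, can be illustrated with a minimal sketch. This is not the paper's multi-objective method: it swaps in a plain alternating k-means loop, and the function name `joint_impute_cluster` and the `lower`/`upper` bound arrays (standing in for the functional neighbor intervals, which the paper derives from expression and Gene Ontology similarity) are assumptions made for illustration.

```python
import numpy as np

def joint_impute_cluster(X, k, lower, upper, n_iter=20, seed=0):
    """Alternate k-means steps with centroid-guided, interval-constrained imputation.

    X     : (n_genes, n_conditions) float array, np.nan marks missing values
    lower : per-entry lower bounds, same shape as X (hypothetical intervals)
    upper : per-entry upper bounds, same shape as X (hypothetical intervals)
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(X)
    # Initial guess: column means of observed values, clipped into the intervals.
    col_mean = np.nanmean(X, axis=0)
    filled = np.where(missing, np.clip(col_mean, lower, upper), X)
    # Pick k distinct genes as starting centroids (fancy indexing copies).
    centroids = filled[rng.choice(len(filled), size=k, replace=False)]
    labels = np.zeros(len(filled), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under Euclidean distance.
        dist = np.linalg.norm(filled[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: recompute centroids; an empty cluster keeps its old one.
        for j in range(k):
            members = filled[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
        # Imputation step: missing entries move toward the gene's own centroid,
        # but never leave their allowed interval; observed entries stay fixed.
        guided = np.clip(centroids[labels], lower, upper)
        filled = np.where(missing, guided, X)
    return filled, labels

# Toy usage: four genes, three conditions, two missing entries.
X = np.array([[1.0, 1.1, np.nan],
              [0.9, 1.0, 1.2],
              [5.0, 5.1, 4.9],
              [5.2, np.nan, 5.0]])
lower = np.full(X.shape, 0.5)   # assumed interval bounds
upper = np.full(X.shape, 5.5)
filled, labels = joint_impute_cluster(X, k=2, lower=lower, upper=upper)
```

Because the imputed values are refreshed after every centroid update, the clustering and the imputation influence each other instead of running as two separate stages, which is the behavior the abstract contrasts with two-stage methods.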
Similar resources
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is convenient only for crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
A New Framework for Co-clustering of Gene Expression Data
A new framework is proposed to study the co-clustering of gene expression data. This framework is based on a generic tensor optimization model and an optimization method termed Maximum Block Improvement (MBI) recently developed in [3]. Not only can this framework be applied for co-clustering gene expression data with genes expressed at different conditions represented in 2D matrices, but it can...
Subspace clustering of gene expression data with prior knowledge
Subspace clustering methods such as biclustering have been studied for finding genes that are activated under specific conditions or in specific cell cycles, and whose expression levels are highly correlated only under the active conditions. However, existing methods suffer from a lack of cluster reliability caused by over-fitting and from the difficulty of interpreting the clusters because of the generation o...
Incorporating heterogeneous biological data sources in clustering gene expression data
In this paper, a similarity measure between genes with protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures of protein-protein interaction data and ChIP-chip data, the combined dissimilarity measure is defined. The combined distance measure ...
Multi-objective optimization for clustering 3-way gene expression data
Microarray technology makes it possible to monitor the expression levels of thousands of genes simultaneously. A typical experiment will, for example, compare gene expression between multiple biological samples such as tumor biopsies, or within a single sample in response to a treatment over time. It is assumed that genes with similar function or sharing regulatory elements will display a common expression prof...
Journal
Journal title: Soft Computing
Year: 2022
ISSN: 1433-7479, 1432-7643
DOI: https://doi.org/10.1007/s00500-022-07180-y