A joint optimization framework integrated with biological knowledge for clustering incomplete gene expression data

Abstract

Clustering algorithms have been successfully applied to identify co-expressed gene groups from expression data. Missing values often occur in such data, which presents a challenge for clustering. When incomplete data are partitioned into groups, missing value imputation and clustering are generally performed as two separate processes. These two-stage methods are likely to yield imputed values that are unsuitable for the clustering task and, in turn, unsatisfying clustering performance. This paper proposes a multi-objective joint optimization framework that addresses this problem. The proposed framework can impute the missing values under the guidance of clustering and therefore realize a synergistic improvement of imputation and clustering. In addition, expression similarity and semantic similarity extracted from the Gene Ontology are combined to form a functional neighbor interval for each missing value, which provides reasonable constraints for the framework. Experiments are carried out on several benchmark data sets. In terms of the average over the data sets at different missing rates, the framework reduces imputation error by 6.4–14.7% and increases clustering accuracy by 4.0–10.1% compared with six popular and promising methods. Furthermore, the biological significance of the identified clusters is reported to evaluate the effectiveness of the framework.
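The idea of imputing under the guidance of clustering, with each missing value confined to an interval, can be illustrated with a small sketch. This is not the paper's multi-objective method: it substitutes a plain single-objective alternating scheme in the style of k-means, and the function name `joint_impute_cluster` and the arrays `lower` and `upper` are hypothetical. The intervals are assumed to be supplied by the caller as a stand-in for the functional neighbor intervals, which the abstract says are derived from expression similarity and Gene Ontology semantic similarity.

```python
import numpy as np

def joint_impute_cluster(X, lower, upper, k, n_iter=20, seed=0):
    """Alternate clustering and interval-constrained imputation (illustrative only).

    X      : (n_genes, n_conditions) array, np.nan marks missing values.
    lower,
    upper  : arrays of the same shape; bounds are only used at missing entries.
             Hypothetical stand-in for the functional neighbor intervals.
    k      : number of clusters.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(X)

    # Start every missing entry at the midpoint of its interval.
    filled = np.where(missing, (lower + upper) / 2.0, X)

    # Initialise centers from k distinct rows (fancy indexing returns a copy).
    centers = filled[rng.choice(len(filled), size=k, replace=False)]

    for _ in range(n_iter):
        # Clustering step: assign each gene to its nearest center.
        dist = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)

        # Update each non-empty cluster center as the mean of its members.
        for j in range(k):
            members = filled[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

        # Imputation step: pull missing entries toward the assigned center,
        # clipped so they stay inside their intervals; observed values are kept.
        proposal = np.clip(centers[labels], lower, upper)
        filled = np.where(missing, proposal, X)

    return filled, labels


# Toy usage with made-up numbers (not data from the paper).
X = np.array([[0.0, 0.1, np.nan],
              [0.2, 0.0, 0.1],
              [5.0, np.nan, 5.2],
              [5.1, 4.9, 5.0]])
lower = np.full(X.shape, -1.0)
upper = np.full(X.shape, 6.0)
lower[0, 2], upper[0, 2] = 0.0, 0.5   # assumed interval for the first gap
lower[2, 1], upper[2, 1] = 4.5, 5.5   # assumed interval for the second gap
filled, labels = joint_impute_cluster(X, lower, upper, k=2)
print(filled)
print(labels)
```

Because the clustering and imputation steps share the same filled matrix, each imputed value is revised whenever the cluster structure changes, which is the coupling that a two-stage pipeline lacks. The clipping step is where an interval constraint of the kind described above would enter.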

Similar resources

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suited only to crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

A New Framework for Co-clustering of Gene Expression Data

A new framework is proposed to study the co-clustering of gene expression data. This framework is based on a generic tensor optimization model and an optimization method termed Maximum Block Improvement (MBI) recently developed in [3]. Not only can this framework be applied for co-clustering gene expression data with genes expressed at different conditions represented in 2D matrices, but it can...

Subspace clustering of gene expression data with prior knowledge

Subspace clustering methods such as biclustering have been studied for finding genes that are activated under specific conditions or in specific cell cycles, and whose expression levels are highly correlated only under those active conditions. However, existing methods suffer from a lack of cluster reliability caused by over-fitting and from the difficulty of interpreting the clusters because of the generation o...

Incorporating heterogeneous biological data sources in clustering gene expression data

In this paper, a similarity measure between genes with protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures for the protein-protein interaction data and the ChIP-chip data, the combined dissimilarity measure is defined. The combined distance measure ...

Multi-objective optimization for clustering 3-way gene expression data

Microarray technology makes it possible to monitor the expression levels of thousands of genes simultaneously. A typical experiment will, for example, compare gene expression between multiple biological samples such as tumor biopsies, or follow a single sample in response to a treatment over time. It is assumed that genes with similar function or sharing regulatory elements will display a common expression prof...

Journal

Journal title: Soft Computing

Year: 2022

ISSN: 1433-7479, 1432-7643

DOI: https://doi.org/10.1007/s00500-022-07180-y