A Clustering Method for Discovering Patterns Using Gene Regulatory Processes
Abstract
Clustering is a descriptive task that seeks to identify homogeneous groups of objects based on the values of their attributes (dimensions). K-means, hierarchical clustering, and self-organizing maps have all been used to cluster expression profiles, and a number of algorithms have been developed specifically for expression data. These clustering methods usually use a metric distance as the similarity measure. The correlation coefficient is also used, but it has the drawback that it removes differences attributable to both the mean and the dispersion of the observations. Moreover, assigning every observation to a cluster may be unreasonable when the purpose is to find groups with similar patterns. Alter et al. [1] show that several significant eigengenes and their corresponding eigenarrays capture most of the expression information, and that some of the eigengenes represent independent regulatory programs or processes, as inferred from their expression patterns across all arrays. Normalizing the data by filtering out the eigengenes (and the corresponding eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Such normalization may improve any further analysis of the expression data. Q-mode factor analysis has been used, like cluster analysis, to find groups and could be a good method for finding patterns. However, this approach to clustering is plagued by a number of problems [3]. Genes with similar expression profiles may have something in common in their regulatory mechanisms. In this study, Q-mode factor analysis is used to model the gene regulatory processes that control genes and gene products, and we modify Q-mode factor analysis to discover useful patterns in gene expression data.
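The remark about the correlation coefficient can be illustrated with a small sketch. The two profiles below are invented for illustration and are not data from the study: the second is a shifted and rescaled copy of the first, so the two differ in mean and dispersion but not in shape. Pearson correlation treats them as identical, while Euclidean distance does not.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def euclidean(x, y):
    """Euclidean (metric) distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical expression profiles over four arrays (illustrative values only).
gene_a = [1.0, 2.0, 4.0, 3.0]
# Same shape as gene_a, but with a different mean and a larger spread.
gene_b = [10.0 + 5.0 * v for v in gene_a]

r = pearson(gene_a, gene_b)    # shape agreement, blind to mean and dispersion
d = euclidean(gene_a, gene_b)  # sensitive to mean and dispersion

print(f"correlation = {r:.3f}, Euclidean distance = {d:.2f}")
```

Whether this invariance is a drawback depends on whether the level and spread of expression carry information for the question at hand.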
By factor-modeling the gene expression data, our method can improve clustering results by removing noise, and it can produce characteristic values of the expression data.
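The eigengene filtering attributed to Alter et al. [1] in the abstract can be sketched with a singular value decomposition. This is a minimal illustration on synthetic data, assuming NumPy is available; it is not the modified Q-mode factor analysis proposed in the study, whose details the abstract does not give. The function name `filter_eigengenes`, the synthetic matrix, and the choice to drop the first eigengene are all assumptions made for the example.

```python
import numpy as np

def filter_eigengenes(expr, drop):
    """Zero out selected eigengenes of a genes-by-arrays matrix.

    expr: 2-D array, rows are genes and columns are arrays.
    drop: indices of the eigengenes (singular components) to remove.
    Returns the filtered matrix and the original singular values.
    """
    u, s, vt = np.linalg.svd(expr, full_matrices=False)
    s_kept = s.copy()
    s_kept[list(drop)] = 0.0
    # Rebuild the matrix from the remaining components only.
    return (u * s_kept) @ vt, s

# Synthetic data: a constant offset (standing in for an artifact),
# one shared expression pattern, and a little noise.
rng = np.random.default_rng(0)
n_genes, n_arrays = 50, 8
pattern = np.sin(np.linspace(0.0, 2.0 * np.pi, n_arrays))
loadings = rng.normal(size=(n_genes, 1))
expr = 5.0 + loadings * pattern + 0.01 * rng.normal(size=(n_genes, n_arrays))

filtered, s = filter_eigengenes(expr, drop=[0])

# Fraction of overall expression captured by each eigengene.
fractions = s ** 2 / np.sum(s ** 2)
print(np.round(fractions, 3))
```

In real data, deciding which eigengenes represent noise or artifacts is a judgment that requires inspecting their expression patterns; dropping the first one here is justified only because the synthetic offset was constructed to dominate it.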
Similar resources
Discovering biological processes from microarray data using independent component analysis
We propose a hypothesis-free methodology for discovering genome-wide expression patterns specific to underlying biological processes from DNA microarray expression data. We apply linear and nonlinear independent component analysis (ICA) as a tool for decomposing microarray data into statistically independent components. Each component represents a gene expression pattern of a putative underlyin...
Discovering Distinct Patterns in Gene Expression Profiles
Traditional analysis of gene expression profiles uses clustering to find groups of coexpressed genes that have similar expression patterns. However, clustering is time-consuming and can be difficult for very large-scale datasets. We proposed the idea of Discovering Distinct Patterns (DDP) in gene expression profiles. Since patterns shown by the gene expressions reveal their regulatory mechanisms...
BioProspector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes
The development of genome sequencing and DNA microarray analysis of gene expression gives rise to the demand for data-mining tools. BioProspector, a C program using a Gibbs sampling strategy, examines the upstream region of genes in the same gene expression pattern group and looks for regulatory sequence motifs. BioProspector uses zero to third-order Markov background models whose parameters ar...
BRANE Clust: Cluster-Assisted Gene Regulatory Network Inference Refinement.
Discovering meaningful gene interactions is crucial for the identification of novel regulatory processes in cells. Building accurately the related graphs remains challenging due to the large number of possible solutions from available data. Nonetheless, enforcing a priori on the graph structure, such as modularity, may reduce network indeterminacy issues. BRANE Clust (Biologically-Related A pri...
Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis
Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...
Publication date: 2001