Shifting and scaling patterns from gene expression data
Abstract
MOTIVATION: In recent years, discovering biclusters in data has become increasingly popular. Biclustering aims to extract a set of clusters, each of which may use a different subset of attributes. Its usefulness therefore goes beyond that of traditional clustering techniques, especially when datasets have high or very high dimensionality. Biclustering also allows clusters to overlap, which is interesting both algorithmically and for the interpretation of results. Since the work of Cheng and Church, the mean squared residue has become one of the most popular measures for searching for biclusters, which ideally should discover shifting and scaling patterns.

RESULTS: In this work, we identify both types of patterns (shifting and scaling) and demonstrate that the mean squared residue is very useful for finding shifting patterns but is not appropriate for finding scaling patterns, because the mean squared residue is not zero even for a perfect scaling pattern. In addition, we show that the mean squared residue is highly dependent on the variance of the scaling factor, so any algorithm based on this measure may fail to find these patterns when the variance of gene values is high. The main contribution of this paper is to prove that the mean squared residue is not mathematically precise enough to discover shifting and scaling patterns at the same time.

CONTACT: [email protected].
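The claim about shifting and scaling patterns can be checked numerically. The sketch below is not the paper's code: it assumes the standard Cheng and Church definition of the mean squared residue (each entry minus its row mean, minus its column mean, plus the overall mean, squared and averaged), and the names `mean_squared_residue`, `base`, `shifts`, and `scales`, together with all numeric values, are hypothetical choices made for illustration. It considers the simplified case of a pure shifting pattern and a pure scaling pattern built from the same base profile.

```python
import numpy as np


def mean_squared_residue(bicluster):
    """Mean squared residue of a 2-D array (rows = genes, columns = conditions).

    Assumes the standard Cheng and Church definition:
    residue_ij = a_ij - row_mean_i - col_mean_j + overall_mean.
    """
    a = np.asarray(bicluster, dtype=float)
    row_means = a.mean(axis=1, keepdims=True)
    col_means = a.mean(axis=0, keepdims=True)
    overall_mean = a.mean()
    residue = a - row_means - col_means + overall_mean
    return float(np.mean(residue ** 2))


# Hypothetical base expression profile over four conditions (illustrative values).
base = np.array([1.0, 2.0, 4.0, 7.0])

# Perfect shifting pattern: every gene is the base profile plus its own offset.
shifts = np.array([0.0, 3.0, 5.0])
shifting = base[None, :] + shifts[:, None]

# Perfect scaling pattern: every gene is the base profile times its own factor.
scales = np.array([1.0, 2.0, 3.0])
scaling = base[None, :] * scales[:, None]

print("MSR, shifting pattern:", mean_squared_residue(shifting))
print("MSR, scaling pattern: ", mean_squared_residue(scaling))

# For a pure scaling pattern a_ij = s_i * b_j the residue factorizes as
# (s_i - mean(s)) * (b_j - mean(b)), so the MSR equals var(s) * var(b)
# (population variances). This is a property of this simplified setting.
print("var(scales) * var(base):", np.var(scales) * np.var(base))
```

With these illustrative values the shifting pattern gives a residue of zero up to floating-point rounding, while the scaling pattern gives 3.5, which matches the product of the two variances. Increasing the spread of `scales` increases the residue proportionally, in line with the dependence on the variance of the scaling factor described in the abstract.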
Similar resources
Effect of Data Transformation on Residue
Recently, Aguilar-Ruiz [2005] considers a data matrix containing both scaling and shifting factors and shows that the mean squared residue [Cheng and Church, 2000], called RESIDUE(II) in this paper, is useful for discovering shifting patterns but not appropriate for finding scaling patterns. This finding draws our attention to the weakness of the RESIDUE(II) measure and the need for new approaches to disco...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
A Novel Coherence Measure for Discovering Scaling Biclusters from Gene Expression Data
Biclustering methods are used to identify a subset of genes that are co-regulated in a subset of experimental conditions in microarray gene expression data. Many biclustering algorithms rely on optimizing mean squared residue to discover biclusters from a gene expression dataset. Recently it has been proved that mean squared residue is only good in capturing constant and shifting biclusters. Ho...
O-30: Comparing Expression Patterns of Endometrial Genes in Implantation Failures and Recurrent Miscarriages with Fertile Couples Following ICSI/IVF Using in Silico Analysis
Background: To screen and diagnose patients with recurrent abortions and implantation failure after IVF/ICSI, differentially expressed genes of endometrium through DNA microarrays were monitored. Materials and Methods: Microarray expression profile of GSE26787 dataset from GEO database was used to analyze gene expression profiles of 15 endometrial biopsy samples- five from control fertile (CF) ...
GSTF1 Gene Expression Analysis in Cultivated Wheat Plants under Salinity and ABA Treatments
Most plants encounter stress such as drought and salinity that adversely affect growth, development and crop productivity. The expression of the gene glutathione-s-transferases (GST) extends throughout various protective mechanisms in plants and allows them to adapt to unfavorable environmental conditions. GSTF1 (the first phi GSTFs class) gene expression patterns in the wheat cultivars Mahuti ...
Journal title:
- Bioinformatics
Volume 21, Issue 20
Pages -
Publication date: 2005