Validating clustering for gene expression data

Authors

Ka Yee Yeung

David R. Haynor

Walter L. Ruzzo

Abstract

Motivation: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance.

Results: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
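The leave-one-condition-out procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give a formula, so the root-mean-square deviation from the cluster mean is assumed here as the measure of variation in the held-out condition. The small k-means routine, the alternating-label baseline standing in for "clusters formed by chance", the toy data, and all function names are likewise hypothetical.

```python
import math


def kmeans_labels(rows, k, iters=50):
    """Plain k-means on a list of equal-length float lists (illustrative only).

    Centers start at evenly spaced rows so the sketch is deterministic.
    """
    centers = [list(rows[i * len(rows) // k]) for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def alternating_labels(rows, k):
    """Baseline that ignores the data: a stand-in for clusters formed by chance."""
    return [i % k for i in range(len(rows))]


def held_out_rms(data, labels, e):
    """Assumed measure: RMS deviation of condition e from each cluster's mean."""
    groups = {}
    for row, lab in zip(data, labels):
        groups.setdefault(lab, []).append(row[e])
    squared = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        squared += sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared / len(data))


def leave_one_out_score(data, k, cluster_fn):
    """Cluster on all conditions but one, score on the one left out, and sum."""
    total = 0.0
    for e in range(len(data[0])):
        reduced = [row[:e] + row[e + 1:] for row in data]  # drop condition e
        labels = cluster_fn(reduced, k)
        total += held_out_rms(data, labels, e)
    return total


# Toy data (hypothetical): rows are genes, columns are experimental conditions.
EXAMPLE = [
    [0.0, 0.1, 0.2, 0.1],
    [0.1, 0.0, 0.1, 0.2],
    [0.2, 0.1, 0.0, 0.1],
    [10.0, 10.1, 10.2, 10.1],
    [10.1, 10.0, 10.1, 10.2],
    [10.2, 10.1, 10.0, 10.1],
]

if __name__ == "__main__":
    print("k-means:", leave_one_out_score(EXAMPLE, 2, kmeans_labels))
    print("baseline:", leave_one_out_score(EXAMPLE, 2, alternating_labels))
```

A lower score means the clusters predict the held-out condition better. Passing the clustering routine in as `cluster_fn` lets several algorithms be compared on the same data, which is the kind of comparison the abstract reports.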

Similar resources

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Clustering Gene Expression Data Using Random Forest Dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) in comparison with the number of samples (patients). Many clustering methods have been built on the dissimilarity among observations, which is calculated by a distance function. As increa...

Application of Clustering Methods in DNA Microarrays

Background: Microarray DNA technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in microarray DNA analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...

Learning Statistical and Geometric Models from Microarray Gene Expression Data

Analysis of microarray gene expression data is important for disease study at the molecular and genomic level. Computational data modeling and analysis are essential for extracting meaningful and specific information from noisy, high-throughput, and large-scale microarray gene expression data. In this dissertation, we propose and develop innovative data modeling and analysis methods for learnin...

An Efficient Approach to Identifying and Validating Clusters in Multivariate Datasets with Applications in Gene Expression Analysis

Gene expression data analysis has become an important topic in bioinformatics due to its wide application in the biomedical industry. Effective analysis of gene expression data is an essential part of various data mining methods, especially clustering techniques. Various kinds of clustering methods have been proposed, yet they do not satisfy the requirements of high efficiency, high qua...

Journal:

Bioinformatics

Volume 17, Issue 4

Pages: -

Publication date: 2001
