Clustering short time series gene expression data
Authors
Abstract
MOTIVATION: Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8 time points or fewer). These datasets present unique challenges. Because of the large number of genes profiled (often tens of thousands) and the small number of time points, many patterns are expected to arise at random. Most clustering algorithms are unable to distinguish between real and random patterns.
RESULTS: We present an algorithm specifically designed for clustering short time series expression data. Our algorithm works by assigning genes to a predefined set of model profiles that capture the potential distinct patterns that can be expected from the experiment. We discuss how to obtain such a set of profiles and how to determine the significance of each of these profiles. Significant profiles are retained for further analysis and can be combined to form clusters. We tested our method on both simulated and real biological data. Using immune response data, we show that our algorithm can correctly detect the temporal profile of relevant functional categories. Using Gene Ontology analysis, we show that our algorithm outperforms both general clustering algorithms and algorithms designed specifically for clustering time series gene expression data.
AVAILABILITY: Information on obtaining a Java implementation with a graphical user interface (GUI) is available from http://www.cs.cmu.edu/~jernst/st/
SUPPLEMENTARY INFORMATION: Available at http://www.cs.cmu.edu/~jernst/st/
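The profile-assignment idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure or significance test is used, so the choices here are assumptions made for illustration only: genes are matched to profiles by Pearson correlation, and significance is estimated by permuting the time points. The function names `assign_to_profiles` and `profile_significance` are hypothetical and are not taken from the Java implementation.

```python
import numpy as np


def _standardize(rows):
    """Center each row and scale it to unit length (zero rows stay zero)."""
    rows = np.asarray(rows, dtype=float)
    centered = rows - rows.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for flat rows
    return centered / norms


def assign_to_profiles(expr, profiles):
    """Return, for each gene (row of expr), the index of the best model profile.

    Assumption: "best" means highest Pearson correlation. Flat genes have
    zero correlation with every profile and fall back to index 0.
    """
    corr = _standardize(expr) @ _standardize(profiles).T
    return corr.argmax(axis=1)


def profile_significance(expr, profiles, n_perm=200, seed=0):
    """Estimate an empirical p-value per profile by permuting time points.

    Assumption: a profile is interesting when it attracts more genes than
    it does after the time-point columns are shuffled.
    """
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    n_profiles = len(profiles)
    observed = np.bincount(assign_to_profiles(expr, profiles),
                           minlength=n_profiles)
    exceed = np.zeros(n_profiles)
    for _ in range(n_perm):
        perm = rng.permutation(expr.shape[1])
        counts = np.bincount(assign_to_profiles(expr[:, perm], profiles),
                             minlength=n_profiles)
        exceed += counts >= observed
    p_values = (exceed + 1) / (n_perm + 1)
    return observed, p_values
```

With thousands of genes, profiles with small p-values would be the ones retained for further analysis. A real analysis would also need a correction for testing many profiles at once, which this sketch omits.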
Similar resources
Combination of Transformed-means Clustering and Neural Networks for Short-Term Solar Radiation Forecasting
In order to provide an efficient conversion and utilization of solar power, solar radiation data should be measured continuously and accurately over the long-term period. However, the measurement of solar radiation is not available to all countries in the world due to some technical and fiscal limitations. Hence, several studies were proposed in the literature to find mathematical and physical mod...
A Fuzzy Approach for Clustering Gene Expression Time Series Data
Identifying groups of genes that manifest similar expression patterns is crucial in the analysis of gene expression time series data. Choosing a similarity measure to determine the similarity or distance between profiles is an important task. Time series expression experiments are used to study a wide range of biological systems. More than 80% of all time series expression datasets are short (8...
Application of Clustering Methods in DNA Microarrays
Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the study by Van’t Veer et al. dealing with BRCA1...
Analysis of Short Time Series in Gene Expression Tasks
The article analyzes various clustering approaches that are used in gene expression tasks. The chosen approaches are portrayed and examined from the viewpoint of use of data mining clustering algorithms. The article provides a short description of working principles and characteristics of the examined methods and algorithms and the data sets used in the experiments. The article presents results...
Clustering Algorithms for Time Series Gene Expression in Microarray Data
Clustering techniques are important for gene expression data analysis. However, efficient computational algorithms for clustering time-series data are still lacking. This work documents two improvements on an existing profile-based greedy algorithm for short time-series data; the first one is implementation of a scaling method on the pre-processing of the ...
Journal title:
- Bioinformatics
Volume 21 Suppl 1, Issue
Pages -
Publication date 2005