A new semiempirical codon substitution model based on principal component analysis of mammalian sequences.
Abstract
Codon substitution models have traditionally been parametric Markov models, but recently, empirical and semiempirical models have also been proposed. Parametric codon models are typically based on 61×61 rate matrices that are derived from a small number of parameters. These parameters are rooted in experience and theoretical considerations and generally perform well, but they are still relatively arbitrary. We have previously used principal component analysis (PCA) on data obtained from mammalian sequence alignments to empirically identify the most relevant parameters for codon substitution models, thereby confirming some commonly used parameters but also suggesting new ones. Here, we present a new semiempirical codon substitution model that is directly based on those PCA results. The substitution rate matrix is constructed from linear combinations of the first few (the most important) principal components, with the coefficients being free model parameters. Thus, the model is not only based on empirical rates but also uses the empirically determined most relevant parameters for a codon model to adjust to the particularities of individual data sets. In comparisons against established parametric and semiempirical models, the new model consistently achieves the highest likelihood values when applied to sequences of vertebrates, which include the taxonomic class on which the model was trained.
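The construction described in the abstract, a rate matrix formed as a linear combination of a few principal components with free coefficients, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function name `build_rate_matrix`, the use of a mean rate matrix as the baseline, the clamping of negative rates to zero, and the random placeholder values standing in for the PCA results, which are not reproduced on this page. It is not the authors' implementation.

```python
import random

N_CODONS = 61  # sense codons, matching the 61x61 matrices in the abstract


def build_rate_matrix(mean_rates, components, coefficients):
    """Combine a baseline rate matrix with weighted principal components.

    mean_rates   : n x n list of lists (diagonal entries are ignored)
    components   : list of n x n lists, one per principal component
    coefficients : one free model parameter per component
    """
    if len(components) != len(coefficients):
        raise ValueError("need exactly one coefficient per component")
    n = len(mean_rates)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rate = mean_rates[i][j]
            for coeff, pc in zip(coefficients, components):
                rate += coeff * pc[i][j]
            # Assumption: clamp at zero so every off-diagonal rate is valid.
            q[i][j] = max(rate, 0.0)
        # The diagonal makes each row sum to zero, as a Markov generator requires.
        q[i][i] = -sum(q[i])
    return q


def random_matrix(n, rng, low, high):
    """Placeholder data only; real inputs would come from the PCA."""
    return [[rng.uniform(low, high) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
mean_rates = random_matrix(N_CODONS, rng, 0.0, 1.0)
components = [random_matrix(N_CODONS, rng, -0.5, 0.5) for _ in range(3)]
coefficients = [0.8, -0.3, 0.1]  # hypothetical values of the free parameters
Q = build_rate_matrix(mean_rates, components, coefficients)
```

In a likelihood analysis the coefficients would be estimated from each data set rather than fixed by hand; setting all of them to zero recovers the baseline matrix unchanged.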
Journal title:
- Molecular Biology and Evolution
Volume 29, Issue 1
Pages -
Publication date: 2012