A Genetic Algorithm for Simplifying the Amino Acid Alphabet in Bioinformatics Applications

Authors

  • Matthew Palensky
  • Hesham H. Ali
Abstract

Simplified amino acid alphabets have been successful in several areas of bioinformatics, including protein structure prediction, protein function prediction, and protein classification. Because the number of possible simplifications is very large, exhaustively searching them to find one suited to a specific application is impractical. A previous study by the authors indicates that algorithms that rely heavily on randomness tend to produce poor simplifications. Genetic algorithms have generally been successful in producing quality solutions to problems with a large solution space, though their reliance on randomness makes it difficult to create quality simplifications. This study's goal is to overcome these difficulties and create a genetic simplification algorithm. The presented results include the genetic simplification algorithm as well as the difficulties of creating such an algorithm. The described algorithm has led to the development of a computer program that uses a genetic algorithm to produce simplified alphabets, and these outputs are listed and analyzed.
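To make the idea concrete, the following is a minimal sketch of a generic genetic algorithm that groups the 20 amino acids into `k` classes. It is not the authors' algorithm, which the abstract does not specify. The fitness function (within-group spread of Kyte-Doolittle hydropathy values), the operators (uniform crossover, per-letter mutation, elitist selection), and all names and parameter values are assumptions chosen only for illustration.

```python
import random

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Kyte-Doolittle hydropathy values, used here only as a toy grouping criterion.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fitness(assignment, k):
    """Assumed fitness: negative within-group squared deviation of hydropathy.

    assignment[i] is the group index (0..k-1) of AMINO_ACIDS[i].
    Assignments that leave a group empty are rejected with -inf.
    """
    groups = [[] for _ in range(k)]
    for aa, g in zip(AMINO_ACIDS, assignment):
        groups[g].append(HYDROPATHY[aa])
    if any(not g for g in groups):
        return float("-inf")
    total = 0.0
    for values in groups:
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return -total


def crossover(a, b, rng):
    """Uniform crossover: each letter inherits its group from either parent."""
    return [rng.choice(pair) for pair in zip(a, b)]


def mutate(assignment, k, rate, rng):
    """Reassign each letter to a random group with probability `rate`."""
    return [rng.randrange(k) if rng.random() < rate else g for g in assignment]


def evolve(k=4, pop_size=60, generations=200, mutation_rate=0.05, seed=0):
    """Run the sketch GA and return the best assignment found."""
    rng = random.Random(seed)
    pop = [[rng.randrange(k) for _ in AMINO_ACIDS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: fitness(a, k), reverse=True)
        elite = pop[: pop_size // 4]  # the top quarter survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), k, mutation_rate, rng))
        pop = elite + children
    return max(pop, key=lambda a: fitness(a, k))


def simplify(sequence, assignment):
    """Rewrite a protein sequence using one symbol per group."""
    table = {aa: str(g) for aa, g in zip(AMINO_ACIDS, assignment)}
    return "".join(table[c] for c in sequence)


if __name__ == "__main__":
    best = evolve(k=4)
    print(simplify("MKTAYIAKQR", best))
```

Because the top quarter of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A real application would replace the toy hydropathy criterion with a task-specific score, for example one derived from a substitution matrix or from classification accuracy.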


Similar resources

Simplifying amino acid alphabets by means of a branch and bound algorithm and substitution matrices

MOTIVATION Protein and DNA are generally represented by sequences of letters. In a number of circumstances simplified alphabets (where one or more letters would be represented by the same symbol) have proved their potential utility in several fields of bioinformatics including searching for patterns occurring at an unexpected rate, studying protein folding and finding consensus sequences in mul...


Comparing the Bidirectional Baum-Welch Algorithm and the Baum-Welch Algorithm on Regular Lattice

A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the b...


Optimizing amino acid groupings for GPCR classification

MOTIVATION There is much interest in reducing the complexity inherent in the representation of the 20 standard amino acids within bioinformatics algorithms by developing a so-called reduced alphabet. Although there is no universally applicable residue grouping, there are numerous physiochemical criteria upon which one can base groupings. Local descriptors are a form of alignment-free analysis, ...


Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction Biological Applications Track

This paper focuses on automated procedures to reduce the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure are faster and easier learning process and generation of more compact and human-readable solutions. This simplification consists of an alphabet reduction procedu...


Sequential and Mixed Genetic Algorithm and Learning Automata (SGALA, MGALA) for Feature Selection in QSAR

Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has been solved using some meta-heuristic algorithms such as: GA, PSO, ACO, SA and so on. In this work two novel hybrid meta-heuristic algorithms i.e. Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), which are based on Genetic algorithm and learning automata for QSAR f...




Publication date: 2003