Designing HMMs: Motif discovery and modeling

Author

  • Dannie Durand
Abstract

Position-specific scoring matrices (PSSMs) capture the distribution of residues observed at each position in a conserved motif, but they are poorly suited to modeling variable-length motifs, recognizing new instances that contain insertions or deletions, and capturing positional dependencies. Moreover, although PSSMs can be used to search for instances of an ungapped motif in an unlabeled sequence, they do not lend themselves to precise boundary detection. We turned to hidden Markov models (HMMs) to address these limitations. HMMs provide a flexible and expressive formalism for modeling conserved sequence motifs. In addition to modeling precise conserved motifs, like the WEIRD motif, HMMs can also be used to model biologically distinct regions that are characterized by a change in underlying sequence composition rather than by a precise pattern. Examples include transmembrane regions, which are enriched for hydrophobic residues, and CpG islands, which have higher GC content.
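The idea of modeling a region by its composition rather than by a fixed pattern can be illustrated with a minimal two-state HMM decoded by the Viterbi algorithm. The sketch below is not taken from the article: the state names, the start, transition, and emission probabilities, and the function name `viterbi` are all assumptions chosen for illustration, with a GC-rich "island" state and an AT-rich "background" state.

```python
import math

STATES = ("background", "island")

# Illustrative (assumed) parameters; they are not estimated from real data.
START = {"background": 0.5, "island": 0.5}
TRANS = {
    "background": {"background": 0.9, "island": 0.1},
    "island": {"background": 0.1, "island": 0.9},
}
EMIT = {
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable state path for seq, computed in log space."""
    if not seq:
        return []
    # score[s] is the log-probability of the best path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] is the best predecessor of state s at position i + 1
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][symbol])
            )
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    dna = "ATATATATAT" + "GCGCGCGCGC" + "ATATATATAT"
    for base, state in zip(dna, viterbi(dna)):
        print(base, state)
```

Because the decoded path assigns a state to every position, the points where the path switches states give explicit region boundaries, which is the kind of boundary detection that a PSSM scan does not provide directly.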

Similar resources

Meta-MEME: motif-based hidden Markov models of protein families

Motivation: Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known sequences, a standard linear HMM contains...

The Effects of Ordered-Series-of-Motifs Anchoring and Sub-Class Modeling on the Generation of HMMs Representing Highly Divergent Protein Sequences

Hidden Markov Models (HMMs) provide a flexible method for representing protein sequence data. Highly divergent data require a more complex approach to HMM generation than previously demonstrated. We describe a strategy of motif anchoring and sub-class modeling that aids in the construction of more informative HMMs as determined by a new algorithm called a stability measure.

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method, called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...

Is Lead Investigator on One National Peer-reviewed Grant: Yes Grant Information: NIH R01 EB007057 Machine Learning Analysis of Tandem Mass Spectra 3/1/07--2/28/11

Motivation: Modeling families of related biological sequences using Hidden Markov models (HMMs), although increasingly widespread, faces at least one major problem: because of the complexity of these mathematical models, they require a relatively large training set in order to accurately recognize a given family. For families in which there are few known sequences, a standard ...

HH-MOTiF: de novo detection of short linear motifs in proteins by Hidden Markov Model comparisons

Short linear motifs (SLiMs) in proteins are self-sufficient functional sequences that specify interaction sites for other molecules and thus mediate a multitude of functions. Computational as well as experimental biological research would benefit significantly if SLiMs in proteins could be correctly predicted de novo with high sensitivity. However, de novo SLiM prediction is a difficult compu...

Publication date: 2015