Indelign: a probabilistic framework for annotation of insertions and deletions in a multiple alignment

نویسندگان

  • Jaebum Kim
  • Saurabh Sinha
چکیده

MOTIVATION A quantitative study of molecular evolutionary events such as substitutions, insertions and deletions from closely related genomes requires (1) an accurate multiple sequence alignment program and (2) a method to annotate the insertions and deletions that explain the 'gaps' in the alignment. Although the former requirement has been extensively addressed, the latter problem has received little attention, especially in a comprehensive probabilistic framework. RESULTS Here, we present Indelign, a program that uses a probabilistic evolutionary model to compute the most likely scenario of insertions and deletions consistent with an input multiple alignment. It is also capable of modifying the given alignment so as to obtain a better agreement with the evolutionary model. We find close to optimal performance and substantial improvement over alternative methods, in tests of Indelign on synthetic data. We use Indelign to analyze regulatory sequences in Drosophila, and find an excess of insertions over deletions, which is different from what has been reported for neutral sequences. AVAILABILITY The Indelign program may be downloaded from the website http://veda.cs.uiuc.edu/indelign/ SUPPLEMENTARY INFORMATION Supplementary material is available at Bioinformatics online.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Probabilistic Phylogenetic Inference with Insertions and Deletions

A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for in...

متن کامل

Transducers: an emerging probabilistic framework for modeling indels on trees

When it comes to dealing with indels, molecular evolution lags heuristic bioinformatics by decades. Sophisticated alignment algorithms have been widely known since the 1960s (and in bioinformatics since 1970), but we are still struggling to understand the corresponding phylogenetic models. Big ideas drive change: as we dream of reconstructing ancestral genotypes, it is ever clearer that indels ...

متن کامل

A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images

Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...

متن کامل

Stochastic models of sequence evolution including insertion-deletion events.

Comparison of sequences that have descended from a common ancestor based on an explicit stochastic model of substitutions, insertions and deletions has risen to prominence in the last decade. Making statements about the positions of insertions-deletions (abbr. indels) is central in sequence and genome analysis and is called alignment. This statistical approach is harder conceptually and computa...

متن کامل

PLOTREP: a web tool for defragmentation and visual analysis of dispersed genomic repeats

Identification of dispersed or interspersed repeats, most of which are derived from transposons, retrotransposons or retrovirus-like elements, is an important step in genome annotation. Software tools that compare genomic sequences with precompiled repeat reference libraries using sensitive similarity-based methods provide reliable means of finding the positions of fragments homologous to known...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 23 3  شماره 

صفحات  -

تاریخ انتشار 2007