Multiple String Alignment
Authors
Abstract
Every DNA molecule can be described, ignoring its three-dimensional structure, as a string (genome) over a set of cardinality only four, whose elements (bases) can be listed as A, C, G, T. Some DNA sequences are repeated many times throughout the genome, with a biological function that is not yet understood. DNA is present in every living cell, and whenever a cell duplicates itself every new offspring gets a complete copy of the original DNA. During these replication events mismatches may happen, due to insertions, deletions or substitutions (finite replications may also happen, but these can be seen as multiple insertions). Experimental biological research has shown that not every region of the DNA has the same probability of undergoing a change. This is due to the supposed purpose of each region and to evolutionary reasons (a change in a region that does not affect macroscopic properties is more likely to persist). We will nevertheless set aside all considerations of this kind. This simple algebraic representation can also be applied to other common biological structures, such as amino acids, with different sets of base elements. In order to highlight the similarities and differences among instances of such strings, we want to define a good method of comparison. To do so we start from the comparison between two strings, but first of all we need some definitions. Note that Σ is any given alphabet and Σ* is the set of every finite string over it.
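As a rough illustration of comparing two strings through the three kinds of mismatch named above, the following sketch computes a unit-cost edit distance by dynamic programming. The function name `edit_distance` and the choice of cost 1 for every insertion, deletion and substitution are assumptions made for this example; the abstract does not say which scoring scheme the paper adopts.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    (each with assumed cost 1) needed to turn s into t."""
    # prev[j] holds the distance between the prefix of s processed
    # so far and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # substitute a by b, or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # One deletion separates these two strings over {A, C, G, T}.
    print(edit_distance("ACGT", "AGT"))
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string while running time grows with the product of the two lengths.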
Similar resources
Grammar string: a novel ncRNA secondary structure representation
Multiple ncRNA alignment has important applications in homologous ncRNA consensus structure derivation, novel ncRNA identification, and known ncRNA classification. As many ncRNAs’ functions are determined by both their sequences and secondary structures, accurate ncRNA alignment algorithms must maximize both sequence and structural similarity simultaneously, incurring high computational cost. F...
Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment
We investigate multiple many-to-many alignments as a primary step in integrating supplemental information strings in string transduction. Besides outlining DP based solutions to the multiple alignment problem, we detail an approximation of the problem in terms of multiple sequence segmentations satisfying a coupling constraint. We apply our approach to boosting baseline G2P systems using homoge...
Sequence Alignment as a Database Technology Challenge
Sequence alignment is an important task for molecular biologists. Because alignment basically deals with approximate string matching on large biological sequence collections, it is both data intensive and computationally complex. There exist several tools for the variety of problems related to sequence alignment. Our first observation is that the term ’sequence database’ is used in general for ...
Multiple Sequence Alignments in Linguistics
In this study we apply and evaluate an iterative pairwise alignment program for producing multiple sequence alignments, ALPHAMALIG (Alonso et al., 2004), using as material the phonetic transcriptions of words used in Bulgarian dialectological research. To evaluate the quality of the multiple alignment, we propose two new methods based on comparing each column in the obtained alignments with the...
NP- and MAX SNP-hardness of Multiple Sequence Tree
This communication gives a short proof of the NP- and MAX SNP-hardness of the multiple sequence tree alignment problem. As such, it gives the first available proof of the NP-hardness result claimed in (Warnow 1993) and greatly simplifies the MAX SNP-hardness proof given in (Wang and Jiang 1994). Though there are many types of multiple sequence alignment (see (Chan et al. 1994) and references), tree...
1 Introduction to Sequence Alignment
The natural occurrence of strings in our daily lives motivates several applications where discovering the similarities has great significance and utility. The definition of a string is different based on the field of study and context of the experiment we are conducting. There are many prevalent problems that can be solved by discovering underlying structure in strings, both continuous and dis...
Publication date: 2004