Multiple String Alignment

Authors

Matteo Barigozzi

Paolo Pin

Abstract

Every DNA molecule can be described, setting aside its three-dimensional structure, as a string (genome) over a set of only four elements (bases), listed as A, C, G, T. Some DNA sequences are repeated many times throughout the genome with no biological function yet understood. DNA is present in every living cell, and whenever a cell duplicates itself each new offspring gets a complete copy of the original DNA. During these replication events mismatches may occur, due to insertions, deletions or substitutions (finite replications may also occur, but these can be seen as multiple insertions). Experimental biological research has shown that not every region of the DNA is equally likely to undergo a change. This is due to the supposed purpose of each region and to evolutionary reasons (a change in a region that does not affect macroscopic properties would be more likely). We will nevertheless set aside all considerations of this kind. This simple algebraic representation can also be applied to other common biological structures, such as amino acid sequences, with different sets of base elements. In order to highlight the similarities and differences among instances of such strings, we want to define a good method of comparison. To do so we start from the comparison between two strings, but first of all we need some definitions. Note that Σ is any given alphabet and Σ* is the set of all finite strings over it.
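As an illustration of what comparing two strings under insertions, deletions and substitutions can look like, the following is a minimal sketch of the classical unit-cost edit distance. This is a standard textbook measure chosen here for illustration; the abstract does not say which definition or scoring the authors adopt, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Unit-cost edit distance between two strings over any alphabet.

    Assumption for illustration: insertions, deletions and substitutions
    all cost 1. The paper may define its comparison differently.
    """
    # prev[j] = distance between the processed prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # keep a match, or substitute a by b
            ))
        prev = curr
    return prev[-1]
```

For instance, `edit_distance("ACGT", "AGT")` is 1, corresponding to a single deletion of C.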

Similar resources

Grammar string: a novel ncRNA secondary structure representation

Multiple ncRNA alignment has important applications in homologous ncRNA consensus structure derivation, novel ncRNA identification, and known ncRNA classification. As many ncRNAs’ functions are determined by both their sequences and secondary structures, accurate ncRNA alignment algorithms must maximize both sequence and structural similarity simultaneously, incurring high computational cost. F...

Multiple Many-to-Many Sequence Alignment for Combining String-Valued Variables: A G2P Experiment

We investigate multiple many-to-many alignments as a primary step in integrating supplemental information strings in string transduction. Besides outlining DP based solutions to the multiple alignment problem, we detail an approximation of the problem in terms of multiple sequence segmentations satisfying a coupling constraint. We apply our approach to boosting baseline G2P systems using homoge...

Sequence Alignment as a Database Technology Challenge

Sequence alignment is an important task for molecular biologists. Because alignment basically deals with approximate string matching on large biological sequence collections, it is both data intensive and computationally complex. There exist several tools for the variety of problems related to sequence alignment. Our first observation is that the term ’sequence database’ is used in general for ...

Multiple Sequence Alignments in Linguistics

In this study we apply and evaluate an iterative pairwise alignment program for producing multiple sequence alignments, ALPHAMALIG (Alonso et al., 2004), using as material the phonetic transcriptions of words used in Bulgarian dialectological research. To evaluate the quality of the multiple alignment, we propose two new methods based on comparing each column in the obtained alignments with the...

NP- and MAX SNP-hardness of Multiple Sequence Tree

This communication gives a short proof of the NP- and MAX SNP-hardness of the multiple sequence tree alignment problem. As such, it gives the first available proof of the NP-hardness result claimed in (Warnow 1993) and greatly simplifies the MAX SNP-hardness proof given in (Wang and Jiang 1994). Though there are many types of multiple sequence alignment (see (Chan et al. 1994) and references), tree...

1 Introduction to Sequence Alignment

The natural occurrence of strings in our daily lives motivates several applications where discovering the similarities have great significance and utility. The definition of a string is different based on the field of study and context of the experiment we are conducting. There are many prevalent problems that can be solved by discovering underlying structure in strings, both continuous and dis...


Publication date: 2004
