External Memory Best-First Search for Multiple Sequence Alignment

Authors

  • Matthew Hatem
  • Wheeler Ruml
Abstract

Multiple sequence alignment (MSA) is a central problem in computational biology. It is well known that MSA can be formulated as a shortest path problem and solved using heuristic search, but the memory requirement of A* makes it impractical for all but the smallest problems. Partial Expansion A* (PEA*) reduces the memory requirement of A* by generating only the most promising successor nodes. However, even PEA* exhausts available memory on many problems. Another alternative is Iterative Deepening Dynamic Programming, which uses an uninformed search order but stores only the nodes along the search frontier. However, it too cannot scale to the largest problems. In this paper, we propose storing nodes on cheap and plentiful secondary storage. We present a new general-purpose algorithm, Parallel External PEA* (PE2A*), that combines PEA* with Delayed Duplicate Detection to take advantage of external memory and multiple processors to solve large MSA problems. In our experiments, PE2A* is the first algorithm capable of solving the entire Reference Set 1 of the standard BAliBASE benchmark using a biologically accurate cost function. This work suggests that external best-first search can effectively use heuristic information to surpass methods that rely on uninformed search orders.
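
The shortest-path formulation mentioned in the abstract can be illustrated with a minimal sketch for the two-sequence case. Each lattice node (i, j) means "the first i characters of one sequence and the first j characters of the other are aligned"; edges correspond to aligning two characters or inserting a gap. The function name `alignment_cost` and the unit mismatch and gap costs are assumptions made for illustration, not the biologically accurate cost function used in the paper, and the search shown is plain best-first search on path cost (A* with a zero heuristic), kept entirely in main memory. It is not PEA* or PE2A*.

```python
import heapq


def alignment_cost(a, b, mismatch=1, gap=1):
    """Cost of an optimal pairwise alignment, found as a shortest path.

    Hypothetical illustration: unit costs are assumed, and the search is
    uniform-cost (A* with h = 0) over the alignment lattice.
    """
    start, goal = (0, 0), (len(a), len(b))
    best = {start: 0}          # cheapest known path cost to each node
    frontier = [(0, start)]    # priority queue ordered by path cost g

    while frontier:
        g, (i, j) = heapq.heappop(frontier)
        if (i, j) == goal:
            return g
        if g > best[(i, j)]:
            continue           # stale queue entry; a cheaper path was found

        successors = []
        if i < len(a) and j < len(b):
            # Diagonal edge: align a[i] with b[j].
            step = 0 if a[i] == b[j] else mismatch
            successors.append(((i + 1, j + 1), step))
        if i < len(a):
            # Vertical edge: a[i] aligned with a gap.
            successors.append(((i + 1, j), gap))
        if j < len(b):
            # Horizontal edge: b[j] aligned with a gap.
            successors.append(((i, j + 1), gap))

        for node, step in successors:
            new_g = g + step
            if new_g < best.get(node, float("inf")):
                best[node] = new_g
                heapq.heappush(frontier, (new_g, node))

    return None  # unreachable for finite inputs; the goal is always reachable
```

With k sequences the lattice has k dimensions and each node has up to 2^k - 1 successors, which is why the number of stored nodes, rather than running time alone, becomes the limiting factor that the abstract describes.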

Similar resources

Heuristic Search with Limited Memory

By Matthew Hatem, University of New Hampshire, May 2014. Heuristic search algorithms are commonly used for solving problems in artificial intelligence. Unfortunately, the memory requirement of A*, the most widely used heuristic search algorithm, is often proportional to its running time, making it impractical for large problems. Several techniques exist for s...

Heuristic Search for Large Problems With Real Costs

Heuristic search is a fundamental technique for solving problems in artificial intelligence. However, many heuristic search algorithms, such as A*, are limited by the amount of main memory available. External memory search overcomes the memory limitation of A* by taking advantage of cheap secondary storage, such as disk. Previous work in this area assumes that edge costs fall within a narrow ran...

Comparing Best-First Search and Dynamic Programming for Optimal Multiple Sequence Alignment

Sequence alignment is an important problem in computational biology. We compare two different approaches to the problem of optimally aligning two or more character strings: bounded dynamic programming (BDP), and divide-and-conquer frontier search (DCFS). The approaches are compared in terms of time and space requirements in 2 through 5 dimensions with sequences of varying similarity and length....

An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...

Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features

In the growing field of genomics, multiple alignment programs are confronted with ever increasing amounts of data. To address this growing issue we have dramatically improved the running time and memory requirement of Kalign, while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows for external sequence annotatio...

Publication date: 2013