Persistent Suffix Trees and Suffix Binary Search Trees as DNA Sequence

Authors

  • Ela Hunt
  • Robert W. Irving
  • Malcolm Atkinson
Abstract

We constructed, stored on disk and reused suffix trees and suffix binary search trees for C. elegans chromosomes, and measured their performance using orthogonal persistence for Java (PJama). We compare our implementation with the performance of a transient suffix tree, and discuss the suitability of such indexes in pursuing our long-term goal of indexing large genomes. We identify the potential for persistent DNA indexes in a variety of biological and medical contexts, and believe they will complement current string searching methods based on transient data structures.

Similar resources

PJama Stores and Suffix Tree Indexing for Bioinformatics Applications (accepted for PhDOO Workshop, ECOOP'00)

Motivation: The biggest public domain biological sequence archive exceeds 6 Gbases of DNA, and much larger sequence amounts are held by industrial labs. The amount of data is growing exponentially, but sequence search technologies still rely on flat file storage and high-throughput parallel computers reading all data sequentially to find sequence similarities or patterns. This issue is not addressed by ...

Optimal Suffix Tree Construction with Large

The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. In this paper, we present a novel, deterministic algorithm for the construction of suffix trees. We settle the main open problem in the construction of suffix trees: we build suffix trees in linear time for integer alphabets.

Augmenting Suffix Trees, with Applications

Information retrieval and data compression are the two main application areas where the rich theory of string algorithmics plays a fundamental role. In this paper we consider one algorithmic problem from each of these areas and present highly efficient linear or near-linear time algorithms for both problems. Our algorithms rely on augmenting the suffix tree, a fundamental data structure in string algo...

Generalizations of suffix arrays to multi-dimensional matrices

We propose multi-dimensional index data structures that generalize suffix arrays to square matrices and cubic matrices. Giancarlo proposed a two-dimensional index data structure, the Lsuffix tree, that generalizes suffix trees to square matrices. However, the construction algorithm for Lsuffix trees maintains complicated data structures and uses a large amount of space. We present simple and practical ...

The enhanced suffix array and its applications to genome analysis

In large-scale applications such as computational genome analysis, the space requirement of the suffix tree is a severe drawback. In this paper, we present a uniform framework that enables us to systematically replace every string processing algorithm that is based on a bottom-up traversal of a suffix tree by a corresponding algorithm based on an enhanced suffix array (a suffix array enhanced with the lcp-ta...
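
To make the idea of a suffix array paired with an lcp table concrete, the following is a minimal Python sketch. It is not the construction from the paper summarized above: the suffix array is built by naive sorting, the lcp table uses the well-known linear-time scan usually attributed to Kasai et al., and the function names (`build_suffix_array`, `build_lcp_table`, `find_occurrences`) are hypothetical, chosen only for this illustration.

```python
def build_suffix_array(text):
    """Return suffix start positions in lexicographic order of the suffixes.

    Naive construction by sorting (illustration only, not linear time).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp_table(text, sa):
    """Return lcp, where lcp[r] is the length of the longest common prefix
    of the suffixes at sa[r - 1] and sa[r], and lcp[0] is 0."""
    n = len(text)
    rank = [0] * n
    for r, start in enumerate(sa):
        rank[start] = r
    lcp = [0] * n
    h = 0  # number of matching characters carried over from the previous suffix
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix that precedes suffix i in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def find_occurrences(text, sa, pattern):
    """Return all start positions of pattern in text, found by two binary
    searches over the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


dna = "ACGTACGA"  # made-up example sequence
sa = build_suffix_array(dna)
lcp = build_lcp_table(dna, sa)
print(sa, lcp, find_occurrences(dna, sa, "ACG"))
```

Both tables are flat integer arrays of the same length as the sequence, which is why this representation needs less space than a pointer-based suffix tree.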

Publication date: 2000