The gapped-factor tree
Abstract
We present a data structure for indexing a specific kind of factor (that is, substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text for a fixed gap size, and only those. It is constructed online in O(n × |Σ|) time and space, where n is the length of the text and |Σ| is the size of the alphabet. Such a data structure may play an important role in some pattern matching and motif inference problems, for instance in text filtration.
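To make the notion of a gapped-factor concrete, here is a minimal Python sketch that enumerates them naively. It is not the suffix-tree-based online construction described in the abstract. It also rests on assumptions the abstract does not spell out: that a gapped-factor consists of a left part of length `k`, a gap of fixed length `d` whose characters are ignored, and a right part of length `k2`. The function name `gapped_factor_index` and all three parameters are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def gapped_factor_index(text, k, d, k2):
    """Naive index of gapped-factors (illustrative assumption, not the paper's tree).

    Each key is a (left, right) pair, where left = text[i:i+k] and
    right = text[i+k+d:i+k+d+k2]; the d characters in between are ignored.
    Each value is the sorted list of start positions i where the pair occurs.
    """
    index = defaultdict(list)
    span = k + d + k2  # total length covered by one gapped-factor
    for i in range(len(text) - span + 1):
        left = text[i:i + k]
        right = text[i + k + d:i + span]
        index[(left, right)].append(i)
    return dict(index)


# "bxa" (position 1) and "bya" (position 4) differ only inside the gap,
# so they are indexed under the same key ("b", "a").
occurrences = gapped_factor_index("abxabyab", k=1, d=1, k2=1)
print(occurrences[("b", "a")])
```

This brute-force version stores every gapped-factor explicitly, so it only illustrates what is being indexed; the O(n × |Σ|) bound quoted above refers to the tree construction, not to this sketch.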
Similar resources
Indexing Gapped-Factors Using a Tree
We present a data structure for indexing a specific kind of factor (that is, substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text for a fixed gap size, and only those. The construction of this data structure is done online in ...
Structural Analysis of Gapped Motifs of a String
We investigate the structure of the set of gapped motifs (repeated patterns with don’t cares) of a given string of symbols. A natural equivalence classification is introduced for the motifs, based on their pattern of occurrences, and another classification for the occurrence patterns, based on the induced motifs. Quadratic-time algorithms are given for finding a maximal representative for an eq...
Searching of Gapped Repeats and Subrepetitions in a Word
A gapped repeat is a factor of the form uvu where u and v are nonempty words. The period of the gapped repeat is defined as |u| + |v|. The gapped repeat is maximal if it cannot be extended to the left or to the right by at least one letter while preserving its period. The gapped repeat is called α-gapped if its period is not greater than α|v|. A δ-subrepetition is a factor whose exponent is less t...
A DNA based Approach to find Closed Repetitive Gapped Subsequences from a Sequence Database
In bioinformatics, the discovery of transcription factor binding affinities is important. This is done by sequence analysis of microarray data. Accurately determining continuous and gapped motifs in a long sequence of data, such as genetic data, is challenging and requires detailed study. In this paper, we propose an algorithm that can be used for finding short continuous, shor...
Efficiently Finding All Maximal alpha-gapped Repeats
For α ≥ 1, an α-gapped repeat in a word w is a factor uvu of w such that |uv| ≤ α|u|; the two factors u in such a repeat are called arms, while the factor v is called the gap. Such a repeat is called maximal if its arms cannot be extended simultaneously with the same symbol to the right or, respectively, to the left. In this paper we show that the number of maximal α-gapped repeats that may occur i...
Publication date: 2006