Search results for: text length
Number of results: 467834
We present an efficient algorithm for scanning Huffman compressed texts. The algorithm parses the compressed text in O((n log₂ σ)/b) time, where n is the size of the compressed text in bytes, σ is the size of the alphabet, and b is a user specified parameter. The method uses a variable size super-alphabet, with an average size of O(b/(H log₂ σ)) symbols, where H is the entropy of the text. Each ...
Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is b, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing b is NP-complete, a popular gold standard is z, the number of phrases in th...
We propose a fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes is important because the array of sorted suffix indexes, called the suffix array, is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in Block Sorting text compression; therefore, fast sorting algorithms are desire...
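To make the two objects named in this abstract concrete, here is a minimal sketch of a suffix array and of the Burrows-Wheeler transform derived from it. It uses a naive comparison sort, not the fast algorithm the paper proposes; the function names and the `$` sentinel are assumptions chosen for illustration.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order."""
    # Naive construction: sort positions by the suffix starting there.
    # This is far slower than a dedicated suffix sorter; it only shows the result.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform of `text + sentinel`, read off the suffix array."""
    # Assumption: the sentinel does not occur in `text` and sorts before every
    # character of it (true for "$" against lowercase letters).
    s = text + sentinel
    # For each sorted suffix, emit the character that precedes it (cyclically).
    return "".join(s[i - 1] for i in suffix_array(s))


if __name__ == "__main__":
    print(suffix_array("banana"))
    print(bwt_from_suffix_array("banana"))
```

The suffix array stores one integer per text position, which is where the memory advantage over a suffix tree comes from.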
Every day, millions of short texts are generated for which effective tools for organization and retrieval are required. Because these documents are so short and their representations extremely sparse, directly applying standard text categorization methods is not effective. In this work we propose using distributional term representations (DTRs) for short-text categorization. ...
We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those oc...
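The kernel described in this abstract can be illustrated with a brute-force sketch that enumerates every length-k subsequence explicitly. This is only practical for very short strings and is not the efficient computation a real implementation would use; the names `subsequence_features`, `subsequence_kernel`, and `decay` are hypothetical, and k is assumed to be at least 1.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(text: str, k: int, decay: float) -> dict[str, float]:
    """Map each length-k subsequence of `text` to its accumulated weight."""
    features: dict[str, float] = defaultdict(float)
    # Every increasing tuple of k positions is one (possibly gapped) occurrence.
    for positions in combinations(range(len(text)), k):
        span = positions[-1] - positions[0] + 1  # full length covered in the text
        subsequence = "".join(text[i] for i in positions)
        features[subsequence] += decay ** span  # wider spans weigh less
    return dict(features)


def subsequence_kernel(s: str, t: str, k: int, decay: float) -> float:
    """Inner product of the two feature vectors over shared subsequences."""
    fs = subsequence_features(s, k, decay)
    ft = subsequence_features(t, k, decay)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)


if __name__ == "__main__":
    # "cat" and "car" share only the subsequence "ca", which spans 2 characters
    # in both, so the kernel value is decay**2 * decay**2.
    print(subsequence_kernel("cat", "car", k=2, decay=0.5))
```

With a decay factor below 1, occurrences that are close to contiguous contribute more than widely gapped ones.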
With the huge growth of social media, especially with 500 million Twitter messages being posted per day, analyzing these messages has caught intense interest of researchers. Topics of interest include micro-blog summarization, breaking news detection, opinion mining and discovering trending topics. In information extraction, researchers face challenges in applying data mining techniques due to ...
Given a text string T of length n, a shorter pattern string A of length m, and an integer k, a simple, straightforward O(k) parallel algorithm for finding all occurrences of the pattern string in the text string with at most k differences (as defined by edit distance) is presented. The algorithm uses the priority CRCW-PRAM model of computation and (n − m + k + 2) · m = O(nm) processors. Over recent dec...
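The k-differences problem stated in this abstract can be sketched with the classic sequential dynamic program, in which a match may start anywhere in the text. This is a plain O(nm) single-processor version, not the parallel CRCW-PRAM algorithm the abstract presents; the function name `k_difference_ends` is hypothetical.

```python
def k_difference_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return 1-based text positions where an occurrence of `pattern`
    with at most k differences (edit distance) ends."""
    m = len(pattern)
    # col[i] = least edit distance between pattern[:i] and any substring of
    # the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of the cell up and to the left
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new_value = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra text character
                col[i - 1] + 1,    # missing pattern character
            )
            prev_diag = col[i]
            col[i] = new_value
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(k_difference_ends("abcdefg", "cde", 0))
    print(k_difference_ends("abd", "abc", 1))
```

Only one column of the table is kept at a time, so the sketch needs O(m) memory in addition to the output.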
In this paper, we present algorithms for pattern matching, where either the pattern P or the text T can contain “don’t care” characters. If the pattern P contains don’t care characters, then we can solve the pattern matching problem in O(n +m + α) time, where α is the total number of occurrences of the component subpatterns. We also can handle online queries, given an O(n) preprocessing time, r...
The subset matching problem is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text position is a set of characters drawn from some alphabet Σ. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i + j − 1], for all j (1 ≤ j ≤ m). This is a generalization of the ordinary string matching and can b...
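The occurrence condition in this abstract translates directly into a naive O(nm) check, sketched below. It only illustrates the problem definition and is not the algorithm of the paper; the function name `subset_matches` is hypothetical, and positions are 0-based in the code, whereas the abstract counts from 1.

```python
def subset_matches(text: list[set[str]], pattern: list[set[str]]) -> list[int]:
    """Return 0-based text positions i where pattern[j] is a subset of
    text[i + j] for every pattern position j."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        # `<=` on Python sets is the subset test.
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


if __name__ == "__main__":
    text = [{"a", "b"}, {"b"}, {"a", "c"}, {"b", "c"}]
    pattern = [{"a"}, {"b"}]
    print(subset_matches(text, pattern))
```

When every set holds exactly one character, the subset test reduces to equality and the check becomes ordinary string matching.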