Search results for: text length

Number of results: 467834

2003
Kimmo Fredriksson Jorma Tarhio

We present an efficient algorithm for scanning Huffman compressed texts. The algorithm parses the compressed text in O((n log₂ σ) / b) time, where n is the size of the compressed text in bytes, σ is the size of the alphabet, and b is a user specified parameter. The method uses a variable size super-alphabet, with an average size of O(b / (H log₂ σ)) symbols, where H is the entropy of the text. Each ...

2017
Travis Gagie Gonzalo Navarro Nicola Prezza

Shannon’s entropy is a clear lower bound for statistical compression. The situation is not so well understood for dictionary-based compression. A plausible lower bound is b, the least number of phrases of a general bidirectional parse of a text, where phrases can be copied from anywhere else in the text. Since computing b is NP-complete, a popular gold standard is z, the number of phrases in th...

1998
Kunihiko Sadakane

We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called a suffix array and it is a memory efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in Block Sorting text compression, therefore fast sorting algorithms are desire...
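
As a point of reference for the abstract above, here is a minimal sketch of what a suffix array is. It uses plain comparison sorting of suffix slices, which is not the algorithm the paper proposes and is far slower on large inputs; the function name `suffix_array` is a hypothetical choice for illustration.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    ordered so that the suffixes are in lexicographic order.

    Naive baseline: each comparison may scan a whole suffix, so this
    is only suitable for short strings.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order:
    # a, ana, anana, banana, na, nana
    print(suffix_array("banana"))
```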

2013
Juan Manuel Cabrera Hugo Jair Escalante Manuel Montes-y-Gómez

Every day, millions of short texts are generated for which effective tools for organization and retrieval are required. Because of the tiny length of these documents and their extremely sparse representations, the direct application of standard text categorization methods is not effective. In this work we propose using distributional term representations (DTRs) for short-text categorization. ...

2000
Huma Lodhi John Shawe-Taylor Nello Cristianini Chris Watkins

We introduce a novel kernel for comparing two text documents. The kernel is an inner product in the feature space consisting of all subsequences of length k. A subsequence is any ordered sequence of k characters occurring in the text though not necessarily contiguously. The subsequences are weighted by an exponentially decaying factor of their full length in the text, hence emphasising those oc...
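
The kernel definition in the abstract above can be illustrated with a brute-force sketch that enumerates every length-k subsequence explicitly. This is only a direct reading of the definition, assuming k ≥ 1 and short inputs; it is not an efficient evaluation method, and the names `subsequence_features`, `subsequence_kernel`, and `decay` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(text: str, k: int, decay: float) -> dict[str, float]:
    """Map each length-k subsequence of `text` to its total weight.

    Every occurrence contributes decay ** span, where span is the
    distance from its first to its last character, inclusive.
    """
    features: dict[str, float] = defaultdict(float)
    for idx in combinations(range(len(text)), k):
        span = idx[-1] - idx[0] + 1
        subsequence = "".join(text[i] for i in idx)
        features[subsequence] += decay ** span
    return features


def subsequence_kernel(s: str, t: str, k: int, decay: float) -> float:
    """Inner product of the two feature vectors (unnormalised)."""
    fs = subsequence_features(s, k, decay)
    ft = subsequence_features(t, k, decay)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)


if __name__ == "__main__":
    # "cat" and "car" share only the subsequence "ca" for k = 2.
    print(subsequence_kernel("cat", "car", k=2, decay=0.5))
```

Enumerating all index combinations grows quickly with text length, so this form is useful mainly for checking small cases by hand.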

2015
Faris Kateb Jugal Kalita Fabian Abel Qi Gao Geert-Jan Houben Grigoris Antoniou Marko Grobelnik Elena Simperl Bijan Parsia Dimitris Plexousakis Pieter Leenheer Jeff Pan Brian Babcock Shivnath Babu Mayur Datar Rajeev Motwani Jennifer Widom Adam Bermingham Johan Bollen Huina Mao Meeyoung Cha Hamed Haddadi Fabricio Benevenuto

With the huge growth of social media, and in particular with 500 million Twitter messages being posted per day, analyzing these messages has attracted intense interest from researchers. Topics of interest include micro-blog summarization, breaking news detection, opinion mining and discovering trending topics. In information extraction, researchers face challenges in applying data mining techniques due to ...

1993
Alden H. Wright Yi Jiang

Given a text string T of length n, a shorter pattern string A of length m, and an integer k, a simple, straightforward O(k) parallel algorithm for finding all occurrences of the pattern string in the text string with at most k differences (as defined by edit distance) is presented. The algorithm uses the priority CRCW-PRAM model of computation and (n − m + k + 2) m = O(nm) processors. Over recent dec...
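
To make the k-differences problem in the abstract above concrete, here is a sequential dynamic-programming sketch that runs in O(nm) time. It is the standard column-by-column edit-distance formulation, not the parallel PRAM algorithm the paper presents; the function name `k_difference_ends` is hypothetical, and positions are 0-based.

```python
def k_difference_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return every 0-based text index at which some substring ends
    that is within edit distance k of `pattern`.
    """
    m = len(pattern)
    # prev[i] = cost of matching pattern[:i] against an empty substring.
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        # Row 0 stays 0 because a match may start at any text position.
        cur = [0]
        for i in range(1, m + 1):
            substitution = prev[i - 1] + (0 if pattern[i - 1] == ch else 1)
            skip_text_char = prev[i] + 1
            skip_pattern_char = cur[i - 1] + 1
            cur.append(min(substitution, skip_text_char, skip_pattern_char))
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


if __name__ == "__main__":
    print(k_difference_ends("abc", "xabcx", 0))
    print(k_difference_ends("abc", "xabcx", 1))
```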

2007
Mohammad Sohel Rahman Costas S. Iliopoulos

In this paper, we present algorithms for pattern matching, where either the pattern P or the text T can contain “don’t care” characters. If the pattern P contains don’t care characters, then we can solve the pattern matching problem in O(n + m + α) time, where α is the total number of occurrences of the component subpatterns. We also can handle online queries, given an O(n) preprocessing time, r...

2007
Yangjun Chen

The subset matching problem is to find all occurrences of a pattern string p of length m in a text string t of length n, where each pattern and text position is a set of characters drawn from some alphabet Σ. The pattern is said to occur at text position i if the set p[j] is a subset of the set t[i + j − 1], for all j (1 ≤ j ≤ m). This is a generalization of the ordinary string matching and can b...
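
The occurrence condition in the abstract above can be checked directly with a naive O(nm) scan. This sketch only restates the problem definition and is not the paper's algorithm; the function name `subset_match` is hypothetical, and it uses 0-based positions, so the condition becomes p[j] ⊆ t[i + j] for 0 ≤ j < m.

```python
def subset_match(pattern: list[set[str]], text: list[set[str]]) -> list[int]:
    """Return all 0-based positions i such that every pattern set
    pattern[j] is a subset of text[i + j].
    """
    m, n = len(pattern), len(text)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


if __name__ == "__main__":
    p = [{"a"}, {"b"}]
    t = [{"a", "c"}, {"b"}, {"a"}, {"a", "b"}]
    print(subset_match(p, t))
```

Ordinary string matching is the special case where every set contains exactly one character.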
