A bisection algorithm for grammar-based compression of ordered trees
Related resources
An Effective Grammar-Based Compression Algorithm for Tree Structured Data
Much semistructured data, such as HTML/XML files, is represented by rooted trees t in which the children of each internal vertex are ordered and all edges have labels. Such data is called tree structured data. Analyzing large tree structured data is a time-consuming process in data mining. If we can reduce the size of the input data without loss of information, we can speed up such a heav...
The Smallest Grammar Problem Revisited
In a seminal paper of Charikar et al. on the smallest grammar problem, the authors derive upper and lower bounds on the approximation ratios for several grammar-based compressors, but in all cases there is a gap between the lower and upper bound. Here we close the gaps for LZ78 and BISECTION by showing that the approximation ratio of LZ78 is Θ((n/log n)^{2/3}), whereas the approximation ratio of BISE...
Dictionary-Based Tree Compression
Trees are a ubiquitous data structure in computer science. LISP, for instance, was designed to manipulate nested lists, that is, ordered unranked trees. Already at that time, DAGs were used to detect common subexpressions, a process known as "hash consing." In a DAG every distinct subtree is represented only once (but can be referenced many times) and hence it constitutes a dictionary-based comp...
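As an illustration of the hash-consing idea mentioned in this abstract, the following is a minimal Python sketch and not the method of the cited paper. The tree encoding as `(label, [children])` pairs and the names `hash_cons` and `tree_size` are assumptions made for this example only.

```python
def hash_cons(tree, table, nodes):
    """Return the DAG node id for `tree`, given as (label, [children]).

    `table` maps (label, child_ids) to an id, so every distinct subtree
    is stored once; `nodes` lists the stored DAG nodes in id order.
    Both names are hypothetical helpers for this sketch.
    """
    label, children = tree
    # Children are processed first, so equal subtrees get equal ids.
    key = (label, tuple(hash_cons(c, table, nodes) for c in children))
    if key not in table:
        table[key] = len(nodes)
        nodes.append(key)
    return table[key]


def tree_size(tree):
    """Count the nodes of the uncompressed tree."""
    return 1 + sum(tree_size(c) for c in tree[1])


# Example tree f(g(a, b), g(a, b), a): the subtree g(a, b) occurs twice.
g = ("g", [("a", []), ("b", [])])
t = ("f", [g, g, ("a", [])])

table, nodes = {}, []
root = hash_cons(t, table, nodes)
print(tree_size(t), len(nodes))  # tree nodes vs. distinct DAG nodes
```

In this example the tree has 8 nodes, while the DAG stores only the 4 distinct subtrees a, b, g(a, b) and the root. The recursion is kept simple for readability and would need an explicit stack for very deep trees.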
Grammar-Based Tree Compression
This paper gives a survey of recent progress in grammar-based compression for trees. Algorithms that work directly on grammar-compressed trees are also surveyed.
Grammar-based Compression of Unranked Trees
We introduce forest straight-line programs (FSLPs) as a compressed representation of unranked ordered node-labelled trees. FSLPs are based on the operations of forest algebra and generalize tree straight-line programs. We compare the succinctness of FSLPs with two other compression schemes for unranked trees: top dags and tree straight-line programs of first-child/next sibling encodings. Effici...
Journal: Inf. Process. Lett.
Volume: 110
Publication date: 2010