Storage and Retrieval of Highly Repetitive Sequence Collections
Similar resources
Storage and Retrieval of Highly Repetitive Sequence Collections
A repetitive sequence collection is a set of sequences that are small variations of each other. A prominent example is the genome sequences of individuals of the same or closely related species, where the differences can be expressed by short lists of basic edit operations. Flexible and efficient data analysis on such a typically huge collection is plausible using suffix trees. However, the suffix tree occ...
Storage and Retrieval of Individual Genomes and other Repetitive Sequence Collections
In the near future, biomolecular engineering techniques will reach a state where the sequencing of individual genomes becomes feasible. This progress will create huge expectations for the data analysis domain to reveal new knowledge on the "secrets of life". Quite rudimentary reasons may inhibit such breakthroughs; it may not be feasible to store all the data in a form that would enable anythin...
Indexing Highly Repetitive Collections
The need to index and search huge highly repetitive sequence collections is rapidly arising in various fields, including computational biology, software repositories, versioned collections, and others. In this short survey we briefly describe the progress made along three research lines to address the problem: compressed suffix arrays, grammar compressed indexes, and Lempel-Ziv compressed indexes.
Document Retrieval on Repetitive Collections
Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings. Traditional pattern-matching techniques yield brute-force document retrieval solutions, which has motivated the research on tailored indexes that offer near-optimal performance. However, an experimental study establishing which alternatives are actually better than brute force, an...
Universal Indexes for Highly Repetitive Document Collections
Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that are near-copies of others. Traditional techniques for indexing these collections fail to properly exploit their regularities in order to reduce space. We int...
Journal
Journal title: Journal of Computational Biology
Year: 2010
ISSN: 1066-5277,1557-8666
DOI: 10.1089/cmb.2009.0169