CoMSA: compression of protein multiple sequence alignment files
Similar resources
Protein multiple sequence alignment.
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of task...
Multiple protein sequence alignment.
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequen...
Practical compression for multi-alignment genomic files
Genomic sequence data is being generated in massive quantities, and must be stored in compressed form. Here we examine the combined challenge of storing such data compactly, yet providing bioinformatics researchers with the ability to extract particular regions of interest without needing to fully decompress multi-gigabyte data collections. We focus on data produced in SAM format, which is part...
Multiple Sequence Alignment
The problem of sequence alignment is to find the patterns of sequence conservation and similarity between pairs or sets of given sequences. In biological contexts, similarity between biological sequences usually amounts to either functional or structural similarities or divergence from a common ance...
Multiple sequence alignment.
Multiple sequence alignments are an essential tool for protein structure and function prediction, phylogeny inference and other common tasks in sequence analysis. Recently developed systems have advanced the state of the art with respect to accuracy, ability to scale to thousands of proteins and flexibility in comparing proteins that do not share the same domain architecture. New multiple align...
Journal
Journal title: Bioinformatics
Year: 2018
ISSN: 1367-4803,1460-2059
DOI: 10.1093/bioinformatics/bty619