Manipulation of FASTQ data with Galaxy
Authors
Abstract
SUMMARY Here, we describe a tool suite that functions on all of the commonly known FASTQ format variants and provides a pipeline for manipulating next-generation sequencing data, from the output of a sequencing machine through the quality-filtering steps.
AVAILABILITY AND IMPLEMENTATION This open-source toolset was implemented in Python and has been integrated into the online data analysis platform Galaxy (public web access: http://usegalaxy.org; download: http://getgalaxy.org). Two short movies that highlight the functionality of the tools described in this manuscript, as well as results from testing components of this tool suite against a set of previously published files, are available at http://usegalaxy.org/u/dan/p/fastq
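To make "FASTQ format variants" and "quality filtering" concrete, here is a minimal Python sketch. It is not the Galaxy toolset's code or API: every function name is hypothetical, and it assumes simple four-line records and the two widely used quality encodings (ASCII offset 33 for Sanger-style files, offset 64 for older Illumina-style files).

```python
import io
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a stream, assuming four lines per record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        if not header.strip():  # skip blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        handle.readline()       # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        yield FastqRecord(header.rstrip("\n")[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string into integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


def convert_offset(quality: str, source_offset: int = 64,
                   target_offset: int = 33) -> str:
    """Re-encode a quality string from one ASCII offset to another."""
    return "".join(chr(ord(ch) - source_offset + target_offset)
                   for ch in quality)


def passes_min_quality(record: FastqRecord, min_score: int = 20,
                       offset: int = 33) -> bool:
    """Hypothetical filter: keep a read only if every base meets min_score."""
    return all(score >= min_score
               for score in phred_scores(record.quality, offset))


# Usage: two made-up reads, the second containing one low-quality base.
example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII#I\n"
records = list(read_fastq(io.StringIO(example)))
kept = [r.name for r in records if passes_min_quality(r)]
```

The sketch handles only offset-based encodings. Variants that use a different score scale, such as Solexa scores, would need an additional score conversion that is not shown here.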
Similar resources
fqtools: an efficient software suite for modern FASTQ file manipulation
Many Next Generation Sequencing analyses involve the basic manipulation of input sequence data before downstream processing (e.g. searching for specific sequences, format conversion or basic file statistics). The rapidly increasing data volumes involved in NGS make any dataset manipulation a time-consuming and error-prone process. I have developed fqtools; a fast and reliable FASTQ f...
Bioclojure: a functional library for the manipulation of biological sequences
MOTIVATION BioClojure is an open-source library for the manipulation of biological sequence data written in the language Clojure. BioClojure aims to provide a functional framework for the processing of biological sequence data that provides simple mechanisms for concurrency and lazy evaluation of large datasets. RESULTS BioClojure provides parsers and accessors for a range of biological seque...
Measuring the optical depth profile of galaxy clusters using the kinetic Sunyaev-Zel'dovich effect
Baryonic matter distribution in the large-scale structures is one of the main questions in cosmology. This distribution can provide valuable information regarding the processes of galaxy formation and evolution. On the other hand, the missing baryon problem is still under debate. One of the most important cosmological structures for studying the rate and the distribution of the baryons is gal...
BEETL-fastq: a searchable compressed archive for DNA reads
MOTIVATION FASTQ is a standard file format for DNA sequencing data, which stores both nucleotides and quality scores. A typical sequencing study can easily generate hundreds of gigabytes of FASTQ files, while public archives such as ENA and NCBI and large international collaborations such as the Cancer Genome Atlas can accumulate many terabytes of data in this format. Compression tools such as ...
LFQC: a lossless compression algorithm for FASTQ files
MOTIVATION Next Generation Sequencing (NGS) technologies have revolutionized genomic research by reducing the cost of whole genome sequencing. One of the biggest challenges posed by modern sequencing technology is economic storage of NGS data. Storing raw data is infeasible because of its enormous size and high redundancy. In this article, we address the problem of storage and transmission of l...
Volume: 26
Publication date: 2010