ConDeTri - A Content Dependent Read Trimmer for Illumina Data
Authors
Abstract
During the last few years, DNA and RNA sequencing have come to play an increasingly important role in biological and medical applications, driven by the larger volumes of data produced by new sequencing machines and by the sharp decrease in sequencing costs. Illumina/Solexa sequencing in particular has had a growing impact on data collection from model and non-model organisms. However, accurate and easy-to-use tools for quality filtering have not yet been established. We present ConDeTri, a method for content-dependent read trimming of next-generation sequencing data that uses the quality score of each individual base. The main focus of the method is to remove sequencing errors from reads so that sequencing reads can be standardized. A further aim is to make read trimming easy to incorporate into preprocessing and analysis pipelines for Illumina data. ConDeTri can process single-end and paired-end sequence data of arbitrary length, and it is independent of sequencing coverage and user interaction. It trims and removes reads with low quality scores, which saves computational time and memory during de novo assembly. Low-coverage or large-genome sequencing projects will especially benefit from trimming reads.
Availability and implementation: Freely available on the web at http://code.google.com/p/condetri.
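The abstract does not spell out the trimming rule, so the following is only a minimal sketch of per-base quality trimming in general, not ConDeTri's actual algorithm. It assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64), and the names `phred_scores`, `trim_3prime`, `min_q`, and `min_len` are hypothetical, chosen for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    The offset of 33 is an assumption; pass offset=64 for older Illumina data.
    """
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=25, min_len=5, offset=33):
    """Drop trailing bases whose Phred score is below min_q.

    Returns (seq, quality) for the kept prefix, or None when fewer than
    min_len bases remain. The thresholds are illustrative defaults, not
    values taken from the ConDeTri paper.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    scores = phred_scores(quality, offset)
    end = len(scores)
    # Walk inward from the 3' end until a base meets the threshold.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    if end < min_len:
        return None  # read is too short after trimming; discard it
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # 'I' decodes to Phred 40 and '#' to Phred 2 under Phred+33.
    print(trim_3prime("ACGTACGT", "IIIIII##"))
```

For paired-end data, a pipeline built on such a function would also need to keep mates in sync when one read of a pair is discarded; that bookkeeping is left out here.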
Similar resources
Trimmomatic: a flexible trimmer for Illumina sequence data
MOTIVATION Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We have developed Trimmomatic as a more flexible and efficient preprocessing tool, which could correctly handle paired-end data. RESULT...
Direct Comparisons of Illumina vs. Roche 454 Sequencing Technologies on the Same Microbial Community DNA Sample
Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DN...
Sentence Compression as a Component of a Multi-Document Summarization System
We applied a single-document sentence-trimming approach (Trimmer) to the problem of multi-document summarization. Trimmer was designed with the intention of compressing a lead sentence into a space consisting of tens of characters. In our Multi-Document Trimmer (MDT), we use Trimmer to generate multiple trimmed candidates for each sentence. Sentence selection is used to determine which trimmed c...
Assessment of insert sizes and adapter content in fastq data from NexteraXT libraries
The Illumina NexteraXT transposon protocol is a cost effective way to generate paired end libraries. However, the resulting insert size is highly sensitive to the concentration of DNA used, and the variation of insert sizes is often large. One consequence of this is some fragments may have an insert shorter than the length of a single read, particularly where the library is designed to produce ...
Correction: Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data
The first sentence of the second paragraph of the "Assemblies" subsection of the Methods should have cited reference 34 instead of 33. The correct sentence should read: The Illumina datasets were assembled using SOAPdenovo 1.05 [34] using following parameters: "-K 23 -L 70 -M 3 -u -R -F". The fourth sentence of the first paragraph of the "Importance of Quality Control for Illumina Data" s...
Journal title:
Volume 6, issue
Pages -
Publication date: 2011