Integrating GPU-Accelerated Sequence Alignment and SNP Detection for Genome Resequencing Analysis

نویسندگان

  • Mian Lu
  • Yuwei Tan
  • Jiuxin Zhao
  • Ge Bai
  • Qiong Luo
چکیده

DNA sequence alignment and single-nucleotide polymorphism (SNP) detection are two important tasks in genomics research. A common genome resequencing analysis workflow is to first perform sequence alignment and then detect SNPs among the aligned sequences. In practice, the performance bottleneck in this workflow is usually the intermediate result I/O due to the separation of the two components, especially when the in-memory computation has been accelerated, e.g., by graphics processors. To address this bottleneck, we propose to integrate the two tasks tightly so as to eliminate the I/O of intermediate results in the workflow. Specifically, we make the following three changes for the tight integration: (1) we adopt a partition-based approach so that the external sorting of alignment results, which was required for SNP detection, is eliminated; (2) we perform customized compression on alignment results to reduce memory footprint; and (3) we move the computation of a global matrix from SNP detection to sequence alignment to save a file scan. We have developed a GPU-accelerated system that tightly integrates sequence alignment and SNP detection. Our results with human genome data sets show that our GPU-acceleration of individual components in the traditional workflow improves the overall performance by 18 times and that the tight integration further improves the performance of the GPU-accelerated system by 2.3 times.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

SNP detection for massively parallel whole-genome resequencing.

Next-generation massively parallel sequencing technologies provide ultrahigh throughput at two orders of magnitude lower unit cost than capillary Sanger sequencing technology. One of the key applications of next-generation sequencing is studying genetic variation between individuals using whole-genome or target region resequencing. Here, we have developed a consensus-calling and SNP-detection m...

متن کامل

Using False Discovery Rates to Benchmark SNP-callers in next-generation sequencing projects

Sequence alignments form the basis for many comparative and population genomic studies. Alignment tools provide a range of accuracies dependent on the divergence between the sequences and the alignment methods. Despite widespread use, there is no standard method for assessing the accuracy of a dataset and alignment strategy after resequencing. We present a framework and tool for determining the...

متن کامل

EagleView: a genome assembly viewer for next-generation sequencing technologies.

The emergence of high-throughput next-generation sequencing technologies (e.g., 454 Life Sciences [Roche], Illumina sequencing [formerly Solexa sequencing]) has dramatically sped up whole-genome de novo sequencing and resequencing. While the low cost of these sequencing technologies provides an unparalleled opportunity for genome-wide polymorphism discovery, the analysis of the new data types a...

متن کامل

Single Nucleotide Polymorphisms and Association Studies: A Few Critical Points

Uncovering DNA sequence variations that correlate with phenotypic changes, e.g., diseases, is the aim of sequence variation studies. Common types sequence variations are Single nucleotide polymorphism (SNP, pronounced snip).SNPs are the third-generation molecular marker. SNP represents a DNA sequence variant of a single base pair with the minor allele occurring in more than 1% of a given popula...

متن کامل

BAC-End Sequence-Based SNP Mining in Allotetraploid Cotton (Gossypium) Utilizing Resequencing Data, Phylogenetic Inferences, and Perspectives for Genetic Mapping

A bacterial artificial chromosome library and BAC-end sequences for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining utilizing resequencing data with BAC-end sequences as a reference by alignment of 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A to...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012