Probabilistic transcriptome assembly and variant graph genotyping
Abstract
The introduction of second-generation sequencing has in recent years allowed the biological community to determine the genomes and transcriptomes of organisms and individuals at an unprecedented rate. However, almost every step in the sequencing protocol introduces uncertainty in how the resulting sequencing data should be interpreted. This has over the years spurred the development of many probabilistic methods capable of modelling different aspects of the sequencing process. Here, I present two such methods, each developed to tackle a different problem in bioinformatics, together with an application of the latter method to a large Danish sequencing project. The first is a probabilistic method for transcriptome assembly that is based on a novel generative model of the RNA sequencing process and provides confidence estimates on the assembled transcripts. We show that this approach outperforms existing state-of-the-art methods, as measured by sensitivity and precision, on both simulated and real data. The second is a novel probabilistic method that uses exact alignment of k-mers to a set of variant graphs to provide unbiased estimates of genotypes in a population of individuals. Using simulations, we show that this method markedly increases sensitivity without sacrificing precision when compared to mapping-based approaches, especially in variant-dense regions. We further demonstrate, using high-coverage real genome sequencing data from parent-offspring trios, that our method is accurate even for larger structural variants, as measured by trio concordance. Finally, we applied the second method to genotype variants, predicted using both a mapping-based approach and de novo assemblies, in a population of 50 Danish parent-offspring trios in the GenomeDenmark project. Using this hybrid approach, we not only created a variant set that was more complete in terms of structural variants than those of previous similar studies, but also significantly reduced the bias towards deletions normally observed in such studies.
Similar resources
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly
MOTIVATION Eugene Myers in his string graph paper suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (...
IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels
MOTIVATION RNA sequencing based on next-generation sequencing technology is effective for analyzing transcriptomes. Like de novo genome assembly, de novo transcriptome assembly does not rely on any reference genome or additional annotation information, but is more difficult. In particular, isoforms can have very uneven expression levels (e.g. 1:100), which make it very difficult to identify low...
Improved conversion rates for SNP genotyping of nonmodel organisms
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation and are highly adaptable to large-scale automated genotyping and population genetics studies. For nonmodel organisms, many SNP discovery projects are based on sequencing and assembly of a transcriptome and the calling of sequence variation in contigs. This paper develops a new method for avoiding intron/exon ...
Sequencing and de novo transcriptome assembly of Brachypodium sylvaticum (Poaceae)
PREMISE OF THE STUDY We report the de novo assembly and characterization of the transcriptomes of Brachypodium sylvaticum (slender false-brome) accessions from native populations of Spain and Greece, and an invasive population west of Corvallis, Oregon, USA. • METHODS AND RESULTS More than 350 million sequence reads from the mRNA libraries prepared from three B. sylvaticum genot...
Publication date: 2016