tHapMix: simulating tumour samples through haplotype mixtures
Abstract
MOTIVATION: Large-scale rearrangements and copy number changes, combined with different modes of clonal evolution, create extensive somatic genome diversity, making it difficult to develop versatile and scalable variant calling tools and to create well-calibrated benchmarks.
RESULTS: We developed a new simulation framework, tHapMix, that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It easily scales to the simulation of hundreds of somatic genomes, while re-use of real read data preserves the noise and biases present in sequencing platforms. We further demonstrate the utility of tHapMix by creating a simulated set of 140 somatic genomes and showing how it can be used in training and testing of somatic copy number variant calling tools.
AVAILABILITY AND IMPLEMENTATION: tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/tHapMix.
CONTACT: [email protected]
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
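To make the ploidy, purity and polyclonality terms concrete, the following is a minimal sketch of the arithmetic behind a tumour/normal haplotype mixture. It is not taken from the tHapMix code base: the function names, the two-haplotype representation and the assumption of a diploid normal contaminant with one copy of each haplotype are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def mixture_copy_number(
    purity: float,
    clone_fractions: Sequence[float],
    clone_copy_numbers: Sequence[float],
    normal_copy_number: float = 2.0,
) -> float:
    """Expected average copy number of one segment in a mixed sample.

    purity             -- fraction of cells that are tumour cells (0..1)
    clone_fractions    -- share of each clone among tumour cells (sums to 1)
    clone_copy_numbers -- total copy number of the segment in each clone
    normal_copy_number -- copy number in normal cells (assumed diploid)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    if len(clone_fractions) != len(clone_copy_numbers):
        raise ValueError("one copy number is required per clone")
    if abs(sum(clone_fractions) - 1.0) > 1e-9:
        raise ValueError("clone fractions must sum to 1")
    tumour_cn = sum(f * cn for f, cn in zip(clone_fractions, clone_copy_numbers))
    return (1.0 - purity) * normal_copy_number + purity * tumour_cn


def haplotype_read_fractions(
    purity: float,
    clone_fractions: Sequence[float],
    clone_hap_copies: Sequence[Tuple[int, int]],
) -> Tuple[float, float]:
    """Share of reads expected from haplotype A and haplotype B of a segment.

    clone_hap_copies holds (copies of A, copies of B) for each clone.
    Normal cells are assumed to carry one copy of each haplotype.
    """
    # Average copies of each haplotype per cell across the whole mixture.
    copies_a = (1.0 - purity) + purity * sum(
        f * a for f, (a, _) in zip(clone_fractions, clone_hap_copies)
    )
    copies_b = (1.0 - purity) + purity * sum(
        f * b for f, (_, b) in zip(clone_fractions, clone_hap_copies)
    )
    total = copies_a + copies_b
    if total == 0.0:
        # Homozygous deletion in a pure sample: no reads expected at all.
        return 0.0, 0.0
    return copies_a / total, copies_b / total


# Example: 80% purity, two clones at 70% and 30% of the tumour cells.
avg_cn = mixture_copy_number(0.8, [0.7, 0.3], [3, 4])
frac_a, frac_b = haplotype_read_fractions(0.8, [0.7, 0.3], [(2, 1), (2, 2)])
```

Under these assumptions, a simulator that re-uses real haplotype-resolved reads could draw reads for each segment from the two haplotypes in proportion to `frac_a` and `frac_b`, scaling the depth by `avg_cn`.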
Similar resources
Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model
Single Nucleotide Polymorphisms (SNPs) are the most common form of polymorphism in the human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. The particular pattern of these common variations forms a block-like structure on the human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...
MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms
The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datase...
Simulating haplotype blocks in the human genome
SUMMARY A bioinformatic tool was written to simulate haplotypes and SNPs under a modified coalescent with recombination. The most important feature of this program is that it allows for the specification of non-homogeneous recombination rates, which results in the formation of the so-called 'haplotype blocks' of the human genome. The program also implements different mutation models and flexibl...
Spectrophotometric Multicomponent Analysis of Ternary and Quaternary Drug Mixtures in Human Urine Samples by Analyzing First-order Data
A new method was developed for the spectral resolution by further determination of three- and four-component mixtures of drugs in urine samples through the complementary application of multivariate curve resolution-alternating least squares with correlation constraint. In the current study, a simple method was proposed to construct a calibration set for the mixture of drugs in the p...
FTEC: a coalescent simulator for modeling faster than exponential growth
SUMMARY Recent genetic studies as well as recorded history point to massive growth in human population sizes during the recent past. To model and understand this growth accurately we introduce FTEC, an easy-to-use coalescent simulation program capable of simulating haplotype samples drawn from a population that has undergone faster than exponential growth. Samples drawn from a population that h...
Journal title:
- Bioinformatics
Volume 33, Issue 2
Pages: -
Publication date: 2017