tHapMix: simulating tumour samples through haplotype mixtures

Authors

  • Sergii Ivakhno
  • Camilla Colombo
  • Stephen Tanner
  • Philip Tedder
  • Stefano Berri
  • Anthony J. Cox
Abstract

MOTIVATION: Large-scale rearrangements and copy number changes, combined with different modes of clonal evolution, create extensive somatic genome diversity. This makes it difficult to develop versatile and scalable variant calling tools and to create well-calibrated benchmarks. RESULTS: We developed a new simulation framework, tHapMix, that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It scales easily to the simulation of hundreds of somatic genomes, while re-use of real read data preserves the noise and biases present in sequencing platforms. We further demonstrate the utility of tHapMix by creating a simulated set of 140 somatic genomes and showing how it can be used in training and testing somatic copy number variant calling tools. AVAILABILITY AND IMPLEMENTATION: tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/tHapMix. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
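As a rough illustration of how purity and polyclonality interact with copy number, the sketch below computes the average copy number expected at one locus in a mixed tumour/normal sample. It is not taken from tHapMix; the function names, the parameters and the diploid normal assumption are all hypothetical and chosen only for this example.

```python
# Illustrative sketch only: NOT the tHapMix implementation.
# Assumes a diploid normal contaminant and clone fractions that sum to 1.

def expected_copy_number(purity, clone_fractions, clone_copy_numbers,
                         normal_copy_number=2):
    """Average copy number at one locus in a tumour/normal mixture.

    purity: fraction of cells in the sample that are tumour cells (0..1).
    clone_fractions: share of each clone among the tumour cells (sums to 1).
    clone_copy_numbers: total copy number of the locus in each clone.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    if len(clone_fractions) != len(clone_copy_numbers):
        raise ValueError("one copy number is required per clone")
    if abs(sum(clone_fractions) - 1.0) > 1e-9:
        raise ValueError("clone fractions must sum to 1")

    # Weighted average over the tumour clones (polyclonality).
    tumour_cn = sum(f * c for f, c in zip(clone_fractions, clone_copy_numbers))
    # Dilute the tumour signal with normal cells (purity).
    return purity * tumour_cn + (1.0 - purity) * normal_copy_number


def expected_depth_ratio(purity, clone_fractions, clone_copy_numbers,
                         normal_copy_number=2):
    """Expected tumour-to-normal read depth ratio at the locus."""
    mixed = expected_copy_number(purity, clone_fractions, clone_copy_numbers,
                                 normal_copy_number)
    return mixed / normal_copy_number


if __name__ == "__main__":
    # Two clones (75% with 3 copies, 25% with 1 copy) at 80% purity.
    print(expected_copy_number(0.8, [0.75, 0.25], [3, 1]))
    print(expected_depth_ratio(0.8, [0.75, 0.25], [3, 1]))
```

Under these assumptions, lower purity or a more even split between clones pulls the observed signal towards the diploid baseline, which is one reason simulated sets spanning a range of purity and clonality values are useful for testing copy number callers.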

Similar resources

Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model

Single Nucleotide Polymorphisms (SNPs) are the most common form of polymorphism in the human genome. Analyses of genetic variations have revealed that individual genomes share common SNP haplotypes. The particular pattern of these common variations forms a block-like structure on the human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...

MixSim: An R Package for Simulating Data to Study Performance of Clustering Algorithms

The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datase...

Simulating haplotype blocks in the human genome

SUMMARY A bioinformatic tool was written to simulate haplotypes and SNPs under a modified coalescent with recombination. The most important feature of this program is that it allows for the specification of non-homogeneous recombination rates, which results in the formation of the so-called 'haplotype blocks' of the human genome. The program also implements different mutation models and flexibl...

Spectrophotometric Multicomponent Analysis of Ternary and Quaternary Drug Mixtures in Human Urine Samples by Analyzing First-order Data

A new method was developed for the spectral resolution and subsequent determination of three- and four-component mixtures of drugs in urine samples through the complementary application of multivariate curve resolution-alternating least squares with correlation constraint. In the current study, a simple method was proposed to construct a calibration set for the mixture of drugs in the p...

FTEC: a coalescent simulator for modeling faster than exponential growth

SUMMARY Recent genetic studies as well as recorded history point to massive growth in human population sizes during the recent past. To model and understand this growth accurately we introduce FTEC, an easy-to-use coalescent simulation program capable of simulating haplotype samples drawn from a population that has undergone faster than exponential growth. Samples drawn from a population that h...

Journal:
  • Bioinformatics

Volume 33, Issue 2

Pages: -

Publication date: 2017