The variant call format and VCFtools

Authors

  • Petr Danecek
  • Adam Auton
  • Gonçalo R. Abecasis
  • Cornelis A. Albers
  • Eric Banks
  • Mark A. DePristo
  • Robert E. Handsaker
  • Gerton Lunter
  • Gabor T. Marth
  • Stephen T. Sherry
  • Gilean McVean
  • Richard Durbin
Abstract

Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in compressed form and can be indexed for fast retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging and comparing, and also provides a general Perl API.

Availability: http://vcftools.sourceforge.net
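To make the description of the format concrete, here is a minimal Python sketch of reading VCF text. It is not part of VCFtools or its Perl API. It assumes the eight fixed tab-separated columns defined in the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), which the abstract itself does not list. The names `VariantRecord`, `parse_info`, `parse_vcf` and `in_region`, as well as the sample records, are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Union

InfoValue = Union[str, bool]


class VariantRecord(NamedTuple):
    """One VCF data line, restricted to the eight fixed columns."""
    chrom: str
    pos: int
    var_id: str
    ref: str
    alt: List[str]
    qual: str
    filters: str
    info: Dict[str, InfoValue]


def parse_info(field: str) -> Dict[str, InfoValue]:
    """Split 'KEY=VALUE;FLAG' into a dict; flags without '=' map to True."""
    info: Dict[str, InfoValue] = {}
    if field == ".":  # '.' marks a missing value
        return info
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def parse_vcf(lines: Iterable[str]) -> Iterator[VariantRecord]:
    """Yield records from VCF text lines, skipping header lines ('#...')."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, var_id, ref, alt, qual, filters, info = line.split("\t")[:8]
        yield VariantRecord(chrom, int(pos), var_id, ref, alt.split(","),
                            qual, filters, parse_info(info))


def in_region(records: Iterable[VariantRecord], chrom: str,
              start: int, end: int) -> List[VariantRecord]:
    """Linear scan for variants with start <= POS <= end on one chromosome."""
    return [r for r in records if r.chrom == chrom and start <= r.pos <= end]


# Illustrative records only (made-up positions and IDs, not real data).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t14370\trs0000001\tG\tA\t29\tPASS\tDP=14;AF=0.5;DB",
    "20\t17330\t.\tT\tA,C\t3\tq10\tDP=11",
    "21\t1000\t.\tGTC\tG\t50\tPASS\t.",
]

if __name__ == "__main__":
    records = list(parse_vcf(EXAMPLE_VCF))
    for rec in in_region(records, "20", 14000, 15000):
        print(rec.chrom, rec.pos, rec.ref, rec.alt, rec.info)
```

The region query here is a plain linear scan over records already in memory. The indexed retrieval mentioned in the abstract exists precisely to avoid such a scan on large compressed files, so this sketch shows only the record layout and not how an index works.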


Similar resources

SeqArray - a storage-efficient high-performance data format for WGS variant calls

Motivation: Whole-genome sequencing (WGS) data are being generated at an unprecedented rate. Analysis of WGS data requires a flexible data format to store the different types of DNA variation. Variant call format (VCF) is a general text-based format developed to store variant genotypes and their annotations. However, VCF files are large and data retrieval is relatively slow. Here we introduce a ...


Multicore and Cloud-Based Solutions for Genomic Variant Analysis

Genomic variant analysis is a complex process for finding and studying genome mutations. It requires analysis and tests from both biological and statistical points of view. Biological data for this kind of analysis are typically stored in the Variant Call Format (VCF), in gigabyte-sized files that cannot be efficiently processed using conventional softwar...


Effects of CALL-Mediated TBLT on Self-Efficacy for Reading among Iranian University Non-English Major EFL Students

The rich and still expanding literature on TBLT is helping to mature both its theoretical conceptualization and practical implementation in foreign and second language education. Similarly, computer-assisted language learning (CALL) has grown as a field, with the use and integration of technology in the classroom continuing to increase and will continue to play an important role in this maturat...


cyvcf2: fast, flexible variant analysis with Python

Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files. Results: We introdu...


Journal title:

Volume 27, Issue

Pages -

Publication date: 2011