Analysis of protein-coding genetic variation in 60,706 humans
Exome Aggregation Consortium

Author

  • James Y. Zou
Abstract

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-ND 4.0 International license.

* These authors contributed equally to this work and names appear in alphabetical order
† Corresponding author
# List of collaborators to appear in Supplementary

Summary

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) sequence data for 60,706 individuals of diverse ethnicities. The resulting catalogue of human genetic diversity has unprecedented resolution, with an average of one variant every eight bases of coding sequence and the presence of widespread mutational recurrence. The deep catalogue of variation provided by the Exome Aggregation Consortium (ExAC) can be used to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; we identify 3,230 genes with near-complete depletion of truncating variants, 79% of which have no currently established human disease phenotype. Finally, we show that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human "knockout" variants in protein-coding genes.
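The summary's point about filtering candidate disease-causing variants can be illustrated with a minimal sketch. It is not the consortium's pipeline. The `Variant` record, the gene names, the allele counts, and the 0.1% frequency cutoff are all hypothetical choices made for illustration. The only figure taken from the text is the cohort size, which gives at most 2 × 60,706 = 121,412 chromosomes at a fully genotyped autosomal site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str           # hypothetical gene label
    allele_count: int   # copies of the alternate allele seen in the reference panel
    allele_number: int  # chromosomes successfully genotyped at this site


def allele_frequency(v: Variant) -> float:
    """Alternate-allele frequency in the reference panel (0.0 if the site is uncalled)."""
    return v.allele_count / v.allele_number if v.allele_number else 0.0


def filter_rare(variants, max_af=0.001):
    """Keep candidates whose reference-panel frequency is at or below max_af.

    The 0.001 default is an assumed cutoff, not a value from the paper.
    """
    return [v for v in variants if allele_frequency(v) <= max_af]


# Illustrative candidates; 121412 = 2 * 60706 chromosomes.
candidates = [
    Variant("GENE_A", 0, 121412),     # absent from the panel: kept
    Variant("GENE_B", 5, 121412),     # very rare: kept
    Variant("GENE_C", 6000, 121412),  # common (about 5%): removed
]
rare = filter_rare(candidates)
```

The intuition is that a variant seen frequently in a large reference panel is unlikely to cause a rare, severe disorder, so a larger panel lets a given cutoff remove more candidates.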
Background

Over the last five years, the widespread availability of high-throughput DNA sequencing technologies has permitted the sequencing of the whole genomes or exomes (the protein-coding regions of genomes) of over half a million humans. In theory, these data represent a powerful source of information about the global patterns of human genetic variation, but in practice they are difficult to access for practical, logistical, and ethical reasons; in addition, the inconsistent processing complicates variant-calling pipelines used by different groups. Current publicly available datasets of human DNA sequence variation contain only a small fraction of all sequenced samples: the Exome Variant Server, created as part of the NHLBI Exome Sequencing Project (ESP)¹, contains frequency information …


Similar resources

The ExAC browser: displaying reference data information from over 60 000 exomes

Worldwide, hundreds of thousands of humans have had their genomes or exomes sequenced, and access to the resulting data sets can provide valuable information for variant interpretation and understanding gene function. Here, we present a lightweight, flexible browser framework to display large population datasets of genetic variation. We demonstrate its use for exome sequence data from 60 706 in...


Association of a rare NOTCH4 coding variant with systemic sclerosis: a family-based whole exome sequencing study

BACKGROUND: Systemic sclerosis (SSc) is a rheumatologic disease with a multifactorial etiology. Genome-wide association studies imply a polygenic, complex mode of inheritance with contributions from variation at the human leukocyte antigen locus and non-coding variation at a locus on chromosome 6p21, among other modestly impactful loci. Here we describe an 8-year-old female proband presenting wi...


Editorial: The Post-Exome Era

The Iranian Rehabilitation Journal (IRJ) invites research papers on the genetic basis of single gene and complex disorders. This vastly dynamic branch of science will complement the multidisciplinary wealth of expertise in the fields of social welfare and rehabilitation. The past few years have witnessed outstanding research projects on the genetic causes of numerous debilitating disorders, suc...


Defining the genetic architecture of hypertrophic cardiomyopathy: re-evaluating the role of non-sarcomeric genes

Aim: Hypertrophic cardiomyopathy (HCM) exhibits genetic heterogeneity that is dominated by variation in eight sarcomeric genes. Genetic variation in a large number of non-sarcomeric genes has also been implicated in HCM but not formally assessed. Here we used very large case and control cohorts to determine the extent to which variation in non-sarcomeric genes contributes to HCM. Methods and r...


Quantifying the unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects

James Zou, Gregory Valiant, Paul Valiant, Konrad Karczewski, Siu On Chan, Kaitlin Samocha, Monkol Lek, Exome Aggregation Consortium, Shamil Sunyaev, Mark Daly, Daniel G MacArthur
Microsoft Research, One Memorial Drive, Cambridge MA, USA; Computer Science Department, Stanford University, Palo Alto CA, USA; Computer Science Department, Brown University, Providence RI, USA; Analytic and Translational...



Journal title:

Volume   Issue

Pages  -

Publication date: 2015