Large-scale identification of sequence variants impacting human transcription factor occupancy in vivo
Abstract
The function of human regulatory regions depends exquisitely on their local genomic environment and cellular context, complicating experimental analysis of the expanding pool of common disease- and trait-associated variants that localize within regulatory DNA. We leverage allelically resolved genomic DNaseI footprinting data encompassing 166 individuals and 114 cell types to identify >60,000 common variants that directly impact transcription factor occupancy and regulatory DNA accessibility in vivo. The unprecedented scale of these data enables systematic analysis of the impact of sequence variation on transcription factor occupancy in vivo. We leverage this analysis to develop accurate models of variation affecting the recognition sites for diverse transcription factors, and apply these models to discriminate nearly 500,000 common regulatory variants likely to affect transcription factor occupancy across the human genome. The approach and results provide a novel foundation for analysis and interpretation of noncoding variation in complete human genomes, and for systems-level investigation of disease-associated variants.
Similar resources
Effects of sequence variation on differential allelic transcription factor occupancy and gene expression.
A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphobl...
High Resolution Models of Transcription Factor-DNA Affinities Improve In Vitro and In Vivo Binding Predictions
Accurately modeling the DNA sequence preferences of transcription factors (TFs), and using these models to predict in vivo genomic binding sites for TFs, are key pieces in deciphering the regulatory code. These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices (PSSMs), which may match large ...
p53 binds preferentially to genomic regions with high DNA-encoded nucleosome occupancy.
The human transcription factor TP53 is a pivotal roadblock against cancer. A key unresolved question is how the p53 protein selects its genomic binding sites in vivo out of a large pool of potential consensus sites. We hypothesized that chromatin may play a significant role in this site-selection process. To test this, we used a custom DNA microarray to measure p53 binding at approximately 2000...
Genomic Promoter Analysis Predicts Functional Transcription Factor Binding
Background. The computational identification of functional transcription factor binding sites (TFBSs) remains a major challenge of computational biology. Results. We have analyzed the conserved promoter sequences for the complete set of human RefSeq genes using our conserved transcription factor binding site (CONFAC) software. CONFAC identified 16296 human-mouse ortholog gene pairs, and of thos...
Sequence-specific DNA binding by MYC/MAX to low-affinity non-E-box motifs
The MYC oncoprotein regulates transcription of a large fraction of the genome as an obligatory heterodimer with the transcription factor MAX. The MYC:MAX heterodimer and MAX:MAX homodimer (hereafter MYC/MAX) bind Enhancer box (E-box) DNA elements (CANNTG) and have the greatest affinity for the canonical MYC E-box (CME) CACGTG. However, MYC:MAX also recognizes E-box variants and was reported to ...
Journal title:
Volume 47, issue
Pages -
Publication date: 2015