Functional classification of transcription factor binding sites: information content as a metric
Abstract
The information content (relative entropy) of transcription factor binding sites (TFBS) is used to classify transcription factors (TFs). TF classes are clustered according to the clustering of their TFBS by information content. Any TF belonging to a TF class cluster has a chance of binding to any TFBS in the clustered group. Thus, for the 41 TFBS in humans, perhaps only 5 to 10 TFs may actually be needed, and in mouse, instead of 13 TFs, there may actually be only 5 or so. The TFBS in the JASPAR database are used in this study. Experimental data on TFs of specific gene expression from the TRRD database also agree with our computational results. This gives a new way to look at protein classification, based not on structure or function but on the nature of the TFBS.
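As a rough illustration of the metric named in the abstract, the sketch below computes the relative entropy (in bits) of a binding-site count matrix against a background base distribution. The function name `information_content`, the uniform background, and the toy matrix are assumptions made for illustration; the abstract does not specify the authors' exact formula, background model, or clustering procedure.

```python
import math

BASES = "ACGT"

def information_content(columns, background=None, pseudocount=0.0):
    """Sum over positions of sum_b p(b) * log2(p(b) / q(b)).

    columns     : list of dicts mapping base -> observed count (one dict per position)
    background  : dict mapping base -> background probability q(b);
                  a uniform 0.25 is ASSUMED when none is given
    pseudocount : optional value added to every count (hypothetical default of 0)
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    total = 0.0
    for column in columns:
        n = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        for b in BASES:
            p = (column.get(b, 0) + pseudocount) / n
            if p > 0:  # terms with p == 0 contribute nothing
                total += p * math.log2(p / background[b])
    return total

# Toy three-position site (made-up counts, not JASPAR data).
toy_site = [
    {"A": 10, "C": 0, "G": 0, "T": 0},  # fully conserved position
    {"A": 5, "C": 5, "G": 0, "T": 0},   # two equally frequent bases
    {"A": 5, "C": 5, "G": 5, "T": 5},   # no preference
]

ic = information_content(toy_site)
print(f"{ic:.2f} bits")
```

With a uniform background, a fully conserved position contributes 2 bits, a position split evenly between two bases contributes 1 bit, and a position with no preference contributes 0 bits. Binding sites with similar totals or per-position profiles could then be grouped by any standard clustering method.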
Similar resources
A new biophysical metric for interrogating the information content in human genome sequence variation: Proof of concept.
The 21st century emergence of genomic medicine is shifting the paradigm in biomedical science from the population phenotype to the individual genotype. In characterizing the biology of disease and health disparities in population genetics, human populations are often defined by the most common alleles in the group. This definition poses difficulties when categorizing individuals in the populati...
Mapping of Transcription Factor Binding Region of Kappa Casein (CSN3) Gene in Iranian Bacterianus and Dromedaries Camels
κ-casein is a glycosylated protein in mammalian milk that plays an essential role in the milk micelles. Control of κ-casein expression reflects this essential role, although an understanding of the mechanisms involved lags behind that of the other milk protein genes. Transcriptional regulation, a first mechanism for controlling the development of organisms, is carried out by transcription facto...
Modeling Transcription Factor Binding Sites with Supervised Learning
We present a supervised learning approach to transcription factor binding site modeling for four distinct species. Using the consensus scoring method, we look at binding sites of unequal length and the alignment strategy associated with these binding sites. Pairwise scoring and information content were added to the consensus scoring to further increase accuracy of transcription factor binding s...
Why transcription factor binding sites are ten nucleotides long.
Gene expression is controlled primarily by transcription factors, whose DNA binding sites are typically 10 nt long. We develop a population-genetic model to understand how the length and information content of such binding sites evolve. Our analysis is based on an inherent trade-off between specificity, which is greater in long binding sites, and robustness to mutation, which is greater in shor...
Journal: J. Integrative Bioinformatics
Volume: 3
Publication date: 2006