How well is enzyme function conserved as a function of pairwise sequence identity?

Authors

  • Weidong Tian
  • Jeffrey Skolnick
Abstract

Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost's to reduce the database bias; however, we classify enzyme families based not only on sequence similarity, but also on functional similarity, i.e. sequences in each family must have the same four digits or the same first three digits of the enzyme commission (EC) number. Furthermore, instead of selecting representative sequences for comparison, we calculate the function conservation of each enzyme family and then average the degree of enzyme function conservation across all enzyme families. Our analysis suggests that for functional transferability, 40% sequence identity can still be used as a confident threshold to transfer the first three digits of an EC number; however, to transfer all four digits of an EC number, above 60% sequence identity is needed to have at least 90% accuracy. Moreover, when PSI-BLAST is used, the magnitude of the E-value is found to be weakly correlated with the extent of enzyme function conservation in the third iteration of PSI-BLAST. As a result, functional annotation based on the E-values from PSI-BLAST should be used with caution. 
We also show that by employing an enzyme family-specific sequence identity threshold above which 100% functional conservation is required, functional inference of unknown sequences can be accurately accomplished. However, this comes at a cost: those true positive sequences below this threshold cannot be uniquely identified.
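The family-averaged measure and the family-specific threshold described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout (`family_id`, percent identity, two EC numbers), the function names, and the exact definition of conservation (the fraction of pairs at or above an identity cutoff that share the first `digits` EC digits) are all assumptions made for illustration, and the example data are invented.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical record: (family_id, percent_identity, ec_query, ec_hit).
Pair = Tuple[str, float, str, str]


def same_ec(ec_a: str, ec_b: str, digits: int) -> bool:
    """True if two EC numbers agree on their first `digits` fields."""
    return ec_a.split(".")[:digits] == ec_b.split(".")[:digits]


def family_averaged_conservation(
    pairs: List[Pair], min_identity: float, digits: int
) -> Optional[float]:
    """Average, over families, of the per-family fraction of conserved pairs.

    Averaging per family (rather than pooling all pairs) keeps large,
    over-represented families from dominating the result.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [conserved, total]
    for family, identity, ec_a, ec_b in pairs:
        if identity >= min_identity:
            counts[family][1] += 1
            counts[family][0] += int(same_ec(ec_a, ec_b, digits))
    if not counts:
        return None  # no pair reaches the identity cutoff
    fractions = [conserved / total for conserved, total in counts.values()]
    return sum(fractions) / len(fractions)


def family_specific_threshold(
    pairs: List[Pair], family: str, digits: int
) -> Optional[float]:
    """Highest identity at which this family still shows a mismatching pair.

    Every observed pair strictly above the returned value is conserved.
    Returns None when the family has no mismatching pair at all.
    """
    mismatches = [
        identity
        for fam, identity, ec_a, ec_b in pairs
        if fam == family and not same_ec(ec_a, ec_b, digits)
    ]
    return max(mismatches) if mismatches else None


# Invented example data, for illustration only.
example_pairs: List[Pair] = [
    ("famA", 75.0, "1.1.1.1", "1.1.1.1"),
    ("famA", 55.0, "1.1.1.1", "1.1.1.2"),
    ("famA", 45.0, "1.1.1.1", "1.1.1.1"),
    ("famB", 65.0, "2.7.1.1", "2.7.1.1"),
    ("famB", 42.0, "2.7.1.1", "2.7.2.3"),
]

if __name__ == "__main__":
    print(family_averaged_conservation(example_pairs, 40.0, digits=3))
    print(family_specific_threshold(example_pairs, "famA", digits=4))
```

A stricter threshold of this kind trades coverage for accuracy, which mirrors the cost noted in the abstract: true positives that fall below a family's threshold are left unannotated.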

Journal:
  • Journal of molecular biology

Volume 333, Issue 4

Pages: -

Publication date: 2003