Network-based auto-probit modeling for protein function prediction.

Authors

  • Xiaoyu Jiang
  • David Gold
  • Eric D Kolaczyk
Abstract

Predicting the functional roles of proteins based on various genome-wide data, such as protein-protein association networks, has become a canonical problem in computational biology. Approaching this task as a binary classification problem, we develop a network-based extension of the spatial auto-probit model. In particular, we develop a hierarchical Bayesian probit-based framework for modeling binary network-indexed processes, with a latent multivariate conditional autoregressive Gaussian process. The latter allows for the easy incorporation of protein-protein association network topologies, either binary or weighted, in modeling protein functional similarity. We use this framework to predict protein functions, for functions defined as terms in the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functionality. Furthermore, we show how a natural extension of this framework can be used to model and correct for the high percentage of false negative labels in training data derived from GO, a serious shortcoming endemic to biological databases of this type. Our method's performance is evaluated and compared with standard algorithms on weighted yeast protein-protein association networks, extracted from a recently developed integrative database called Search Tool for the Retrieval of INteracting Genes/proteins (STRING). Results show that our basic method is competitive with these other methods, and that the extended method, which incorporates the uncertainty in negative labels among the training data, can yield nontrivial improvements in predictive accuracy.
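To make the modeling idea in the abstract concrete, the following is a minimal sketch of a generic network auto-probit generative model, not the authors' implementation. It assumes a univariate proper CAR prior with precision Q = tau * (D - rho * W) on a latent Gaussian field, whereas the abstract describes a multivariate version inside a hierarchical Bayesian framework. The function names, the parameters `rho`, `tau`, and `mu`, and the toy weight matrix `W` are all hypothetical.

```python
import numpy as np

def car_precision(W, rho=0.9, tau=1.0):
    """Proper CAR precision Q = tau * (D - rho * W) for a symmetric weight matrix W.

    Assumption: every node has positive degree and |rho| < 1, which makes Q
    strictly diagonally dominant and therefore positive definite.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

def sample_auto_probit(W, mu=0.0, rho=0.9, tau=1.0, seed=0):
    """Draw a latent CAR field z over the network, then binary labels y.

    Labels follow a probit link: y_i = 1 if z_i + eps_i > 0 with eps_i ~ N(0, 1),
    which is equivalent to P(y_i = 1 | z_i) = Phi(z_i).
    """
    rng = np.random.default_rng(seed)
    Q = car_precision(W, rho, tau)
    cov = np.linalg.inv(Q)
    cov = (cov + cov.T) / 2.0  # remove tiny asymmetries from the inversion
    z = rng.multivariate_normal(np.full(cov.shape[0], mu), cov)
    eps = rng.standard_normal(cov.shape[0])
    y = (z + eps > 0).astype(int)
    return z, y

# Toy weighted association network on four proteins (hypothetical weights).
W = np.array([
    [0.0, 0.8, 0.0, 0.0],
    [0.8, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
])
Q = car_precision(W)
z, y = sample_auto_probit(W)
```

In this sketch, larger edge weights tie the latent values of neighboring proteins more closely together, which is how the network topology would inform functional similarity. Posterior inference and the correction for false negative labels described in the abstract are not shown.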

Similar resources

Statistical Methods for the Analysis of Network Data

14:00-14:45 " Network-based auto-probit modeling with application to protein function prediction " Eric D. Kolaczyk 09:35-10:10 " Quantifying and comparing complexity of cellular networks: structure beyond degree statistics " Alessia Annibale and Anthony Coolen 10:10-10:45 " Node and link roles in protein-protein interaction networks " " Using Distinct Aspects of Social Network Analysis to Impr...

Probit-Based Traffic Assignment: A Comparative Study between Link-Based Simulation Algorithm and Path-Based Assignment and Generalization to Random-Coefficient Approach

The probabilistic approach to traffic assignment was developed primarily to provide a more realistic and flexible theoretical framework for representing travelers' route choice behavior in a transportation network. Path overlapping in network modelling has been one of the main issues to be tackled. Due to its flexible covariance structure, the probit model can adequately address the pro...

Link Prediction using Network Embedding based on Global Similarity

Background: Link prediction is one of the most widely studied problems in complex network analysis. It requires knowing the history of previous link connections and combining it with available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...

A comparison of different network based modeling methods for prediction of the torque of a SI engine equipped with variable valve timing

As IC engines grow more complex, the calibration task becomes more demanding, and surrogate models are needed to investigate engine behavior. Accordingly, many black-box modeling approaches have been used in this context, among which network-based models are some of the most powerful thanks to their flexible structures. In this paper four network base...

ANN Based Modeling for Prediction of Evaporation in Reservoirs (RESEARCH NOTE)

This paper attempts to assess the potential and usefulness of ANN-based modeling for predicting evaporation from a reservoir, where classical and empirical equations have failed to predict evaporation accurately. A meteorological data set of daily pan evaporation, temperature, solar radiation, relative humidity, and wind speed is used in this study. The performance of feed forward back pro...

Journal:
  • Biometrics

Volume 67, Issue 3

Pages  -

Publication date 2011