Performance of PLS regression coefficients in selecting variables for each response of a multivariate PLS for omics-type data

Authors

  • Giuseppe Palermo
  • Paolo Piraino
  • Hans-Dieter Zucht
Abstract

Multivariate partial least squares (PLS) regression can model complex biological events by considering several factors at the same time. Because it is unaffected by collinearity in the data, it is a valuable method for modeling high-dimensional biological data, such as those derived from genomics, proteomics and peptidomics. When there are multiple responses, a key question is how to appropriately "dissect" the model to reveal the importance of single attributes with regard to individual responses, that is, variable selection. In this paper, we investigated how well multivariate PLS regression coefficients select relevant predictors for different responses in omics-type data, using receiver operating characteristic (ROC) analysis. For this purpose, we generated matrices of predictors and responses from simulated data that mimic the covariance structures of microarray and liquid chromatography-mass spectrometry (LC-MS) data. The relevant predictors were set a priori. We investigated the influence of noise, of the data source (with its own covariance structure) and of the number of relevant predictors. The results demonstrate that PLS regression coefficients can be used to select variables for each response of a multivariate PLS model in omics-type data. Comparisons with other feature selection methods, such as variable importance in the projection (VIP) scores, principal component regression (PCR) and least absolute shrinkage and selection operator (LASSO) regression, are also provided.
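
The selection idea evaluated above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' code or simulation design. It fits a multivariate PLS (PLS2) model with the NIPALS algorithm and computes the coefficient matrix B = W(P'W)^-1 C'. It then takes the absolute values in column j of B as importance scores for response j, and scores them against the predictors declared relevant a priori with a ROC AUC computed as the Mann-Whitney statistic. The function names `pls2_nipals` and `roc_auc`, the sizes (n = 200 samples, p = 300 predictors, 10 relevant predictors per response), the independent Gaussian predictors (standing in for the microarray or LC-MS covariance structures used in the paper), the noise level and the choice of two components are all assumptions.

```python
import numpy as np


def pls2_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Fit a multivariate PLS (PLS2) model with NIPALS (hypothetical helper).

    Returns the p x q coefficient matrix B for centered data, so that
    (X - mean(X)) @ B approximates Y - mean(Y); column j belongs to response j.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xk = X - X.mean(axis=0)  # centered copies, deflated after each component
    Yk = Y - Y.mean(axis=0)
    n, p = Xk.shape
    q = Yk.shape[1]
    W = np.zeros((p, n_components))  # X weights
    P = np.zeros((p, n_components))  # X loadings
    C = np.zeros((q, n_components))  # Y weights
    for a in range(n_components):
        u = Yk[:, np.argmax(Yk.var(axis=0))]  # start from the most variable response
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)
            t = Xk @ w              # X scores
            c = Yk.T @ t / (t @ t)  # Y weights
            u = Yk @ c / (c @ c)    # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        loading = Xk.T @ t / (t @ t)
        Xk = Xk - np.outer(t, loading)  # deflate X
        Yk = Yk - np.outer(t, c)        # deflate Y
        W[:, a], P[:, a], C[:, a] = w, loading, c
    # B = W (P'W)^-1 C'
    return W @ np.linalg.solve(P.T @ W, C.T)


def roc_auc(scores, is_relevant):
    """ROC AUC as the Mann-Whitney statistic; ties count one half."""
    scores = np.asarray(scores, dtype=float)
    is_relevant = np.asarray(is_relevant, dtype=bool)
    pos = scores[is_relevant][:, None]
    neg = scores[~is_relevant][None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


# Hypothetical simulation: p > n, relevant predictors fixed a priori per response.
rng = np.random.default_rng(0)
n, p, k = 200, 300, 10
X = rng.standard_normal((n, p))  # assumption: independent Gaussian predictors
relevant = [np.arange(0, k), np.arange(k, 2 * k)]  # response 0: 0..9, response 1: 10..19
Y = np.column_stack([X[:, idx].sum(axis=1) for idx in relevant])
Y += 0.5 * rng.standard_normal(Y.shape)  # assumption: additive Gaussian noise

B = pls2_nipals(X, Y, n_components=2)  # assumption: two components
for j, idx in enumerate(relevant):
    truth = np.zeros(p, dtype=bool)
    truth[idx] = True
    print(f"response {j}: AUC of |B[:, {j}]| = {roc_auc(np.abs(B[:, j]), truth):.3f}")
```

Each printed AUC measures how often a predictor that is relevant for that response receives a larger |B| value than an irrelevant one. Predictors that drive the other response count as irrelevant here, which is what makes the ranking response-specific. Centering without scaling and the fixed number of components are simplifications; in practice they would be chosen by the analyst, for example by cross-validation.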

Similar resources

Group and sparse group partial least square approaches applied in genomics context

MOTIVATION: The association between two blocks of 'omics' data raises challenging issues in computational biology because of their size and complexity. Here, we focus on a class of multivariate statistical methods called partial least squares (PLS). The sparse version of PLS (sPLS) integrates two datasets while simultaneously selecting the contributing variables. However, these methods do no...

ropls: PCA, PLS(-DA) and OPLS(-DA) for multivariate analysis and feature selection of omics data

4 Hands-on: 4.1 Loading; 4.2 Principal Component Analysis (PCA); 4.3 Partial least-squares: PLS and PLS-DA; 4.4 Orthogonal partial least square...

Using basis expansions for estimating functional PLS regression. Applications with chemometric data

There are many chemometric applications, such as spectroscopy, where the objective is to explain a scalar response from a functional variable (the spectrum) whose observations are functions of wavelengths rather than vectors. In this paper, PLS regression is considered for estimating the linear model when the predictor is a functional random variable. Due to the infinite dimension of the space ...

Measuring Dose and Response with Multivariate Data

How to relate two blocks of variables? Two-block partial least squares (PLS) analysis uses weight vectors u, v with u'u = v'v = 1 to give a low-dimensional representation of the pattern of correlations/covariances between two blocks of variables. A second dimension can be computed as directions orthogonal to the first ones, accounting for the second-largest correlation/covariance. A direction in each of the two data spaces, for whic...

Important Molecular Descriptors Selection Using Self Tuned Reweighted Sampling Method for Prediction of Antituberculosis Activity

In this paper, a new descriptor selection method, named self tuned reweighted sampling (STRS), is developed for selecting an optimal combination of important descriptors from sulfonamide derivative data. descriptors are defined as the descriptors with large absolute coefficients in a multivariate linear regression model such as partial least squares (PLS). In this study, the absolute values of r...

Journal volume: 2

Publication date: 2009