cross validation error

No Unbiased Estimator of the Variance of K-Fold Cross-Validation

Journal: Journal of Machine Learning Research 2003

Yoshua Bengio, Yves Grandvalet

Most machine learning researchers perform quantitative experiments to estimate generalization error and compare algorithm performances. In order to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the estimation of uncertainty around the K-fold cross-validation estimator. The main theorem shows that there exists no univ...
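The sketch below makes the studied quantity concrete. It computes a K-fold cross-validation estimate for a deliberately trivial model: predict the training-set mean, scored with squared error. It also computes the naive variance estimate s²/K over the K per-fold errors. The model, the function names and the s²/K formula are illustrative assumptions, not the authors' code. The naive estimate treats the fold errors as independent even though the training sets overlap, so it should not be read as an unbiased estimate of the estimator's variance.

```python
import random
import statistics


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def kfold_cv_mean_model(y, k, seed=0):
    """K-fold CV of the 'predict the training mean' model under squared error.

    Returns (cv_estimate, naive_variance). naive_variance is s^2 / k over the
    k per-fold errors; it ignores the correlation between folds.
    """
    folds = kfold_indices(len(y), k, seed)
    fold_errors = []
    for test in folds:
        held_out = set(test)
        train = [v for i, v in enumerate(y) if i not in held_out]
        mu = statistics.fmean(train)  # "fit" the model on the training folds
        fold_errors.append(statistics.fmean((y[i] - mu) ** 2 for i in test))
    cv_estimate = statistics.fmean(fold_errors)
    naive_variance = statistics.variance(fold_errors) / k
    return cv_estimate, naive_variance
```

For example, `kfold_cv_mean_model(y, k=10)` returns the 10-fold estimate and its naive variance for a list of floats `y`. Use k ≥ 2, because the sample variance needs at least two folds.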

Measuring the prediction error. A comparison of cross-validation, bootstrap and covariance penalty methods

Journal: Computational Statistics & Data Analysis 2010

Simone Borra, Agostino Di Ciaccio

The estimators most widely used to evaluate the prediction error of a non-linear regression model are examined. An extensive simulation approach allowed the comparison of the performance of these estimators for different non-parametric methods, and with varying signal-to-noise ratio and sample size. Estimators based on resampling methods such as Leave-one-out, parametric and non-parametric Boot...
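Here is a small sketch of two of the resampling estimators the abstract names, applied to a one-predictor least-squares line:

- a leave-one-out estimate of squared prediction error;
- Efron's leave-one-out bootstrap, which scores each observation only with models fitted to bootstrap resamples that did not contain it.

The regression model, the function names and the number of resamples are assumptions chosen for illustration. The paper's own non-parametric models, estimator variants and simulation design are not reproduced here.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; needs at least two distinct x values."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loo_error(xs, ys):
    """Leave-one-out mean squared prediction error."""
    errs = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errs.append((ys[i] - (a + b * xs[i])) ** 2)
    return statistics.fmean(errs)


def loo_bootstrap_error(xs, ys, n_boot=200, seed=0):
    """Efron's leave-one-out bootstrap: each point is scored only by out-of-bag fits."""
    rng = random.Random(seed)
    n = len(xs)
    losses = [[] for _ in range(n)]
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in sample}) < 2:
            continue  # slope is undefined for this resample
        a, b = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:
                losses[i].append((ys[i] - (a + b * xs[i])) ** 2)
    return statistics.fmean(statistics.fmean(pl) for pl in losses if pl)
```

Both functions expect `xs` and `ys` as Python lists of floats.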

Shrinkage, cross validation calibration, and the construction of prognostic indices

2007

Werner Vach

The use of shrinkage methods for the construction of prognostic indices has received increasing attention in the medical statistics literature in recent years. One approach to constructing a shrinkage factor is cross-validation calibration, as suggested by van Houwelingen and le Cessie (1990). We investigate this approach in more detail. First, we try to clarify why shrinkage facto...
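One simple reading of cross-validation calibration is sketched here for an ordinary least-squares model with a single predictor. Each observation's prognostic index is computed from a model fitted without that observation. The outcome is then regressed on these cross-validated indices, and a slope below 1 suggests the original index should be shrunk towards the mean. This linear-model version is an illustrative assumption. It does not reproduce van Houwelingen and le Cessie's (1990) formulation or the paper's analysis, and `cv_calibration_slope` is a hypothetical name.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; needs at least two distinct x values."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cv_calibration_slope(xs, ys):
    """Slope of y regressed on leave-one-out prognostic indices (hypothetical helper)."""
    cv_index = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        cv_index.append(a + b * xs[i])  # index for obs i from a model that never saw it
    _, slope = fit_line(cv_index, ys)
    return slope
```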

Evaluating the Accuracy of Interpolation Methods for Estimating the Groundwater Table (Case Study: Farsan-Junaqan and Sefid Dasht Aquifers)

Journal: Water and Soil Science 2011

Seyed Hassan Tabatabaei, Mahboubeh Ghazali

The accuracy and precision of input data are important in decision making. Errors originating in data collection, data entry, storage, retrieval and analysis consequently result in model error. One of the errors in spatial analysis is interpolation error. The main objective of this research was to assess the suitability of several interpolation methods for estimating groun...
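Interpolation methods for water-table mapping are commonly compared by leave-one-out cross-validation. Each well is removed in turn, its level is predicted from the remaining wells, and the errors are summarised, for example as RMSE. The sketch below does this for inverse distance weighting (IDW). The choice of IDW, the power of 2, the RMSE summary and the function names are illustrative assumptions, since the abstract is truncated before naming the methods it compares.

```python
import math


def idw_predict(known, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from [(xi, yi, zi), ...]."""
    num = den = 0.0
    for xi, yi, zi in known:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # exact hit: IDW reproduces the observed value
        w = d ** -power
        num += w * zi
        den += w
    return num / den


def loo_rmse(points, power=2.0):
    """Leave-one-out cross-validation RMSE of IDW over observation wells."""
    sq = []
    for i, (x, y, z) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        sq.append((idw_predict(rest, x, y, power) - z) ** 2)
    return math.sqrt(sum(sq) / len(sq))
```

Running `loo_rmse` with different `power` values, or swapping in another interpolator with the same call shape, gives directly comparable error summaries.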

Error Rate Estimate for Cluster Data – Application to Automatic Spoken Language Identification

2003

J. H. Chauchat, R. Rakotomalala, F. Pellegrino

If the dataset available for machine learning results from cluster sampling, the usual cross-validation error rate estimate can lead to biased and misleading results. An adapted cross-validation is described for this case. Using a simulation, the sampling distributions of the generalization error rate estimate under the cluster or simple random sampling hypothesis are compared to the true value. Th...
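When observations come in clusters (for example, several utterances per speaker), a common adaptation is to assign whole clusters to folds. That way no cluster contributes to both training and test data. The sketch below shows only this fold assignment. Whether it matches the adapted procedure in the paper is an assumption, and the function name and round-robin assignment are hypothetical choices.

```python
import random
from collections import defaultdict


def group_kfold(groups, k, seed=0):
    """Return k lists of row indices such that every cluster lies in exactly one fold.

    groups[i] is the cluster label of row i (e.g. a speaker ID). Use k no larger
    than the number of distinct clusters, or some folds will be empty.
    """
    rows_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        rows_by_group[g].append(i)
    labels = list(rows_by_group)
    random.Random(seed).shuffle(labels)  # randomise which clusters share a fold
    folds = [[] for _ in range(k)]
    for j, g in enumerate(labels):
        folds[j % k].extend(rows_by_group[g])  # deal whole clusters round-robin
    return folds
```

Unbalanced cluster sizes give unbalanced folds here. A greedy assignment to the currently smallest fold is one alternative if that matters.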

QSAR Studies on Andrographolide Derivatives as α-Glucosidase Inhibitors

2010

Jun Xu, Sichao Huang, Haibin Luo, Guoji Li, Jiaolin Bao, Shaohui Cai, Yuqiang Wang

Andrographolide derivatives were shown to inhibit alpha-glucosidase. To investigate the relationship between activities and structures of andrographolide derivatives, a training set was chosen from 25 andrographolide derivatives by the principal component analysis (PCA) method, and a quantitative structure-activity relationship (QSAR) was established by 2D and 3D QSAR methods. The cross-validat...
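QSAR models are often reported with a leave-one-out cross-validated q² = 1 − PRESS / Σ(yᵢ − ȳ)², where PRESS is the sum of squared leave-one-out prediction errors. The sketch computes q² for a single-descriptor least-squares model. That model is an assumption made for brevity: the paper builds 2D and 3D QSAR models, and its cross-validation details are cut off in the abstract.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; needs at least two distinct x values."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def q2_loo(xs, ys):
    """Leave-one-out q^2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2  # squared LOO residual
    my = statistics.fmean(ys)
    ss_total = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_total
```

Unlike the fitted R², q² can be negative when the held-out predictions are worse than simply predicting the mean activity.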

Choosing which Clothes to Wear Confidently: A Tool for Pattern Matching

2012

Nektarios Paisios, Lakshminarayanan Subramanian, Alexander Rubinsteyn

This work takes a first step toward computationally determining whether a pair of clothes, in this case a tie and a shirt, can be worn together, based on current social norms of color matching. Our aim is to give visually impaired persons the ability, using snapshots taken with their mobile phones, to choose independently and confidently from their wardrobe which s...

Predictive Approaches for Choosing Hyperparameters in Gaussian Processes

Journal: Neural Computation 1999

S. Sundararajan, S. Sathiya Keerthi

Gaussian processes are powerful regression models specified by parameterized mean and covariance functions. Standard approaches for choosing these parameters (known as hyperparameters) are maximum likelihood and maximum a posteriori. In this article, we propose and investigate predictive approaches based on Geisser's predictive sample reuse (PSR) methodology and the related Stone's cross-...
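For a Gaussian process with noisy observations, leave-one-out predictions do not require refitting the model n times. With C = K + σ²I and α = C⁻¹y, the LOO predictive mean for point i is yᵢ − αᵢ/[C⁻¹]ᵢᵢ, and its variance is 1/[C⁻¹]ᵢᵢ. This identity is the usual starting point for LOO-style hyperparameter criteria.

The sketch below is an illustration under these assumptions, not the authors' algorithm: a zero-mean GP, a squared-exponential kernel, fixed hyperparameters, a plain Gauss-Jordan inverse and hypothetical function names. It omits the search over hyperparameters.

```python
import math


def sq_exp_kernel(x1, x2, length=1.0, signal_var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small, well-conditioned matrices)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def gp_loo(xs, ys, noise_var=0.1, **kern):
    """Closed-form LOO predictive means and variances for a zero-mean GP."""
    n = len(xs)
    c = [[sq_exp_kernel(xs[i], xs[j], **kern) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    c_inv = invert(c)
    alpha = [sum(c_inv[i][j] * ys[j] for j in range(n)) for i in range(n)]
    means = [ys[i] - alpha[i] / c_inv[i][i] for i in range(n)]
    variances = [1.0 / c_inv[i][i] for i in range(n)]  # includes observation noise
    return means, variances
```

A LOO predictive log probability for a candidate setting of `length`, `signal_var` and `noise_var` can then be summed from these means and variances, and compared across settings.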

Using Cross‐validation to Evaluate Ceres‐maize Yield Simulations within a Decision Support System for Precision Agriculture

2007

K. R. Thorp, W. D. Batchelor, J. O. Paz, A. L. Kaleita, K. C. DeJonge

Crop growth models have recently been implemented to study precision agriculture questions within the framework of a decision support system (DSS) that automates simulations across management zones. Model calibration in each zone has been performed by automatically optimizing selected model parameters to minimize the error between measured and simulated yield over multiple growing seasons. However, to date...

Correcting for Optimistic Prediction in Small Data Sets

2014

Gordon C. S. Smith, Shaun R. Seaman, Angela M. Wood, Patrick Royston, Ian R. White

The C statistic is a commonly reported measure of screening test performance. Optimistic estimation of the C statistic is a frequent problem because of overfitting of statistical models in small data sets, and methods exist to correct for this issue. However, many studies do not use such methods, and those that do correct for optimism use diverse methods, some of which are known to be biased. W...
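One widely used correction is the bootstrap optimism estimate. The whole modelling procedure is refitted on each bootstrap resample, and we measure how much better it scores on that resample than on the original data. The average difference is then subtracted from the apparent C statistic. The sketch uses a deliberately overfitting toy "model" that picks whichever of several candidate score columns has the highest C statistic. That model, the function names and the number of resamples are assumptions for illustration. The abstract is cut off before saying which correction methods the paper compares or recommends.

```python
import random


def c_statistic(scores, labels):
    """Probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pick_best(features, labels):
    """Toy model: choose the candidate score column with the highest apparent C."""
    return max(range(len(features)), key=lambda j: c_statistic(features[j], labels))


def optimism_corrected_c(features, labels, n_boot=100, seed=0):
    """Apparent C minus the mean bootstrap optimism of the whole selection procedure."""
    rng = random.Random(seed)
    n = len(labels)
    best = pick_best(features, labels)
    apparent = c_statistic(features[best], labels)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # C is undefined without both outcomes
        b_feats = [[col[i] for i in idx] for col in features]
        j = pick_best(b_feats, b_labels)  # refit the procedure on the resample
        optimism.append(c_statistic(b_feats[j], b_labels) - c_statistic(features[j], labels))
    return apparent - sum(optimism) / len(optimism)
```

Here `features` is a list of columns, each holding one score per subject, and `labels` holds 0/1 outcomes.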