Search results for: overfitting

Number of results: 4333

2004
John Loughrey, Padraig Cunningham

In wrapper-based feature selection, the more states that are visited during the search phase of the algorithm, the greater the likelihood of finding a feature subset that has a high internal accuracy while generalizing poorly. When this occurs, we say that the algorithm has overfitted to the training data. We outline a set of experiments to show this and we introduce a modified genetic algorithm...
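The selection effect this abstract describes can be shown with a toy construction (not the authors' experiment or their modified genetic algorithm): when features and labels are pure noise, exhaustively searching feature subsets still turns up one whose internal accuracy looks good, while its hold-out accuracy stays at chance.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_train, n_holdout, n_features = 30, 300, 10

# Labels and features are independent noise: no subset can truly help.
X_train = rng.choice([-1, 1], size=(n_train, n_features))
y_train = rng.choice([-1, 1], size=n_train)
X_hold = rng.choice([-1, 1], size=(n_holdout, n_features))
y_hold = rng.choice([-1, 1], size=n_holdout)

def accuracy(X, y, subset):
    # A fixed "vote" classifier: predict the sign of the sum of selected features.
    pred = np.sign(X[:, subset].sum(axis=1))
    pred[pred == 0] = 1
    return float(np.mean(pred == y))

# Wrapper-style search: visit every non-empty subset, keep the best internal accuracy.
best_subset, best_internal = None, 0.0
for r in range(1, n_features + 1):
    for subset in itertools.combinations(range(n_features), r):
        acc = accuracy(X_train, y_train, list(subset))
        if acc > best_internal:
            best_internal, best_subset = acc, list(subset)

holdout = accuracy(X_hold, y_hold, best_subset)
print(f"internal accuracy of best subset: {best_internal:.2f}")  # well above chance
print(f"hold-out accuracy of same subset: {holdout:.2f}")        # near chance (0.5)
```

The more subsets the search visits (here all 1023), the larger the gap between internal and hold-out accuracy tends to be, which is exactly the effect the paper's modified search is meant to curb.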

Journal: :CoRR 2003
Volker Nannen

The concept of overfitting in model selection is explained and demonstrated. After providing some background information on information theory and Kolmogorov complexity, we provide a short explanation of Minimum Description Length and error minimization. We conclude with a discussion of the typical features of overfitting in model selection. 1 The paradox of overfitting Machine learning is the ...
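The MDL idea sketched in this abstract can be illustrated with a closely related penalized criterion, BIC (used here as a stand-in for a full two-part description-length computation, which the paper develops properly): raw training error always prefers the most complex model, while a complexity-penalized score recovers something near the true model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = np.linspace(-1, 1, n)
# True model is a degree-2 polynomial plus noise.
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.3, n)

def rss(deg):
    coeffs = np.polyfit(x, y, deg)
    resid = y - np.polyval(coeffs, x)
    return float(resid @ resid)

max_deg = 12
degrees = range(1, max_deg + 1)

# Raw training error: monotonically rewards complexity.
deg_by_train = min(degrees, key=rss)

# MDL-style score (BIC): error cost plus log(n)/2 bits per extra parameter.
def bic(deg):
    k = deg + 1  # number of coefficients
    return n * np.log(rss(deg) / n) + k * np.log(n)

deg_by_bic = min(degrees, key=bic)

print("degree chosen by training error:", deg_by_train)  # the most complex model tried
print("degree chosen by BIC:", deg_by_bic)               # a low degree, near the true 2
```

This is the "paradox of overfitting" in miniature: the model that fits the data best describes it worst.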

1995
Paul L. Rosin, Freddy Fierens

In this paper we study neural network overfitting on synthetically generated and real remote sensing data. The effect of overfitting is shown by: 1) visualising the shape of the decision boundaries in feature space during the learning process, and 2) by plotting the classification accuracy of independent test sets versus the number of training cycles. A solution to the overfitting problem is pr...
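The second diagnostic the abstract mentions, tracking held-out performance across training cycles, can be reproduced in a small regression stand-in (not the authors' neural networks or remote sensing data): with more parameters than training points and pure-noise targets, training loss keeps falling while test loss rises as the noise is memorized.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, n_test, p = 40, 200, 100  # more parameters than training points

X_train = rng.normal(size=(n_train, p))
X_test = rng.normal(size=(n_test, p))
# Targets are pure noise: anything the model "learns" is overfitting.
y_train = rng.normal(size=n_train)
y_test = rng.normal(size=n_test)

w = np.zeros(p)
lr = 1e-3
train_curve, test_curve = [], []
for step in range(2000):  # "training cycles"
    train_curve.append(float(np.mean((X_train @ w - y_train) ** 2)))
    test_curve.append(float(np.mean((X_test @ w - y_test) ** 2)))
    grad = 2.0 / n_train * X_train.T @ (X_train @ w - y_train)
    w -= lr * grad

print(f"train MSE: {train_curve[0]:.2f} -> {train_curve[-1]:.2f}")  # falls toward 0
print(f"test  MSE: {test_curve[0]:.2f} -> {test_curve[-1]:.2f}")    # rises over training
```

Plotting the two curves against the cycle count gives the classic picture the paper uses, and stopping at the test curve's minimum is the usual early-stopping remedy.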

2015
David H. Bailey, Jonathan M. Borwein, Amir Salehipour, Marcos López de Prado, Qiji Zhu

In mathematical finance, backtest overfitting refers to the use of historical market data (a backtest) to develop an investment strategy, where too many variations of the strategy are tried relative to the amount of data available. Backtest overfitting is now thought to be a primary reason why quantitative investment models and strategies that look good on paper often disappoint in practice. In ...
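The mechanism is easy to demonstrate with a toy simulation (not the authors' analysis): try many random "strategies" against the same noise returns, pick the best in-sample Sharpe ratio, and watch it vanish out of sample.

```python
import numpy as np

rng = np.random.default_rng(5)
n_days, n_strategies = 500, 1000

# In-sample and out-of-sample daily returns: pure noise, so no real edge exists.
r_is = rng.normal(0, 0.01, n_days)
r_oos = rng.normal(0, 0.01, n_days)

# "Strategies" are random daily long/short signals -- stand-ins for the many
# parameter variations a researcher might try against one backtest dataset.
signals = rng.choice([-1.0, 1.0], size=(n_strategies, n_days))

def sharpe(pnl):
    # Annualized Sharpe ratio of a daily P&L series.
    return float(np.sqrt(252) * pnl.mean() / pnl.std())

is_sharpes = np.array([sharpe(s * r_is) for s in signals])
best = int(np.argmax(is_sharpes))
oos_sharpe = sharpe(signals[best] * r_oos)

print(f"best in-sample Sharpe (out of {n_strategies} tries): {is_sharpes[best]:.2f}")
print(f"same strategy out of sample: {oos_sharpe:.2f}")  # near zero
```

The selected backtest Sharpe grows with the number of variations tried, even though every strategy is worthless, which is the core of the backtest-overfitting argument.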

Journal: CoRR 2015
Michael Cogswell, Faruk Ahmed, Ross B. Girshick, C. Lawrence Zitnick, Dhruv Batra

One major challenge in training Deep Neural Networks is preventing overfitting. Many techniques such as data augmentation and novel regularizers such as Dropout have been proposed to prevent overfitting without requiring a massive amount of training data. In this work, we propose a new regularizer called DeCov which leads to significantly reduced overfitting (as indicated by the difference betw...
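A decorrelation penalty in the spirit of DeCov can be sketched as follows (a simplified NumPy illustration of the idea, not the paper's implementation; the exact constants and framework details may differ): penalize the off-diagonal entries of the covariance matrix of a layer's activations over a mini-batch, so hidden units are discouraged from learning redundant, correlated features.

```python
import numpy as np

def decov_penalty(h):
    """Decorrelation penalty on a batch of activations h (batch, features):
    squared Frobenius norm of the activation covariance matrix with the
    diagonal (per-unit variances) excluded, so only cross-covariances
    between different hidden units are punished."""
    centered = h - h.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / h.shape[0]
    return float(0.5 * (np.sum(cov ** 2) - np.sum(np.diag(cov) ** 2)))

rng = np.random.default_rng(3)
independent = rng.normal(size=(256, 8))               # nearly decorrelated units
base = rng.normal(size=(256, 1))
redundant = base + 0.01 * rng.normal(size=(256, 8))   # units that copy each other

print(f"penalty, decorrelated units: {decov_penalty(independent):.3f}")  # small
print(f"penalty, redundant units:    {decov_penalty(redundant):.3f}")    # large
```

In training, a weighted version of this term would be added to the task loss for the chosen hidden layer, pushing representations toward non-redundant features.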

Journal: Psychosomatic Medicine 2004
Michael A Babyak

OBJECTIVE Statistical models, such as linear or logistic regression or survival analysis, are frequently used as a means to answer scientific questions in psychosomatic research. Many who use these techniques, however, apparently fail to appreciate fully the problem of overfitting, ie, capitalizing on the idiosyncrasies of the sample at hand. Overfitted models will fail to replicate in future s...
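The failure to replicate that this abstract warns about can be simulated directly (a toy construction, not the paper's data): fit an ordinary least-squares model with many predictors and few subjects on pure noise, and the in-sample fit looks impressive while a replication sample shows nothing.

```python
import numpy as np

rng = np.random.default_rng(4)
n_train, n_test, p = 50, 500, 40  # many predictors, few subjects

# Predictors carry no real signal: the outcome is independent noise.
X_train = rng.normal(size=(n_train, p))
y_train = rng.normal(size=n_train)
X_test = rng.normal(size=(n_test, p))
y_test = rng.normal(size=n_test)

# Fit a linear model to the idiosyncrasies of the small sample.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def r2(X, y):
    resid = y - X @ beta
    return float(1 - resid @ resid / ((y - y.mean()) @ (y - y.mean())))

r2_train = r2(X_train, y_train)
r2_test = r2(X_test, y_test)
print(f"in-sample R^2:   {r2_train:.2f}")  # looks impressive
print(f"replication R^2: {r2_test:.2f}")   # near zero or negative
```

This is why rules of thumb about observations per predictor exist: with 40 predictors and 50 subjects, a high in-sample R^2 is almost guaranteed even when nothing is there.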

2014
David H. Bailey, Jonathan M. Borwein, Marcos López de Prado, Qiji Jim Zhu, Matthew D. Foreman

Many investment firms and portfolio managers rely on backtests (i.e., simulations of performance based on historical market data) to select investment strategies and allocate capital. Standard statistical techniques designed to prevent regression overfitting, such as holdout, tend to be unreliable and inaccurate in the context of investment backtests. We propose a general framework to assess th...
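The claim that a holdout is unreliable here has a simple intuition, sketched below in a toy simulation (not the authors' framework): once every candidate strategy is also checked against the holdout, the holdout becomes part of the search, and its verdict is as overfit as the backtest's.

```python
import numpy as np

rng = np.random.default_rng(6)
n_days, n_strategies = 500, 1000

holdout = rng.normal(0, 0.01, n_days)  # "independent" validation returns
future = rng.normal(0, 0.01, n_days)   # what actually happens next; all pure noise

# Many candidate strategies, each a random daily long/short signal.
signals = rng.choice([-1.0, 1.0], size=(n_strategies, n_days))

def sharpe(pnl):
    # Annualized Sharpe ratio of a daily P&L series.
    return float(np.sqrt(252) * pnl.mean() / pnl.std())

# Selecting on the holdout itself makes the holdout score a maximum of
# many noisy draws -- an overfit estimate, not an honest one.
holdout_sharpes = np.array([sharpe(s * holdout) for s in signals])
best = int(np.argmax(holdout_sharpes))
future_sharpe = sharpe(signals[best] * future)

print(f"holdout Sharpe of selected strategy: {holdout_sharpes[best]:.2f}")  # inflated
print(f"realized future Sharpe:              {future_sharpe:.2f}")          # near zero
```

A principled assessment therefore has to account for how many strategy variations were tried, which is what the paper's framework sets out to do.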

1993
David H. Wolpert

In supervised learning it is commonly believed that penalizing complex functions helps one avoid "overfitting" functions to data, and therefore improves generalization. It is also commonly believed that cross-validation is an effective way to choose amongst algorithms for fitting functions to data. In a recent paper, Schaffer (1993) presents experimental evidence disputing these claims. The cur...

Journal: CoRR 2017
Xu Sun, Weiwei Sun, Shuming Ma, Xuancheng Ren, Yi Zhang, Wenjie Li, Houfeng Wang

Recent systems on structured prediction focus on increasing the level of structural dependencies within the model. However, our study suggests that complex structures entail high overfitting risks. To control the structure-based overfitting, we propose to conduct structure regularization decoding (SR decoding). The decoding of the complex structure model is regularized by the additionally train...

1998
Gunnar Rätsch, Takashi Onoda, Klaus-Robert Müller

Recent work has shown that combining multiple versions of weak classifiers such as decision trees or neural networks results in reduced test set error. To study this in greater detail, we analyze the asymptotic behavior of AdaBoost. The theoretical analysis establishes the relation between the distribution of margins of the training examples and the generated voting classification rule. The paper...

[Chart: number of search results per year; click to filter results by publication year]