Machine Learning Ensembles: An Empirical Study and Novel Approach
نویسندگان
چکیده
Two learning ensemble methods, Bagging and Boosting, have been applied to decision trees to improve classification accuracy over that of a single decision tree learner. We introduce Bagging and propose a variant of it — Improved Bagging — which, in general, outperforms the original bagging algorithm. We experiment on 22 datasets from the UCI repository, with emphasis on the ensemble’s accuracy as the main performance measure. The variant of Bagging that we propose utilizes the “out of bag” samples in determining a decision tree’s voting power, whereas in the classical Bagging algorithm, all trees have the same voting power. Our proposed algorithm creates bootstrap samples from the training data according to a forced probability distribution that emulates random sampling with replacement. We thus achieve a uniformly distributed training set for each base learner, and also identically-sized “out of bag” verification sets for each learner. Our 10-fold cross-validation results and single runs on large data sets show our novel Bagging variant to improve classification accuracy relative to both the original Bagging ensemble and standard Boosting.
منابع مشابه
A committee machine approach for predicting permeability from well log data: a case study from a heterogeneous carbonate reservoir, Balal oil Field, Persian Gulf
Permeability prediction problem has been examined using several methods such as empirical formulas, regression analysis and intelligent systems especially neural networks and fuzzy logic. This study proposes an improved and novel model for predicting permeability from conventional well log data. The methodology is integration of empirical formulas, multiple regression and neuro-fuzzy in a commi...
متن کاملA Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...
متن کاملEnsembles of Learning Machines
Ensembles of learning machines constitute one of the main current directions in machine learning research, and have been applied to a wide range of real problems. Despite of the absence of an unified theory on ensembles, there are many theoretical reasons for combining multiple learners, and an empirical evidence of the effectiveness of this approach. In this paper we present a brief overview o...
متن کاملEmpirical analysis of support vector machine ensemble classifiers
Ensemble classification – combining the results of a set of base learners – has received much attention in the machine learning community and has demonstrated promising capabilities in improving classification accuracy. Compared with neural network or decision tree ensembles, there is no comprehensive empirical research in support vector machine (SVM) ensembles. To fill this void, this paper an...
متن کاملEnsembles of Kernel Predictors
This paper examines the problem of learning with a finite and possibly large set of p base kernels. It presents a theoretical and empirical analysis of an approach addressing this problem based on ensembles of kernel predictors. This includes novel theoretical guarantees based on the Rademacher complexity of the corresponding hypothesis sets, the introduction and analysis of a learning algorith...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000