Improving minority class prediction using cost-sensitive ensembles

Authors

  • Bartosz Krawczyk
  • Michal Wozniak
  • Gerald Schaefer
Abstract

In this paper, we address the problem of dealing with unbalanced datasets in the context of classification, i.e. where some of the classes contain significantly more objects than the other(s). We show that we can address this problem by choosing classifiers for a committee of multiple classifier systems. In particular, we propose to design such an ensemble on the basis of the cost of the elementary classifiers, given by a cost matrix. To assure the diversity of the ensemble, each of the base classifiers is trained on a random subspace. This improves the recognition rate of the minority class, which is typically low when using canonical classifiers. We evaluated our proposed algorithm on a variety of benchmark datasets and show that it significantly outperforms the base cost-sensitive classifier and its boosted version. The results confirm that our approach is a useful tool for dealing with unbalanced datasets.
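The ingredients named in the abstract (a cost matrix, random feature subspaces, and a committee chosen by cost) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the selection procedure, so ranking candidates by their total validation cost is an assumption, and the cost values and the names `COST`, `random_subspace`, `total_cost`, `min_cost_label`, and `select_committee` are hypothetical.

```python
import random

# Hypothetical cost matrix, indexed as COST[true_label][predicted_label].
# Assumption: missing a minority-class object (label 1) costs five times
# as much as a false alarm on the majority class (label 0).
COST = [[0.0, 1.0],
        [5.0, 0.0]]


def random_subspace(n_features, k, rng):
    """Pick k distinct feature indices for one base classifier."""
    return sorted(rng.sample(range(n_features), k))


def total_cost(y_true, y_pred, cost=COST):
    """Sum the misclassification costs over paired true/predicted labels."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


def min_cost_label(posterior, cost=COST):
    """Return the label with the lowest expected cost under `posterior`."""
    n = len(cost)
    expected = [sum(posterior[t] * cost[t][p] for t in range(n))
                for p in range(n)]
    return min(range(n), key=expected.__getitem__)


def select_committee(candidate_preds, y_true, size, cost=COST):
    """Keep the indices of the `size` candidates with the lowest cost.

    Assumption: candidates are ranked by total cost on a validation set;
    the paper may use a different selection rule.
    """
    ranked = sorted(range(len(candidate_preds)),
                    key=lambda i: total_cost(y_true, candidate_preds[i], cost))
    return ranked[:size]
```

With this cost matrix, a candidate that labels every object as the majority class is penalized heavily for each missed minority object, so it ranks below candidates that recover the minority class at the price of a few false alarms.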

Similar resources

Measuring Accuracy between Ensemble Methods: AdaBoost.NC vs. SMOTE.ENN

Imbalanced class distribution is one of the main issues in data mining. This problem exists in multi-class imbalance, when the number of samples contained in one class is greater or lower than that of other classes. Most existing imbalance learning techniques are only designed and tested for two-class scenarios. The new negative correlation learning (NCL) algorithm for classification ensembles, called A...

Making Accurate Credit Risk Predictions with Cost-Sensitive MLP Neural Networks

In practical applications to credit risk evaluation, most prediction models often make inaccurate decisions because of the lack of sufficient default data. The challenging issue of highly skewed class distribution between defaulters and non-defaulters is addressed here by means of an algorithmic solution based on cost-sensitive learning. The present study is conducted on the popular Multilayer Percep...

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

Using Random Forest to Learn Imbalanced Data

In this paper we propose two ways to deal with the imbalanced data classification problem using random forest. One is based on cost sensitive learning, and the other is based on a sampling technique. Performance metrics such as precision and recall, false positive rate and false negative rate, F-measure and weighted accuracy are computed. Both methods are shown to improve the prediction accurac...

Actively Balanced Bagging for Imbalanced Data

Under-sampling extensions of bagging are currently the most accurate ensembles specialized for class-imbalanced data. Nevertheless, since improvements in recognition of the minority class in this type of ensemble are usually associated with a decrease in recognition of the majority classes, we introduce a new two-phase ensemble called Actively Balanced Bagging. The proposal is to first learn a...

Publication date: 2011