imbalanced classes

Learning from imbalanced data sets with a Min-Max modular support vector machine

2011

Xiao-Lin WANG Yang YANG Hai ZHAO

Imbalanced data sets have significantly unequal distributions between classes. This between-class imbalance causes conventional classification methods to favor majority classes, resulting in very low or even no detection of minority classes. A Min-Max modular support vector machine (M3-SVM) approaches this problem by decomposing the training input sets of the majority classes into subsets of sim...

Full text
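The decomposition idea in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the majority class is cut into chunks of roughly the minority-class size, one module is trained per chunk against the full minority class, and the module outputs are combined with MIN (a MAX stage would only come into play if the minority class were split as well). A nearest-centroid scorer stands in for the SVM so the example stays dependency-free; all function names are hypothetical.

```python
import random


def split_majority(majority, minority_size, seed=0):
    """Shuffle the majority examples and cut them into chunks of about minority_size."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    n_chunks = max(1, round(len(shuffled) / minority_size))
    return [shuffled[i::n_chunks] for i in range(n_chunks)]


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_module(minority, majority_chunk):
    """Stand-in base learner (nearest centroid, NOT an SVM).

    Returns a scoring function that is positive when x lies closer to the
    minority centroid than to this chunk's majority centroid.
    """
    c_min = centroid(minority)
    c_maj = centroid(majority_chunk)
    return lambda x: sq_dist(x, c_maj) - sq_dist(x, c_min)


def train_min_max(minority, majority):
    """Train one balanced module per majority chunk."""
    chunks = split_majority(majority, len(minority))
    return [train_module(minority, chunk) for chunk in chunks]


def predict(modules, x):
    """MIN combination: label x as minority only if every module scores it positive."""
    return "minority" if min(m(x) for m in modules) > 0 else "majority"


minority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
majority = [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5), (4.5, 5.0), (5.0, 4.5),
            (5.2, 5.1), (4.8, 4.9), (5.1, 5.3), (4.9, 5.2)]
modules = train_min_max(minority, majority)
```

Because each module sees a roughly balanced two-class problem, no single module is dominated by the majority class; the MIN step then requires agreement from all modules before a point is labeled as minority.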

Improving Rule-Based Classifiers Induced by MODLEM by Selective Pre-processing of Imbalanced Data

2007

Jerzy Stefanowski Szymon Wilk

In the paper we discuss inducing rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining classes (majority classes). To improve the ability of a classifier to recognize this class, we propose a new selective pre-processing approach that is applied to data before inducing a rule-based classifier. The approach combines se...

Full text

Machine Learning Methods for High-Dimensional Imbalanced Biomedical Data

2013

Tao Yang Yalin Wang Hasan Davulcu Pinghua Gong Rita Chattopadhyay Jiayu Zhou Sen Yang Shuo Xiang Qian Sun Zhi Nie Cheng Pan Rashmi Dubey

Learning from high-dimensional biomedical data has attracted a lot of attention recently. High-dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features of biomedical data, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect the model performance....

Full text

Impact of Local Data Characteristics on Learning Rules from Imbalanced Data

2015

Jerzy Stefanowski

In this paper we discuss improving rule-based classifiers learned from class-imbalanced data. Standard learning methods often do not work properly with imbalanced data as they are biased to focus on the majority classes while "disregarding" examples from the minority class. The class imbalance affects various types of classifiers, including the rule-based ones. These difficulties include two g...

Full text

Class Imbalance and Active Learning

2011

Josh Attenberg Şeyda Ertekin

The rich history of predictive modeling has culminated in a diverse set of techniques capable of making accurate predictions on many real-world problems. Many of these techniques demand training, whereby a set of instances with ground-truth labels (values of a dependent variable) are observed by a model-building process that attempts to capture, at least in part, the relationship between the fe...

Full text

A Framework of Online Learning with Imbalanced Streaming Data

2017

Yan Yan Tianbao Yang Yi Yang Jianhui Chen

A challenge for mining large-scale streaming data overlooked by most existing studies on online learning is the skewed distribution of examples over different classes. Many previous works have considered cost-sensitive approaches in an online setting for streaming data, where fixed costs are assigned to different classes, or ad-hoc costs are adapted based on the distribution of data received so fa...

Full text

When Does Imbalanced Data Require more than Cost-Sensitive Learning?

2003

Dragos D. Margineantu

Most classification algorithms expect the frequency of examples from each class to be roughly the same. However, this is rarely the case for real-world data, where very often the class probability distribution is nonuniform (or imbalanced). For these applications, the main problem is usually the fact that the costs of misclassifying examples belonging to rare classes differ significantly from t...

Full text
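As a point of reference for the abstract above, plain cost-sensitive classification with unequal misclassification costs can be sketched as a minimum-expected-cost decision rule. This is a generic textbook illustration, not the method proposed in the paper; the function names and the cost values are hypothetical, and correct predictions are assumed to cost nothing.

```python
def min_cost_label(p_rare, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_rare  : estimated probability that the example belongs to the rare class
    cost_fn : cost of labeling a rare example as common (a miss)
    cost_fp : cost of labeling a common example as rare (a false alarm)
    """
    expected_cost_if_common = p_rare * cost_fn          # wrong when truth is rare
    expected_cost_if_rare = (1.0 - p_rare) * cost_fp    # wrong when truth is common
    return "rare" if expected_cost_if_rare < expected_cost_if_common else "common"


def cost_threshold(cost_fn, cost_fp):
    """Probability above which predicting the rare class is the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)


# With a miss costing ten times a false alarm, a 20% chance is enough to flag "rare".
label_costly_miss = min_cost_label(0.2, cost_fn=10.0, cost_fp=1.0)
# With equal costs, the same example is labeled "common".
label_equal_costs = min_cost_label(0.2, cost_fn=1.0, cost_fp=1.0)
```

Raising the cost of a miss lowers the decision threshold, which is how a cost-sensitive learner shifts attention toward the rare class without resampling the data.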

Similar classes latent distribution modelling-based oversampling method for imbalanced image classification

Journal: The Journal of Supercomputing 2023

Learning an unbiased classifier from imbalanced image datasets is challenging since the classifier may be strongly biased toward the majority class. To address this issue, some generative model-based oversampling methods have been proposed. However, most of these methods pay little attention to boundary samples, which contribute tiny learning classifier. In this paper, we focus on samples and propose a similar classes lat...

Full text

Learning from Skewed Class Multi-relational Databases

Journal: Fundam. Inform. 2008

Hongyu Guo Herna L. Viktor

Relational databases, with vast amounts of data (from financial transactions, marketing surveys, and medical records to health informatics observations) and complex schemas, are ubiquitous in our society. Multirelational classification algorithms have been proposed to learn from such relational repositories, where multiple interconnected tables (relations) are involved. These methods search for rel...

Full text

D-Confidence: An Active Learning Strategy which Efficiently Identifies Small Classes

2010

Nuno Escudeiro Alipio Jorge

In some classification tasks, such as those related to the automatic building and maintenance of text corpora, it is expensive to obtain labeled examples to train a classifier. In such circumstances it is common to have massive corpora where a few examples are labeled (typically a minority) while others are not. Semi-supervised learning techniques try to leverage the intrinsic information in un...

Full text