imbalanced data sets

Variant of Data Particle Geometrical Divide for Imbalanced Data Sets Classification by the Example of Occupancy Detection

Journal: Applied Sciences, 2021

The history of gravitational classification started in 1977. Over the years, the approach has gained many extensions, which were adapted to different problems. This article is the next stage of research on algorithms that create data particles by their geometrical divide. Previous analyses established that the Geometrical Divide (GD) method outperforms the algorithm based on classes a compoun...

School of IT Technical Report: Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets

2007

Florian Verhein Sanjay Chawla

The application of association rule mining to classification has led to a new family of classifiers which are often referred to as “Associative Classifiers (ACs)”. The advantage of ACs is that they are rule-based and thus lend themselves to an easier interpretation. Another advantage that ACs enjoy is that they are based on a global search criterion, unlike other rule-based classifiers – e.g. d...

Diversified Ensemble Classifiers for Highly Imbalanced Data Learning and their Application in Bioinformatics

2015

Zejin Ding Yanqing Zhang

In this dissertation, the problem of learning from highly imbalanced data is studied. Imbalanced data learning is of great importance and poses a challenge in many real applications. Dealing with a minority class normally needs new concepts, observations and solutions in order to fully understand the underlying complicated models. We try to systematically review and solve this special learning task in t...

Roughly Balanced Bagging for Imbalanced Data

Journal: Statistical Analysis and Data Mining, 2008

Shohei Hido Hisashi Kashima

Imbalanced class problems appear in many real applications of classification learning. We propose a novel sampling method to improve bagging for data sets with skewed class distributions. In our new sampling method “Roughly Balanced Bagging” (RB Bagging), the numbers of samples in the largest and smallest classes are different, but they are effectively balanced when averaged over all subsets, wh...
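The sampling idea in this abstract can be illustrated with a short sketch. It is not the authors' implementation. The excerpt is truncated before it says how the subset sizes are drawn, so the sketch assumes a two-class problem in which the minority sample size is fixed and the majority sample size is drawn from a negative binomial distribution with success probability 0.5. The function names `negative_binomial` and `roughly_balanced_subsets` are hypothetical.

```python
import random
from collections import Counter


def negative_binomial(n, p, rng):
    """Count failures seen before the n-th success (success probability p)."""
    failures = 0
    successes = 0
    while successes < n:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def roughly_balanced_subsets(labels, n_subsets, seed=0):
    """Return lists of row indices whose class sizes are balanced on average.

    Assumption (not stated in the excerpt): binary labels, both classes
    present, and a majority size drawn from a negative binomial with p = 0.5.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    n_min = len(min_idx)

    subsets = []
    for _ in range(n_subsets):
        # Majority size varies per subset; its expected value equals n_min.
        n_maj = negative_binomial(n_min, 0.5, rng)
        # Both classes are sampled with replacement, as in ordinary bagging.
        chosen_min = [rng.choice(min_idx) for _ in range(n_min)]
        chosen_maj = [rng.choice(maj_idx) for _ in range(n_maj)]
        subsets.append(chosen_min + chosen_maj)
    return subsets


if __name__ == "__main__":
    toy_labels = [0] * 90 + [1] * 10  # 90 majority rows, 10 minority rows
    for subset in roughly_balanced_subsets(toy_labels, n_subsets=5, seed=1):
        print(Counter(toy_labels[i] for i in subset))
```

Each index list would then be used to train one base classifier, and the ensemble would combine their predictions as in standard bagging.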

Machine Learning Methods for High-Dimensional Imbalanced Biomedical Data

2013

Tao Yang Yalin Wang Hasan Davulcu Pinghua Gong Rita Chattopadhyay Jiayu Zhou Sen Yang Shuo Xiang Qian Sun Zhi Nie Cheng Pan Rashmi Dubey

Learning from high-dimensional biomedical data has attracted much attention recently. High-dimensional biomedical data often suffer from the curse of dimensionality and have imbalanced class distributions. Both of these features, high dimensionality and imbalanced class distributions, are challenging for traditional machine learning methods and may affect the model performance....

Learning SVM with weighted maximum margin criterion for classification of imbalanced data

Journal: Mathematical and Computer Modelling, 2011

Zhuangyuan Zhao Ping Zhong Yaohong Zhao

The support vector machine is a kernel-based method, so its performance depends on whether the selected kernel matches the data. Conventional support vector classifiers are not suitable for imbalanced learning tasks, since they tend to assign instances to the majority class, which is the less important class. In this paper, we propose a weighted maximum margin criterion to optimize the data-...

Fast Multilevel Support Vector Machines

Journal: CoRR, 2014

Talayeh Razzaghi Ilya Safro

Solving different types of optimization models (including parameter fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that scales efficiently to very large data sets. Instead of solving the whole training set in one optimization process, the support vectors are obtained and gradually...

Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches

Journal: Knowl.-Based Syst., 2013

Alberto Fernández Victoria López Mikel Galar María José del Jesús Francisco Herrera

The imbalanced class problem is related to the real-world application of classification in engineering....

DOI: http://dx.doi.org/10.1016/j.knosys.2013.01.018

Absent data generating classifier for imbalanced class sizes

Journal: Journal of Machine Learning Research, 2015

Arash Pourhabib Bani K. Mallick Yu Ding

We propose an algorithm for two-class classification problems when the training data are imbalanced. This means the number of training instances in one of the classes is so low that the conventional classification algorithms become ineffective in detecting the minority class. We present a modification of the kernel Fisher discriminant analysis such that the imbalanced nature of the problem is e...

An Effective Method for Imbalanced Time Series Classification: Hybrid Sampling

2013

Guohua Liang

Most traditional supervised classification learning algorithms are ineffective for highly imbalanced time series classification, which has received considerably less attention than imbalanced data problems in data mining and machine learning research. Bagging is one of the most effective ensemble learning methods, yet it has drawbacks on highly imbalanced data. Sampling methods are considered t...
