imbalanced classes

Combine Vector Quantization and Support Vector Machine for Imbalanced Datasets

2006

Ting Yu John K. Debenham Tony Jan Simeon J. Simoff

For extremely imbalanced, high-dimensional datasets, standard machine learning techniques tend to be overwhelmed by the large classes. This paper rebalances skewed datasets by compressing the majority class. The approach combines Vector Quantization with a Support Vector Machine to form a new method, VQ-SVM, that rebalances datasets without significant information loss. Some issue...
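
The idea of compressing the majority class can be illustrated with a minimal sketch. It is not the authors' VQ-SVM implementation: it assumes that vector quantization is done with plain Lloyd's k-means, that the codebook size is set equal to the minority class size, and that points are tuples of floats. The function names `quantize_majority` and `rebalance` are hypothetical. The SVM training step is left out; the rebalanced set returned here would be passed to any SVM trainer.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def quantize_majority(majority, n_codewords, n_iter=20, seed=0):
    """Compress `majority` into `n_codewords` centroids (Lloyd's k-means).

    Assumption: n_codewords <= len(majority).
    """
    rng = random.Random(seed)
    codebook = rng.sample(majority, n_codewords)  # initial codewords
    for _ in range(n_iter):
        clusters = [[] for _ in codebook]
        for point in majority:
            nearest = min(
                range(len(codebook)),
                key=lambda i: squared_distance(point, codebook[i]),
            )
            clusters[nearest].append(point)
        # Move each codeword to its cluster mean; keep it if the cluster is empty.
        codebook = [
            mean_vector(cluster) if cluster else codebook[i]
            for i, cluster in enumerate(clusters)
        ]
    return codebook


def rebalance(majority, minority, seed=0):
    """Replace the majority class by a codebook as large as the minority class."""
    codebook = quantize_majority(majority, len(minority), seed=seed)
    features = codebook + list(minority)
    labels = [0] * len(codebook) + [1] * len(minority)  # 0 = majority, 1 = minority
    return features, labels


if __name__ == "__main__":
    majority = [(float(i), 0.0) for i in range(100)]
    minority = [(50.0, 10.0), (51.0, 11.0), (52.0, 12.0)]
    features, labels = rebalance(majority, minority)
    print(len(features), labels)
```

Each codeword stands in for a whole cluster of majority points, so the class sizes match while the rough shape of the majority distribution is retained.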


Classification of Imbalanced Data by Combining the Complementary Neural Network and SMOTE Algorithm

2010

Piyasak Jeatrakul Kevin Kok Wai Wong Lance Chun Che Fung

In classification, when the distribution of the training data among classes is uneven, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are normally difficult to recognize fully. In this paper, a method is proposed to enhance the classification accuracy for the minority classes. The proposed method combines Synthetic ...
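
The title of this entry names SMOTE, whose core step is interpolation between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that step only, not the paper's combined method with the Complementary Neural Network. It assumes points are tuples of floats and, as a simplification, picks the base sample at random rather than iterating over every minority sample. The function name `smote` and its parameters are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate `n_synthetic` points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote(minority, n_synthetic=3, k=2):
        print(point)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies.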


A dynamic over-sampling procedure based on sensitivity for multi-class problems

Journal: Pattern Recognition 2011

Francisco Fernández-Navarro César Hervás-Martínez Pedro Antonio Gutiérrez

Classification with imbalanced datasets poses a new challenge for researchers in the framework of machine learning. This problem appears when the number of patterns that represent one of the classes of the dataset (usually the concept of interest) is much lower than in the remaining classes. Thus, the learning model must be adapted to this situation, which is very common in real applications...


Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets

Journal: Int. J. Approx. Reasoning 2009

Alberto Fernández María José del Jesús Francisco Herrera

In many real application areas, the data used are highly skewed and the number of instances for some classes is much higher than that of the other classes. Solving a classification task using such an imbalanced data-set is difficult due to the bias of the training towards the majority classes. The aim of this paper is to improve the performance of fuzzy rule based classification systems on imb...


Improved Fuzzy-Optimally Weighted Nearest Neighbor Strategy to Classify Imbalanced Data

2017

Harshita Patel Ghanshyam Singh Thakur

Learning from imbalanced data is one of the burning issues of the era. Traditional classification methods degrade in performance on imbalanced data sets because of the skewed distribution of data across classes. Among the various suggested solutions, instance-based weighted approaches have secured a place in such cases. In this paper, we propose a new fuzzy weighted near...


A Review on Imbalanced Learning Methods

2015

Varsha S. Babar Roshani Ade T. E. Fawcett

Learning from imbalanced data sets is now a critical task for many data mining applications such as fraud detection, anomaly detection, medical diagnosis, and information retrieval systems. The imbalanced learning problem is the unequal distribution of data between classes, where one class contains many samples while another contains very few. Because ...


Text Sampling and Re-Sampling for Imbalanced Authorship Identification Cases

2006

Efstathios Stamatatos

Authorship identification can be seen as a single-label multi-class text categorization problem. Very often, there are extremely few training texts for at least some of the candidate authors. In this paper, we present methods to handle imbalanced multi-class textual datasets. The main idea is to segment the training texts into sub-samples according to the size of the class. Hence, minority clas...


On Mining Fuzzy Classification Rules for Imbalanced Data

Journal: Journal of Advances in Computer Research 2012

The fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...


Handling Problems of Credit Data for Imbalanced Classes using SMOTEXGBoost

Journal: Journal of physics 2021

Some researchers encounter data with imbalanced classes, where there are minority and majority classes. SMOTE is an approach for imbalanced classes, and XGBoost is one algorithm for imbalanced-class problems. This research uses SMOTE with XGBoost, abbreviated as SMOTEXGBoost, for handling imbalanced classes. The results showed almost the same accuracy value between at 99%. While AUC SMOTEXBoost has more stable than that equal to 99.89% training 98...


Empirical Similarity for Absent Data Generation in Imbalanced Classification

Journal: CoRR 2015

Arash Pourhabib

When the training data in a two-class classification problem is overwhelmed by one class, most classification techniques fail to correctly identify the data points belonging to the underrepresented class. We propose Similarity-based Imbalanced Classification (SBIC) that learns patterns in the training data based on an empirical similarity function. To take the imbalanced structure of the traini...
