Search results for: imbalanced data

Number of results: 2412732

2015
Zejin Ding Yanqing Zhang

In this dissertation, the problem of learning from highly imbalanced data is studied. Imbalanced data learning is both important and challenging in many real applications. Dealing with a minority class normally requires new concepts, observations and solutions in order to fully understand the underlying complicated models. We try to systematically review and solve this special learning task in t...

2010
Ana M. Palacios Luciano Sánchez Inés Couso

Some real-world datasets contain classes with very different proportions of patterns: some classes are represented by many examples (a high percentage of patterns) and others by few examples (a low percentage of patterns). Datasets of this kind are called “imbalanced datasets”. In the field of classification problems the imbalanced d...
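As a rough illustration of the class percentages described in this abstract, the sketch below computes each class's share of a dataset and the majority-to-minority count ratio. The function name `class_distribution` and the toy labels are assumptions made for this example and do not come from the paper.

```python
from collections import Counter


def class_distribution(labels):
    """Return per-class percentages and the majority/minority count ratio."""
    counts = Counter(labels)
    total = len(labels)
    # Share of each class, expressed as a percentage of all patterns.
    percentages = {cls: 100.0 * n / total for cls, n in counts.items()}
    # Ratio between the largest and the smallest class (1.0 means balanced).
    ratio = max(counts.values()) / min(counts.values())
    return percentages, ratio


# Toy data (hypothetical): 95 majority examples and 5 minority examples.
labels = ["neg"] * 95 + ["pos"] * 5
percentages, ratio = class_distribution(labels)
```

A ratio close to 1 indicates a balanced dataset; larger values indicate stronger imbalance.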

Journal: International Journal of Bioinformatics Research and Applications 2020

Journal: Neurocomputing 2014
Ming Gao Xia Hong Sheng Chen Christopher J. Harris Emad Khalaf

This contribution proposes a novel probability density function (PDF) estimation based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Then according to the estimated PDF, synthetic instances are generated as the additional training data. The essential concept is to...
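The idea of generating synthetic positive instances from a Parzen-window estimate can be sketched as follows. This is a minimal illustration, not the PDFOS procedure itself: it assumes an isotropic Gaussian kernel with a fixed, hand-picked `bandwidth`, whereas the abstract does not state how the paper chooses the kernel width. The function name `parzen_oversample` and the toy data are likewise assumptions.

```python
import random


def parzen_oversample(minority, n_new, bandwidth=0.5, seed=0):
    """Draw n_new synthetic points from a Gaussian Parzen-window estimate.

    Sampling from a Gaussian kernel density estimate is equivalent to
    picking one stored sample uniformly at random and adding Gaussian
    noise whose standard deviation equals the kernel bandwidth.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        centre = rng.choice(minority)  # choose one kernel centre
        synthetic.append([x + rng.gauss(0.0, bandwidth) for x in centre])
    return synthetic


# Toy positive class (hypothetical): three 2-D points.
minority = [[1.0, 2.0], [1.2, 1.9], [0.8, 2.1]]
majority_size = 10  # assumed size of the negative class
new_points = parzen_oversample(minority, majority_size - len(minority))
balanced_minority = minority + new_points
```

After the call, `balanced_minority` holds as many instances as the assumed majority class, with the synthetic ones scattered around the original positives.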

2009
Viviane Palodeto Hernán Terenzi Jefferson Luiz Brum Marques

Protein secondary structure prediction (PSSP) is one of the main tasks in computational biology. During the last few decades, much effort has been made towards solving this problem, with various approaches, mainly artificial neural networks (ANN). Generally, in order to predict the protein secondary structure, the ANN training process is performed using the CB513 data set. Like protein structures d...

2013
Seyda Ertekin

Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, VIRTUAL, that comb...

Journal: Neurocomputing 2015
Jerzy Blaszczynski Jerzy Stefanowski

Various approaches to extend bagging ensembles for class imbalanced data are considered. First, we review known extensions and compare them in a comprehensive experimental study. The results show that integrating bagging with under-sampling is more powerful than integrating it with over-sampling. They also allow us to distinguish Roughly Balanced Bagging as the most accurate extension. Then, we point out that complex...
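To make the combination of bagging and under-sampling concrete, the sketch below builds the training bags for a simple, exactly balanced variant: each bag pairs a bootstrap sample of the minority class with an equally sized random subset of the majority class. It is not Roughly Balanced Bagging or any other specific extension from the paper; the function name `undersampled_bags` and the toy data are assumptions, and fitting a base classifier on each bag is left out.

```python
import random


def undersampled_bags(majority, minority, n_bags, seed=0):
    """Build n_bags balanced training bags via majority under-sampling.

    Assumes len(majority) >= len(minority); otherwise rng.sample raises
    ValueError.
    """
    rng = random.Random(seed)
    k = len(minority)
    bags = []
    for _ in range(n_bags):
        # Under-sample the majority class without replacement.
        maj_part = rng.sample(majority, k)
        # Bootstrap the minority class (sampling with replacement).
        min_part = [rng.choice(minority) for _ in range(k)]
        bags.append((maj_part, min_part))
    return bags


# Toy data (hypothetical): 100 majority ids and 10 minority ids.
majority = list(range(100))
minority = list(range(100, 110))
bags = undersampled_bags(majority, minority, n_bags=5)
```

Each bag would then train one base classifier, and the ensemble prediction would aggregate their votes.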

2017
Jerzy Blaszczynski Jerzy Stefanowski

Under-sampling extensions of bagging are currently the most accurate ensembles specialized for class imbalanced data. Nevertheless, since in this type of ensemble improved recognition of the minority class is usually associated with decreased recognition of the majority classes, we introduce a new two-phase ensemble called Actively Balanced Bagging. The proposal is to first learn a...

2013
P. Alagambigai K. Thangavel Ashok Kumar

A common challenge faced by many data clustering techniques is data complexity, which leads to issues such as overlapping, lack of representative data and class imbalance. This may degrade the clustering process. The situation gets worse when the class imbalance is very high. To cluster such imbalanced data sets, a better understanding of the dataset and efficient clu...

Journal: Knowl.-Based Syst. 2014
Qingyao Wu Yunming Ye Haijun Zhang Michael K. Ng Shen-Shyang Ho

In this paper, we propose a new Random Forest (RF) based ensemble method, ForesTexter, to solve imbalanced text categorization problems. RF has shown great success in many real-world applications. However, the problem of learning from text data with class imbalance is a relatively new challenge that needs to be addressed. An RF algorithm tends to use a simple random sampling of features in b...
